Artificial intelligence to create equitable multi-ethnic polygenic risk scores that improve clinical care

Lead Research Organisation: University of Cambridge
Department Name: Public Health and Primary Care

Abstract

Overall Objective: To develop state-of-the-art artificial intelligence (AI) methods which address the ethnic inequities inherent in genomic medicine and rapidly emerging polygenic risk scores (PRSs). We will subsequently apply these approaches to demonstrate improved screening, diagnosis and treatment of common disease.

Background: Recent breakthroughs in genomics and machine learning have generated personalized medicine tools, in the form of PRSs, that have the potential to improve patient care and health maintenance. PRSs are risk predictors that combine information across the entire genome to predict a person's risk of disease, or of unfavorable disease risk factors such as high cholesterol levels. Although still evolving rapidly, these PRSs are more predictive of susceptibility to disease than many traditional disease risk factors. Because PRSs can predict risk of disease, they could be helpful in screening programs: by identifying a group of individuals at such low risk of disease that screening would be unlikely to help, they would allow screening programs to focus on individuals at highest risk. They could also refine diagnosis, by identifying the people in the population most likely to have a disease and then targeting diagnostic testing to those at highest risk. Finally, they could improve disease treatment by identifying the individuals most likely to benefit from it.
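To make the screening use case concrete, a minimal sketch that splits a set of precomputed PRS values into low, intermediate and high strata by percentile is shown below. The function name `stratify_for_screening`, the `{person_id: score}` input format and the 20th/80th percentile cut-offs are illustrative assumptions, not thresholds proposed in this program; in practice cut-offs would need to be calibrated against absolute risk in the target population.

```python
from statistics import quantiles


def stratify_for_screening(scores, low_pct=20, high_pct=20):
    """Assign each person to a 'low', 'intermediate' or 'high' PRS stratum.

    scores: {person_id: prs_value}. The percentile cut-offs are hypothetical
    defaults for illustration, not clinically validated thresholds.
    """
    # quantiles(..., n=100) returns the 99 cut points between percentiles
    cuts = quantiles(scores.values(), n=100)
    low_cut = cuts[low_pct - 1]          # e.g. the 20th percentile
    high_cut = cuts[100 - high_pct - 1]  # e.g. the 80th percentile
    strata = {}
    for person, s in scores.items():
        if s <= low_cut:
            strata[person] = "low"           # candidates for de-prioritised screening
        elif s > high_cut:
            strata[person] = "high"          # candidates for prioritised screening/testing
        else:
            strata[person] = "intermediate"
    return strata
```

Using percentiles of the score distribution keeps the sketch independent of the score's units; a clinical tool would instead translate scores into absolute risks before choosing thresholds.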

While PRSs could help to improve clinical care, they have been developed largely in individuals of exclusively European ancestry, because the large cohorts from which PRSs have been derived consist predominantly of European-ancestry participants. Since genetic make-up varies by ancestry, these PRSs perform considerably worse in individuals of non-European ancestry. Two mechanisms are known to contribute: differences in genetic risk factors between populations attenuate PRS performance, and differences in the correlation patterns in our genomes (linkage disequilibrium), arising from distinct population histories, reduce predictive accuracy even when the actual genetic risk factors are the same.

This creates several important problems for the roll-out of such tests in the Canadian and UK healthcare environments, where approximately 22% of Canada's population and 12% of the UK's population are visible minorities. Further, these populations can have increased rates of health care utilization. Use of European-only PRSs could therefore worsen existing health disparities.

Canada's Chief Information Officer stated that, "Using Artificial Intelligence in government means balancing innovation with the ethical and responsible use of emerging technologies." Using PRSs developed to benefit only the majority group in our societies would not respect this principle; our program will therefore help to ensure the responsible use of AI.

What are our Specific Goals?
1) To develop AI methods and open source software packages to improve the accuracy of PRSs for coronary heart disease (CHD) and hyperlipidemia for individuals of non-European ancestry.
2) To compare the performance of these ancestry-adapted PRSs across a diverse set of cohorts representing multiple ancestries in the UK and Canada.

The Team: Recruiting leaders across the UK and Canada, we have a depth of expertise in AI methods development, human genetics, clinical medicine, cohort development for minority groups, and statistical genetics. This gender-balanced team involves an appropriate mix of senior and junior investigators.

Relevance: Success in this AI-enabled program will allow for the transfer of recent advances in genomic medicine to the citizens of Canada and the UK, regardless of ancestry. The AI methods developed will be widely applicable to other PRSs in development by other groups. As genomics takes root in regular clinician-patient interactions, our set of AI-based tools will ensure that the benefit derived from these advances will be shared by all citizens of our countries.

Planned Impact

In this proposal, we aim to address a real-world clinical problem that, if not resolved, will impair the equitable uptake of AI-based genomics into clinical care. Given the urgency of this problem, we have incorporated into our program, from its inception, end-users who will directly benefit from this research.

Who will benefit from this research? If successful, our research will predominantly benefit people who are of non-European descent. To realize this benefit, we have engaged four end-users (see Letters of Support). Here we describe each end-user and their role in the program.

The Integrated Health & Social Services University Network for West-Central Montreal (known by its French acronym CIUSSS) provides clinical care to approximately 362,000 people through a partnership of more than 30 healthcare facilities. These include one of Montreal's leading hospitals, three specialized hospitals, five community care clinics, two rehabilitation centres, four residential centres, two long-term geriatric residences and two day centres. Dr. Lawrence Rosenberg is the CEO of this CIUSSS and directly supports our program of research, since this CIUSSS serves the largest multicultural community in Québec. Dr. Rosenberg is currently leading a team of researchers and clinicians to implement genomics-based medicine within the CIUSSS.

Health Data Research UK (HDR UK) is the UK's national institute for health data science, directed by Professor Andrew Morris. It is an independent, non-profit organisation bringing together 22 of the UK's leading universities and research institutions around a common mission of uniting the UK's health data to make discoveries that improve people's lives. HDR UK's vision is that every healthcare interaction and research endeavour will be enhanced by access to large-scale data and advanced analytics. The ambition presented in this proposal, to generate advanced AI tools that address health inequalities through collaborative research, fully aligns with HDR UK's mission.

Genomics England is owned and funded by the Department of Health & Social Care, England, and was set up to deliver the 100,000 Genomes Project. Its four main aims are: to create an ethical and transparent genomic medicine programme based on patient consent; to bring benefit to patients and set up a genomic medicine service for the NHS; to enable new scientific discovery and medical insights; and to kick-start the development of a UK genomics industry. The 100,000 Genomes Project aims to bring the benefits of personalised medicine to the NHS, making sure that patients benefit from innovations in genomics and contributing towards high-quality care for all, now and for future generations. Involvement of Genomics England will provide an important route to further demonstrate the clinical utility of the equitable PRSs developed within this project.

Think Research is our knowledge translation partner. The clinical impact of our research can only be realized if physicians understand when to order PRSs, how to interpret their results, how to document their findings and how to alter patient care accordingly. We are aware of the considerable difficulties encountered when attempting to change physician behaviour. We have therefore engaged with Think Research, a Canadian company that specializes in electronic health record clinical support tools. Think Research's solutions are used in over 2,000 health care facilities worldwide. After completion of our program, our knowledge translation partner will maximize the clinical utility of our research by enabling uptake in health care systems in both Canada and the UK. Think Research will, however, have no access to the data or to the algorithms derived from them.

Summary: Through careful engagement with four essential end-users we will maximize the clinical impact of our research program.
 
Description Polygenic risk scores (PRSs) aggregate the many small effects of alleles across the human genome to estimate the risk of a disease or disease-related trait for an individual. The potential benefits of PRSs include cost-effective enhancement of primary disease prevention, more refined diagnoses and improved precision when prescribing medicines. However, these must be weighed against the potential risks, such as uncertainties and biases in PRS performance, as well as potential misunderstanding and misuse of these within medical practice and in wider society. By addressing key issues including gaps in best practices, risk communication and regulatory frameworks, PRSs can be used responsibly to improve human health. Here, the International Common Disease Alliance's PRS Task Force, a multidisciplinary group comprising expertise in genetics, law, ethics, behavioral science and more, highlights recent research to provide a comprehensive summary of the state of polygenic score research, as well as the needs and challenges as PRSs move closer to widespread use in the clinic.

Genetic association studies for blood cell traits, which are key indicators of health and immune function, have identified several hundred associations and defined a complex polygenic architecture. Polygenic scores (PGSs) for blood cell traits have potential clinical utility in disease risk prediction and prevention, but designing PGSs remains challenging and the optimal methods are unclear. To address this, we evaluated the relative performance of 6 methods for developing PGSs for 26 blood cell traits: a standard method of pruning and thresholding (P + T) and 5 learning methods, namely LDpred2, elastic net (EN), Bayesian ridge (BR), multilayer perceptron (MLP) and convolutional neural network (CNN). We evaluated these optimized PGSs on blood cell trait data from UK Biobank and INTERVAL. We found that PGSs designed using the common machine learning methods EN and BR improved prediction of blood cell traits and consistently outperformed the other methods. Our analyses suggest EN/BR as the top choices for PGS construction: they showed improved performance for 25 blood cell traits in external validation, with correlations with the directly measured traits increasing by 10%-23%. Ten PGSs showed a significant statistical interaction with sex, and sex-specific PGS stratification showed that all ten had substantial variation in blood cell trait trajectories with age. Genetic correlations between the PGSs for blood cell traits and common human diseases identified well-known as well as new associations. We developed machine learning-optimized PGSs for blood cell traits, demonstrated their relationships with sex, age and disease, and have made them publicly available as a resource.
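For readers unfamiliar with the baseline method, the sketch below shows the core logic of pruning and thresholding (P + T) applied to GWAS summary statistics: variants are visited from most to least significant, a variant is pruned if it is in strong LD with one already selected, and only variants below a p-value threshold are kept. The r² and p-value thresholds, the input formats and the function name are illustrative assumptions; real implementations (for example PLINK's clumping) also restrict LD comparisons to a physical window and estimate LD from a reference panel, and the thresholds used in the study are not given here. Pairs missing from the LD lookup are treated as uncorrelated.

```python
def prune_and_threshold(snps, ld_r2, r2_max=0.1, p_max=5e-8):
    """Greedy P + T over GWAS summary statistics.

    snps:  list of (snp_id, beta, p_value) tuples.
    ld_r2: {frozenset({snp_a, snp_b}): r2}; missing pairs count as r2 = 0.
    Returns {snp_id: beta} for the variants retained in the score.
    """
    selected = []  # (snp_id, beta) pairs kept so far, most significant first
    for snp_id, beta, p in sorted(snps, key=lambda s: s[2]):
        if p > p_max:
            break  # sorted by p-value, so every remaining variant also fails
        # prune: skip variants in LD with an already selected, more significant variant
        in_ld = any(
            ld_r2.get(frozenset((snp_id, kept_id)), 0.0) > r2_max
            for kept_id, _ in selected
        )
        if not in_ld:
            selected.append((snp_id, beta))
    return dict(selected)
```

The learning methods compared in the study replace this hard selection with weights fitted jointly across variants, which is one reason they can recover signal that P + T discards.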
Exploitation Route As described above, polygenic scores are being translated into healthcare for primary disease prevention, more refined diagnoses and more precise prescribing of medicines.
Sectors Healthcare

 
Title 26 polygenic scores 
Description Produced 26 polygenic scores for different human blood cell traits (e.g. white blood cell count). The models for these polygenic scores have been made freely available via the Polygenic Score Catalog.
Type Of Material Data analysis technique 
Year Produced 2021 
Provided To Others? Yes  
Impact None yet 
URL https://www.pgscatalog.org/publication/PGP000051/
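Applying a published score to new genotype data follows the weighted-sum definition of a polygenic score: each variant's effect weight is multiplied by the number of copies of its effect allele a person carries, and the products are summed. The sketch below assumes a tab-separated scoring file with `#`-prefixed metadata lines and `rsID`, `effect_allele` and `effect_weight` columns, which is how PGS Catalog scoring files are typically laid out; check the header of the actual file before relying on this. The function names and the genotype dictionary format are hypothetical, and variants that are not genotyped are simply skipped rather than imputed.

```python
import csv


def load_scoring_file(text):
    """Parse a tab-separated scoring file, skipping '#' metadata lines.

    Assumes columns named rsID, effect_allele and effect_weight (verify
    against the header of the real file).
    """
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.DictReader(lines, delimiter="\t")
    return [(row["rsID"], row["effect_allele"], float(row["effect_weight"])) for row in reader]


def score_individual(weights, genotype):
    """genotype: {rsID: (allele_1, allele_2)}; returns (score, n_variants_used)."""
    total, used = 0.0, 0
    for rsid, effect_allele, weight in weights:
        alleles = genotype.get(rsid)
        if alleles is None:
            continue  # variant not genotyped: skipped here; imputation would be preferable
        dosage = sum(a == effect_allele for a in alleles)  # 0, 1 or 2 effect alleles
        total += weight * dosage
        used += 1
    return total, used
```

Returning the number of variants used alongside the score makes poor overlap between the scoring file and the genotype data visible. A production pipeline would also need to harmonise genome builds and strand orientation, which this sketch ignores.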
 
Title OMICSPRED 
Description OMICSPRED is a resource for predicting multi-'omics data (proteomics, metabolomics, transcriptomics, etc.) directly from genotypes. To do this, we have used a single cohort (the INTERVAL study: www.intervalstudy.org.uk) with extensive multi-'omics data to train genetic scores using machine learning. You can explore and download the genetic scores for a wide range of biomolecular traits in human blood as well as the summary statistics of their associations with key traits and diseases in the UK Biobank. 
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
Impact No impact yet. 
URL https://www.omicspred.org
 
Description Sanger Open Targets 
Organisation Genome Research Ltd
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution Financial support for a 0.2FTE Data Manager
Collaborator Contribution Provision of a data manager
Impact None yet
Start Year 2021
 
Description Behind the Curtains theatre production on polygenic scores 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Public/other audiences
Results and Impact Behind the Curtains theatre production on polygenic scores (10min), produced by Creative Encounters
Year(s) Of Engagement Activity 2021
 
Description public workshop on polygenic risk scores 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Public/other audiences
Results and Impact Organised a public workshop on polygenic risk scores: what they are, how they are created and how useful they might be.
Year(s) Of Engagement Activity 2021