Artificial intelligence to create equitable multi-ethnic polygenic risk scores that improve clinical care
Lead Research Organisation:
University of Cambridge
Department Name: Public Health and Primary Care
Abstract
Overall Objective: To develop state-of-the-art artificial intelligence (AI) methods which address the ethnic inequities inherent in genomic medicine and rapidly emerging polygenic risk scores (PRSs). We will subsequently apply these approaches to demonstrate improved screening, diagnosis and treatment of common disease.
Background: Recent breakthroughs in genomics and machine learning have generated personalized medicine tools that have the potential to improve patient care and health maintenance through PRSs. PRSs are risk predictors that combine information across the entire genome to predict a person's risk of disease, or whether they have unfavorable disease risk factors, such as high cholesterol levels. While still evolving rapidly, these PRSs are more predictive of susceptibility to disease than many traditional disease risk factors. Since PRSs can predict risk of disease, they could be helpful in screening programs by identifying a group of individuals at such low risk of disease that screening would be unlikely to be helpful, thereby allowing screening programs to focus on individuals at highest risk. They could also refine diagnosis by identifying people in the population most likely to have a disease and then targeting required diagnostic testing in those at highest risk. Last, they can also improve disease treatment by identifying individuals most likely to benefit from treatment.
While PRSs could help to improve clinical care, they have been largely developed in individuals of European ancestry only. This is because the large cohorts from which PRSs have been developed consist predominantly of European-ancestry participants. Since genetic make-up varies by ancestry, the performance of these PRSs in non-European ancestries is considerably worse. It is known that not only do differences in genetic risk factors attenuate the performance of PRSs, but also that changes in correlation patterns in our genomes due to distinct population histories decrease predictive accuracy even if the actual genetic risk factors remain the same.
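As an illustration of how a PRS combines information across variants, the sketch below computes a score as a weighted sum of risk-allele dosages. This is a minimal example under stated assumptions, not the method developed in this program: the function name, the three variants and their weights are all hypothetical, and real scores typically use thousands to millions of variants with weights estimated from association studies.

```python
def polygenic_risk_score(dosages, weights):
    """Return the weighted sum of risk-allele dosages.

    dosages: copies of the risk allele carried at each variant (0, 1 or 2).
    weights: per-variant effect sizes (hypothetical values in this sketch).
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical effect sizes for three variants (illustrative only).
weights = [0.5, -0.25, 1.0]

# One person's genotype, coded as risk-allele counts at the same variants.
person = [2, 1, 0]

score = polygenic_risk_score(person, weights)
print(score)
```

Because the score is only a weighted sum, its accuracy depends entirely on how well the weights transfer to the person being scored, which is why weights estimated in one ancestry can perform poorly in another.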
This creates several important problems for the roll-out of such tests in the Canadian and UK healthcare environments. This is because approximately 22% of Canada's population and 12% of the UK are visible minorities. Further, these populations can have increased rates of health care utilization. Therefore, use of European-only PRSs could serve to worsen existing health disparities.
Canada's Chief Information Officer stated that, "Using Artificial Intelligence in government means balancing innovation with the ethical and responsible use of emerging technologies." Using PRSs developed to benefit only the majority group in our societies would not respect these directives and therefore our program will serve to ensure the responsible use of AI.
What are our Specific Goals?
1) To develop AI methods and open source software packages to improve the accuracy of PRSs for CHD and hyperlipidemia for individuals of non-European ancestry.
2) To compare the performance of these ancestry-adapted PRSs in individuals in a diverse set of cohorts, representing multiple ancestries in the UK and Canada.
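In its simplest form, a comparison of the kind described in Goal 2 could compute one performance metric per ancestry group and look at the gap between groups. The sketch below is a hedged illustration only: it assumes a continuous trait, uses the Pearson correlation between score and measured trait as the metric, and all group labels and data values are made up. The program's actual evaluation design is not specified in this abstract.

```python
from collections import defaultdict
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def performance_by_group(records):
    """Map each group label to the score-trait correlation within that group.

    records: iterable of (group_label, score, measured_trait) tuples.
    """
    grouped = defaultdict(lambda: ([], []))
    for group, score, trait in records:
        grouped[group][0].append(score)
        grouped[group][1].append(trait)
    return {g: pearson(s, t) for g, (s, t) in grouped.items()}


# Made-up data: the score tracks the trait well in group "A", less so in "B".
records = [
    ("A", 1.0, 2.0), ("A", 2.0, 4.0), ("A", 3.0, 6.0),
    ("B", 1.0, 1.0), ("B", 2.0, 3.0), ("B", 3.0, 2.0),
]

result = performance_by_group(records)
for group, r in sorted(result.items()):
    print(group, round(r, 3))
```

For a disease outcome such as CHD, a discrimination metric (for example, area under the ROC curve) would usually replace the correlation, but the per-group structure of the comparison stays the same.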
The Team: Recruiting leaders across the UK and Canada, we have a depth of expertise in AI methods development, human genetics, clinical medicine, cohort development for minority groups, and statistical genetics. This gender-balanced team involves an appropriate mix of senior and junior investigators.
Relevance: Success in this AI-enabled program will allow for the transfer of recent advances in genomic medicine to the citizens of Canada and the UK, regardless of ancestry. The AI methods developed will be widely applicable to other PRSs in development by other groups. As genomics takes root in regular clinician-patient interactions, our set of AI-based tools will ensure that the benefit derived from these advances will be shared by all citizens of our countries.
Planned Impact
In this proposal, we aim to address a real-world clinical problem that, if not resolved, will impair the equitable uptake of AI-based genomics into clinical care. Given the urgency of this problem, we have incorporated end-users who will directly benefit from this research into our program from its inception.
Who will benefit from this research? If successful, our research will predominantly benefit people who are of non-European descent. To realize this benefit, we have engaged four end-users (see Letters of Support). Here we describe each end-user and their role in the program.
The Integrated Health & Social Services University Network for West-Central Montreal (known by its French acronym CIUSSS) provides clinical care to approximately 362,000 people who are served by a partnership of more than 30 healthcare facilities. Included is one of Montreal's leading hospitals and three specialized hospitals, five community care clinics, two rehabilitation centres, four residential centres, two long-term geriatric residences, and two day centres. Dr. Lawrence Rosenberg is the CEO of this CIUSSS and directly supports our program of research since the largest multicultural community in Québec is served by this CIUSSS. Dr. Rosenberg is currently leading a team of researchers and clinicians to implement genomics-based medicine within the CIUSSS.
Health Data Research UK (HDR UK) is the UK's national institute for health data science, directed by Professor Andrew Morris. It is an independent, non-profit organisation bringing together 22 of the UK's leading universities and research institutions to address a common mission of uniting the UK's health data to make discoveries that improve people's lives. HDR UK's vision is that every healthcare interaction and research endeavour will be enhanced by access to large-scale data and advanced analytics. The ambition to generate advanced AI tools to address health inequalities, and the collaborative research presented in this proposal, fully align with HDR UK's mission.
Genomics England is owned and funded by the Department of Health & Social Care, England, and was set up to deliver the 100,000 Genomes Project. Its four main aims are to create an ethical and transparent genomic medicine programme based on patient consent; to bring benefit to patients and set up a genomic medicine service for the NHS; to enable new scientific discovery and medical insights; and to kick-start the development of a UK genomics industry. The 100,000 Genomes Project aims to bring the benefits of personalised medicine to the NHS, to make sure patients benefit from innovations in genomics, and to contribute towards delivering high quality care for all, now and for future generations. Involvement of Genomics England will provide an important route to further demonstrate the clinical utility of the equitable PRSs developed within this project.
Think Research represents our knowledge translation partner. The clinical impact of our research can only be realized if physicians understand when to order PRSs, how to interpret their results, how to document their findings and how to alter patient care. We are not naive to the tremendous difficulties encountered when attempting to change the behaviour of physicians. Therefore, we have engaged with Think Research, a Canadian company that specializes in electronic health record clinical support tools. Think Research's solutions are used in over 2,000 health care facilities worldwide. After completion of our program, our knowledge translation partner will maximize the clinical utility of our research by enabling uptake in health care systems in both Canada and the UK. Despite our engagement with Think Research, they will have no access to data or to the algorithms derived from data.
Summary: Through careful engagement with four essential end-users we will maximize the clinical impact of our research program.
Publications
Chung R
(2023)
Using Polygenic Risk Scores for Prioritizing Individuals at Greatest Need of a Cardiovascular Disease Risk Assessment.
in Journal of the American Heart Association
Emerging Risk Factors Collaboration
(2023)
Life expectancy associated with different ages at diagnosis of type 2 diabetes in high-income countries: 23 million person-years of observation.
in The lancet. Diabetes & endocrinology
Fachrul M
(2023)
Direct inference and control of genetic population structure from RNA sequencing data.
in Communications biology
Foguet C
(2022)
Genetically personalised organ-specific metabolic models in health and disease.
in Nature communications
Foguet Coll C
(2022)
Genetically personalised organ-specific metabolic models in health and disease
Gaziano L
(2022)
Mild-to-Moderate Kidney Dysfunction and Cardiovascular Disease: Observational and Mendelian Randomization Analyses
in Circulation
Grealey J
(2022)
The Carbon Footprint of Bioinformatics.
in Molecular biology and evolution
| Description | Polygenic risk scores (PRSs) aggregate the many small effects of alleles across the human genome to estimate the risk of a disease or disease-related trait for an individual. The potential benefits of PRSs include cost-effective enhancement of primary disease prevention, more refined diagnoses and improved precision when prescribing medicines. However, these must be weighed against the potential risks, such as uncertainties and biases in PRS performance, as well as potential misunderstanding and misuse of these within medical practice and in wider society. By addressing key issues including gaps in best practices, risk communication and regulatory frameworks, PRSs can be used responsibly to improve human health. Here, the International Common Disease Alliance's PRS Task Force, a multidisciplinary group comprising expertise in genetics, law, ethics, behavioral science and more, highlights recent research to provide a comprehensive summary of the state of polygenic score research, as well as the needs and challenges as PRSs move closer to widespread use in the clinic. Genetic association studies for blood cell traits, which are key indicators of health and immune function, have identified several hundred associations and defined a complex polygenic architecture. Polygenic scores (PGSs) for blood cell traits have potential clinical utility in disease risk prediction and prevention, but designing PGS remains challenging and the optimal methods are unclear. To address this, we evaluated the relative performance of 6 methods to develop PGS for 26 blood cell traits, including a standard method of pruning and thresholding (P + T) and 5 learning methods: LDpred2, elastic net (EN), Bayesian ridge (BR), multilayer perceptron (MLP) and convolutional neural network (CNN). We evaluated these optimized PGSs on blood cell trait data from UK Biobank and INTERVAL. 
We find that PGSs designed using common machine learning methods EN and BR show improved prediction of blood cell traits and consistently outperform other methods. Our analyses suggest EN/BR as the top choices for PGS construction, showing improved performance for 25 blood cell traits in the external validation, with correlations with the directly measured traits increasing by 10%-23%. Ten PGSs showed significant statistical interaction with sex, and sex-specific PGS stratification showed that all of them had substantial variation in the trajectories of blood cell traits with age. Genetic correlations between the PGSs for blood cell traits and common human diseases identified well-known as well as new associations. We develop machine learning-optimized PGS for blood cell traits, demonstrate their relationships with sex, age, and disease, and make these publicly available as a resource. We have developed polygenic scores for blood cell traits, multi-omic traits, type 2 diabetes and coronary artery disease and demonstrated their predictive capacity and (where possible) their downstream aetiological effects. We have used sophisticated analytics, including AI and machine learning, to create polygenic scores which have improved equitability in performance across genetic ancestries. These scores are now in the process of being translated and have been incorporated into clinically validated pipelines. |
| Exploitation Route | As noted in the description, polygenic scores are being translated into healthcare for primary disease prevention, more refined diagnoses and improved precision when prescribing medicines. |
| Sectors | Healthcare |
| Description | Promoted green computing and educated scientists about it |
| Geographic Reach | National |
| Policy Influence Type | Influenced training of practitioners or researchers |
| Impact | no changes yet |
| Description | Requirement by the French DEFRA to use our Green Algorithms calculator to apply to one of their AI funding calls. Loic Lannelongue |
| Geographic Reach | National |
| Policy Influence Type | Influenced training of practitioners or researchers |
| Impact | The French Department of the Environment released a €40m funding call for AI projects for sustainable local communities. As part of their application, candidates had to use our Green Algorithms calculator to quantify the expected environmental impact of the proposal. I took part in a webinar organised by the Department to present the tool. |
| URL | https://www.banquedesterritoires.fr/demonstrateurs-dia-frugale-au-service-de-la-transition-ecologiqu... |
| Description | Innovation Fund - HDR UK |
| Amount | £381,000 (GBP) |
| Organisation | Health Data Research UK |
| Sector | Charity/Non Profit |
| Country | United Kingdom |
| Start | 05/2022 |
| End | 03/2023 |
| Title | 26 polygenic scores |
| Description | Produced 26 polygenic scores for different human blood cell traits (e.g. white blood cell count). The models for these polygenic scores have been made freely available via the Polygenic Score Catalog. |
| Type Of Material | Data analysis technique |
| Year Produced | 2021 |
| Provided To Others? | Yes |
| Impact | None yet |
| URL | https://www.pgscatalog.org/publication/PGP000051/ |
| Title | OMICSPRED: An atlas of genetic scores for prediction of multi-omics data |
| Description | OMICSPRED is a resource for predicting multi-omics data (proteomics, metabolomics, transcriptomics etc.) directly from genotypes. To do this, we used extensive multi-omics data to train genetic scores using machine learning that can be used to predict omics-traits on your own samples. We provide access to these multi-omics genetic scores on the OMICSPRED website, where you can explore and download the genetic scores for a wide range of biomolecular traits in human blood as well as the summary statistics of their associations with key traits and diseases in the UK Biobank. |
| Type Of Material | Database/Collection of data |
| Year Produced | 2021 |
| Provided To Others? | Yes |
| Impact | The OMICSPRED website is open and accessible, and was visited by 588 users per month on average over the last year (February 2023-February 2024). Over the same time period there have been 1523 downloads of the OMICSPRED scores from our file server, and there has been one preprint (Drozd et al. medRxiv 2024). |
| URL | https://www.omicspred.org |
| Title | UK Biobank Nightingale metabolomics data |
| Description | A quality controlled and unwanted variation normalised version of the UK Biobank Nightingale Health metabolomics data |
| Type Of Material | Database/Collection of data |
| Year Produced | 2023 |
| Provided To Others? | Yes |
| Impact | Dataset and methods provided to the wider research community. |
| URL | https://github.com/sritchie73/ukbnmr/ |
| Description | Sanger Open Targets |
| Organisation | Genome Research Ltd |
| Country | United Kingdom |
| Sector | Charity/Non Profit |
| PI Contribution | Financial support for a 0.2FTE Data Manager |
| Collaborator Contribution | Provision of a data manager |
| Impact | None yet |
| Start Year | 2021 |
| Description | Behind the Curtains theatre production on polygenic scores |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | Local |
| Primary Audience | Public/other audiences |
| Results and Impact | Behind the Curtains theatre production on polygenic scores (10min), produced by Creative Encounters |
| Year(s) Of Engagement Activity | 2021 |
| Description | Using health data to understand the causes of disease to improve health care. |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Public/other audiences |
| Results and Impact | A 1-hour talk showcasing the work of two early career researchers, written specifically for members of the public as part of the Cambridge Festival 2023. The purpose was to explain how the huge amount of health data taken from electronic health records can be analysed using complex statistical models to help us understand why some people are at risk of diseases, e.g. to investigate the association of COVID-19 with cardiovascular disease. The talk explained how genetic risk is not a straight line to clinical risk in many diseases and why that might be, and looked at the limitations of the data and what could be better. |
| Year(s) Of Engagement Activity | 2023 |
| Description | Public workshop on polygenic risk scores |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | Local |
| Primary Audience | Public/other audiences |
| Results and Impact | Organised a public workshop on polygenic risk scores: what they are, how they are created and how useful they might be. |
| Year(s) Of Engagement Activity | 2021 |
