Assessing the pathogenicity, penetrance and expressivity of monogenic disease variants using large-scale population-based cohorts

Lead Research Organisation: University of Exeter
Department Name: Institute of Biomed & Clinical Science


Interpreting the medical consequences of genetic variants in individuals is currently extremely challenging. Incorrect interpretation leads to massive overdiagnosis of genetic conditions, resulting in inappropriate treatment of individuals and increased healthcare costs due to unnecessary follow-on tests. Unfortunately, inaccurate genetic variant interpretation is a critical and growing problem because whole genome sequencing is becoming widespread throughout biomedical science and clinical medicine, replacing the standard clinical "disease-first" approach to diagnosis with a faster but less specific "DNA-first" approach. In addition, there has been a substantial increase in direct-to-consumer genetic testing resulting in numerous errors with major clinical implications. We aim to improve the interpretation of rare genetic variants by harnessing a uniquely powerful combination of newly available high-quality genetic data coupled with detailed clinical results on over half a million individuals.

There are three main reasons for the incorrect interpretation of genetic variants, all caused by historical gaps in the evidence base. First, many genetic variants that have been claimed to cause rare genetic diseases do not, often because the original evidence is now outdated and the variants have since been shown to be too common in the population to cause disease. Second, variants that cause inherited genetic diseases are identified by studying small, highly selected groups of patients and families with a specific condition; this leads to the conclusion that every individual with the variant will get the condition, which in many cases is untrue. Third, the highly selected nature of the original discovery cohorts means that the complete set of disease symptoms caused by a particular genetic variant is unknown, and can be biased by family history and confounded by other familial diseases.
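The "too common to cause disease" argument in the first reason can be illustrated with a simple back-of-envelope calculation. The sketch below is not taken from the project itself: it assumes a dominant disease, and the function name, parameter names and all numbers are hypothetical values chosen only to show the arithmetic.

```python
def max_credible_allele_frequency(prevalence, allelic_contribution, penetrance):
    """Rough upper bound on the population frequency of a dominant disease allele.

    prevalence: fraction of the population with the disease
    allelic_contribution: largest share of cases any single variant is assumed to explain
    penetrance: assumed probability that a carrier develops the disease
    """
    if not 0 < penetrance <= 1:
        raise ValueError("penetrance must be in (0, 1]")
    # Each person carries two alleles, hence the factor of 2 in the denominator.
    return prevalence * allelic_contribution / (2 * penetrance)


# Hypothetical inputs, for illustration only.
threshold = max_credible_allele_frequency(
    prevalence=1 / 500, allelic_contribution=0.02, penetrance=0.5
)
observed_frequency = 1e-3  # hypothetical frequency seen in a population cohort
too_common = observed_frequency > threshold
```

Under these assumed inputs, a variant observed at a frequency above the threshold would be hard to reconcile with a highly penetrant dominant cause of the disease.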

We aim to address this evidence gap by using newly available large-scale genome-wide sequencing datasets. We will focus on two examples of diseases caused by single rare genetic variants in one of hundreds of specific genes, where we have specific expertise and access to appropriate large-scale disease cohorts. We will compare the prevalence of disease-causing variants in these cohorts to that in a large-scale population cohort. Specifically, we will use datasets from UK Biobank (~500,000 participants), the Exeter-based monogenic diabetes cohort (~15,000 cases), and the UK-wide Deciphering Developmental Disorders Study (~13,500 cases). This enormous collection of high-resolution genetic data coupled with detailed clinical information is unparalleled and uniquely powerful. We will evaluate all rare variants linked with these genetic diseases, from the smallest (single base) to the largest (whole chromosome) changes. Based on our prior work, we anticipate producing robust estimates of how likely an individual with a particular disease-causing variant is to develop disease, and expanding and refining the disease symptoms associated with many rare genetic variants. We also expect to refute previous erroneous genetic causes of disease in the literature. Finally, we will test the hypothesis that differences in common genetic factors between the cohorts are responsible for disparities in disease occurrence and severity.
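One simple way to picture the comparison of variant prevalence between a clinical cohort and a population cohort is an odds ratio on carrier counts. The following is a minimal sketch under stated assumptions, not the project's analysis pipeline: the function name is hypothetical, the carrier counts are invented, and the 0.5 continuity correction with a Woolf-type interval is just one common choice.

```python
import math


def carrier_odds_ratio(case_carriers, case_total, pop_carriers, pop_total, z=1.96):
    """Odds ratio (with an approximate 95% interval) for carrying a variant
    in a clinical cohort versus a population cohort."""
    # Add 0.5 to every cell so that zero counts do not cause division by zero.
    a = case_carriers + 0.5
    b = case_total - case_carriers + 0.5
    c = pop_carriers + 0.5
    d = pop_total - pop_carriers + 0.5
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Invented counts: 30 carriers among 13,500 cases, 20 among 500,000 participants.
estimate, lower, upper = carrier_odds_ratio(30, 13_500, 20, 500_000)
enriched_in_cases = lower > 1.0
```

An interval lying entirely above 1 would suggest enrichment in the clinical cohort; an odds ratio near or below 1 would be a reason to question a claimed disease association.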

This work will inform genetic variant interpretation in the clinic, reduce genetic overdiagnosis particularly from incidental findings, and facilitate the implementation of precision medicine. Our findings will have a direct impact on patients and families affected by genetic diseases, as well as members of the public undergoing genetic testing, and will provide novel insights about the nature of monogenic disease.

Technical Summary

Combining genomic sequencing data from population and disease cohorts, this study will:
1. Critically evaluate which published genes in monogenic diabetes and developmental disorders should be reported as disease causing.
2. Generate minimum and age-dependent penetrance estimates for pathogenic genes.
3. Determine the phenotypic spectra for known pathogenic genetic variants in diabetes and developmental disorders.
4. Test the hypothesis that people in UK Biobank with confirmed monogenic variants have reduced penetrance due to a favourable genetic background.
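
As a rough illustration of objective 2, a penetrance estimate in a population cohort can be thought of as the fraction of variant carriers who have the disease, reported with an interval. The sketch below assumes a Wilson score interval and uses invented counts; the function name is hypothetical and this is not presented as the method the project uses.

```python
import math


def penetrance_with_wilson_interval(affected, carriers, z=1.96):
    """Proportion of carriers who are affected, with a Wilson score interval."""
    if carriers <= 0 or not 0 <= affected <= carriers:
        raise ValueError("need 0 <= affected <= carriers and carriers > 0")
    p = affected / carriers
    denom = 1 + z * z / carriers
    centre = (p + z * z / (2 * carriers)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / carriers + z * z / (4 * carriers**2)) / denom
    )
    return p, max(0.0, centre - half_width), min(1.0, centre + half_width)


# Invented counts: 12 of 40 carriers in the cohort have the disease.
point, lower, upper = penetrance_with_wilson_interval(12, 40)
```

Age-dependent estimates could, under the same assumptions, be obtained by applying the function separately within age bands.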

To achieve these objectives, we will:
1. Evaluate the relative enrichment or depletion of known and likely pathogenic SNVs, indels and CNVs in known monogenic disease genes (e.g. loss-of-function variants in monoallelic haploinsufficient developmental disorder genes) in UK Biobank compared with our two clinical cohorts (the DDD Study and Exeter monogenic diabetes study). We will perform extensive quality control of variants to evaluate analytical validity, including cross-platform comparisons, and also investigate potential somatic mosaicism in UK Biobank.
2. For confirmed pathogenic variants present in individuals in UK Biobank, we will evaluate the penetrance of the disease (based on phenotypes, age at recruitment, age at diagnosis where available, and affected status of parents). We will also evaluate age-related co-morbidities (e.g. frailty, dementia, CHD, etc.) and reduced penetrance of developmental phenotypes with age (e.g. educational attainment, fluid intelligence, height, BMI, etc.).
3. We will construct relevant genomic risk scores (e.g. for T1D, T2D, height, BMI, intelligence, educational attainment and schizophrenia) and evaluate whether individuals in UK Biobank who have pathogenic monogenic variants, but appear to have reduced penetrance or mild forms of disease, have a favourable genomic risk score that protects them from manifesting more severe forms of disease.
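
The risk scores in step 3 can be pictured as a weighted sum of allele counts across common variants. The sketch below is an illustrative assumption rather than the project's scoring method: the variant identifiers, weights and dosages are invented, and treating a missing genotype as zero is a simplification.

```python
from statistics import mean


def genomic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1 or 2 copies per variant)."""
    # Variants missing from `dosages` contribute zero (a simplifying assumption).
    return sum(weight * dosages.get(variant, 0.0) for variant, weight in weights.items())


# Invented per-variant effect sizes, e.g. as might come from an association study.
weights = {"variant_a": 0.2, "variant_b": -0.1, "variant_c": 0.05}

# Invented carriers of a monogenic variant, grouped by how severely they are affected.
mild_carriers = [
    {"variant_a": 0, "variant_b": 2, "variant_c": 1},
    {"variant_a": 1, "variant_b": 2, "variant_c": 0},
]
severe_carriers = [
    {"variant_a": 2, "variant_b": 0, "variant_c": 1},
    {"variant_a": 2, "variant_b": 1, "variant_c": 2},
]

mean_mild = mean(genomic_risk_score(d, weights) for d in mild_carriers)
mean_severe = mean(genomic_risk_score(d, weights) for d in severe_carriers)
```

A lower mean score among mildly affected carriers would be in the direction the hypothesis predicts, although a real analysis would need a formal test and far larger groups.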

Planned Impact

Impact on patients: Individual patients and families with monogenic diseases will potentially benefit from this project in a number of different ways. Firstly, we will refute some erroneous disease-gene associations, enabling misdiagnosed patients to get more accurate diagnoses in future. Removing a misdiagnosis is particularly important where a molecular diagnosis alters treatment - for example, 76% of patients change treatment when a genetic diagnosis of monogenic diabetes is made, and the majority improve their glucose control. Secondly, we will provide minimum penetrance estimates for a number of variants and investigate genetic modifiers, which is important for testing and counselling family members. Thirdly, we will investigate the full spectrum of clinical features associated with specific disease genes and investigate age-specific effects, improving the prognostic advice that can be provided to patients and families. The project will improve interpretation of genetic test results not only in families with diabetes or developmental disorders, but also in patients undergoing genome sequencing for other conditions where incidental or secondary findings will be reported.

Impact on the public: More than 12 million people are now thought to have taken a direct-to-consumer genetic test, with market forecasting firm Credence Research predicting a compound annual growth rate of nearly 20% over the next decade. The results of these tests can be misleading and inaccurate in a number of ways, including by informing consumers of their risks of developing genetic conditions based on inappropriately high penetrance estimates that are derived from high-risk family-based studies. This can result in emotional distress, additional medical investigations and sometimes inappropriate treatment for conditions that are unlikely to occur in low-risk individuals. This project will help to reduce these inaccuracies by providing minimum penetrance estimates that are relevant to the direct-to-consumer market.

Impact on the NHS: By comparing traits associated with disease genes across two clinically important disease groups with an unselected population cohort, this project will improve clinical genetic variant interpretation, reduce the uncertainty associated with predictive genetic testing and reduce genetic overdiagnosis. By providing an evidence-based framework for critically investigating genetic variants that cause disease in an unselected population, the project will facilitate the implementation of precision medicine throughout the NHS. Since several of the co-applicants work directly with the South West Genomic Medicine Centre, results can be directly and immediately translated into clinical practice.

Impact on public health and the public purse: Incorrectly classifying benign incidental findings from genome sequencing and investigating erroneous direct-to-consumer results can be extremely expensive to the NHS through wasted consultations, unnecessary follow-on tests, and inappropriate and potentially harmful treatments. In addition, incorrect medical information from genetic testing is not only damaging to the public's health, but also undermines the public's trust in medical professionals and the potential of genetics to improve health, potentially preventing the benefits of precision medicine from being realised.

Impact on UK science and informatics capacity: This project brings together individuals with expertise in rare and common diseases, genetics, genomics and big data, and will represent a major step towards the goal of expanding research in genomic medicine at the University of Exeter. Through teaching on undergraduate and MSc courses, the project will expose talented students to our research and skills in these areas, and will serve as a platform for recruiting the next generation of biological scientists.

Description: ASHG session
Form Of Engagement Activity: A talk or presentation
Part Of Official Scheme? No
Geographic Reach: International
Primary Audience: Professional Practitioners
Results and Impact: Organised and presented at (virtual) session on penetrance at American Society of Human Genetics
Year(s) Of Engagement Activity: 2020

Description: BSGM conference
Form Of Engagement Activity: A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach: National
Primary Audience: Professional Practitioners
Results and Impact: Presentation on penetrance in UK Biobank at national conference
Year(s) Of Engagement Activity: 2021

Description: Curating Clinical Genome conference
Form Of Engagement Activity: A talk or presentation
Part Of Official Scheme? No
Geographic Reach: International
Primary Audience: Professional Practitioners
Results and Impact: Three members of the team involved in presenting about penetrance (session chair, invited speaker and selected spoken abstract)
Year(s) Of Engagement Activity: 2021