Maximising the potential of wild-derived laboratory mouse strains for medical research
Lead Research Organisation:
Wellcome Sanger Institute
Department Name: Computational Genomics
Abstract
Our key aim is to explore the relationship between genetic variation and medically relevant human disease phenotypes. One way to do this is to assess the genetic differences between long-established laboratory mouse strains. Wild-derived mouse strains display many important disease phenotypes, such as resistance to various forms of cancer (e.g. liver, lung and skin cancer) and to bacterial and viral infection. However, studying the genetic differences between these strains depends on having accurate genome sequences. In this project, we will first generate genome sequences for the most commonly used wild-derived mouse strains. We will then use these sequences, together with knowledge of gene structures, to determine the genetic causes of the phenotypic differences observed between these strains. By combining sequence and phenotypic data, we will determine whether sequence variants are likely to contribute to disease susceptibility.
Technical Summary
The 1.5-2 million years of genetic divergence between M. m. domesticus and the wild-derived strains (SPRET/EiJ, PWK/PhJ, CAST/EiJ and WSB/EiJ) makes it difficult to use standard methods of mapping reads back to the C57BL/6J reference. We have observed 4-8 times more SNPs, short indels and larger structural variants (>100bp) in the wild-derived strains, so it is not appropriate to use the C57BL/6J reference genome as a comparator for these strains. In protein-coding regions alone, Mus spretus carries just under 3,000 large structural variants, which are likely to result in significantly altered gene models. This poses a significant difficulty for outbred cross-based projects (such as the Collaborative Cross (CC) and the Diversity Outbred (DO) population) in which a subset of the founder strains are wild-derived, and for interspecific crosses. Without near-contiguous sequence assemblies (at least in protein-coding regions), we currently cannot accurately predict gene models and hence cannot predict the function of polymorphisms.
The task of generating accurate whole-genome draft assemblies of eukaryotic genomes from second-generation sequencing data remains challenging. The key ingredients required to generate high-quality draft genome sequences of large eukaryotes are high sequencing depth from short-fragment (200-600bp) paired-end sequencing libraries, lower depth from a range of large mate-pair fragment libraries (3-40Kbp), and an optical or physical map. This strategy has recently been used to produce several draft genome sequences, including those of the goat, hamster and gorilla.
Planned Impact
The most obvious beneficiaries of the genome sequences and annotation generated will be the mouse genetics community: researchers mapping complex disease-related traits, researchers mapping mutations in crosses involving the wild-derived strains, and those using crosses to identify modifiers of mutations. The wild-derived mouse strains are founders of two of the largest multi-parent mouse populations, the Collaborative Cross and the Diversity Outbred population, and the Jackson Laboratory has now distributed thousands of DO mice. These lines have already been used successfully to investigate disease resistance and to identify novel QTLs for responses to fungal and viral infection, and many more disease resistance experiments are expected in the coming years.
The groups described above will benefit from completely open access to the genome sequences of the wild-derived mouse strains and, perhaps most importantly, to the strain-specific gene annotation. This will allow groups mapping disease phenotypes in wild-derived mouse strains to identify the host resistance genes involved in a phenotype, as a first step towards elucidating the underlying pathways.
Publications
Thomas M (2019). Collateral damage and CRISPR genome editing. PLoS Genetics.
Nicod J (2016). Genome-wide association of multiple complex traits in outbred mice by ultra-low-coverage sequencing. Nature Genetics.
Morgan AP (2017). Structural Variation Shapes the Landscape of Recombination in Mouse. Genetics.
Morgan AP (2016). Whole Genome Sequence of Two Wild-Derived Mus musculus domesticus Inbred Strains, LEWES/EiJ and ZALENDE/EiJ, with Different Diploid Numbers. G3 (Bethesda, Md.).
Lilue J (2019). Mouse protein coding diversity: What's left to discover? PLoS Genetics.
Lilue J (2018). Sixteen diverse laboratory mouse reference genomes define strain-specific haplotypes and novel functional loci. Nature Genetics.
Lewis MA (2022). Identification and characterisation of spontaneous mutations causing deafness from a targeted knockout programme. BMC Biology.
Keane TM (2014). Identification of structural variation in mouse genomes. Frontiers in Genetics.
Description | PHENOMIN Scientific Advisory Board |
Geographic Reach | National |
Policy Influence Type | Participation in a guidance/advisory committee |
Impact | Better animal care in a research setting |
Description | Maximising the potential of laboratory mice for understanding the genetic basis for disease |
Amount | £685,000 (GBP) |
Funding ID | MR/R017565/1 |
Organisation | Medical Research Council (MRC) |
Sector | Public |
Country | United Kingdom |
Start | 03/2018 |
End | 04/2021 |
Title | Mouse Genomes Project database |
Description | A catalogue of mouse strain sequences |
Type Of Material | Database/Collection of data |
Year Produced | 2010 |
Provided To Others? | Yes |
Impact | Thousands of users and hundreds of papers |
URL | http://www.sanger.ac.uk/resources/mouse/genomes/ |
Title | Mouse genomes in ensembl |
Description | Genome sequences and genome annotation for sixteen laboratory mouse genomes for free use by the public. |
Type Of Material | Database/Collection of data |
Year Produced | 2016 |
Provided To Others? | Yes |
Impact | Whole-genome sequences made available to the wider mouse genetics community, helping to reduce the number of laboratory animals required for experiments. |
URL | http://www.ensembl.org/Mus_musculus/Info/Strains?db=core |
Title | UCSC Genome Browser |
Description | Mouse genomes in UCSC Genome Browser |
Type Of Material | Database/Collection of data |
Year Produced | 2018 |
Provided To Others? | Yes |
Impact | Public availability of first draft genome sequences for 18 laboratory mouse strains |
URL | https://genome.ucsc.edu/ |
Description | Ensembl Genome Browser |
Organisation | EMBL European Bioinformatics Institute (EMBL - EBI) |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | In this project, we produced genome assemblies for the most widely used laboratory mouse strains. These genomes are now available to the wider community via the Ensembl Genome Browser. |
Collaborator Contribution | Ensembl has provided services for hosting and presenting the genome sequences to the wider research community, which ensures the long-term sustainability of the data. |
Impact | Increased usage of the data; long-term sustainability and availability of the data |
Start Year | 2018 |
Description | Jackson Laboratory |
Organisation | The Jackson Laboratory |
Country | United States |
Sector | Charity/Non Profit |
PI Contribution | Genome sequencing, genome assembly, and gene prediction. |
Collaborator Contribution | Supply of key samples to complete the research. Collaboration on data analysis and interpretation. |
Impact | First draft genome sequences for laboratory mouse genomes, a key resource for all mouse genetics research. |
Start Year | 2014 |
Description | UCSC genome annotation |
Organisation | University of California, Santa Cruz |
Country | United States |
Sector | Academic/University |
PI Contribution | Genome sequencing and genome assemblies for sixteen inbred laboratory mouse genomes. |
Collaborator Contribution | Personnel and IT resources to complete whole-genome annotation of sixteen inbred laboratory mouse genomes. |
Impact | Whole-genome annotation of sixteen inbred laboratory mouse genomes. |
Start Year | 2015 |
Description | Conference of the International Mammalian Genome Society 2015 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Third sector organisations |
Results and Impact | A research talk "Multiple mouse reference genomes and strain specific gene" describing the public resource being generated through this work. |
Year(s) Of Engagement Activity | 2015 |
URL | http://www.imgc2015.jp/ |
Description | Deep genome sequencing and variation analysis of 13 inbred mouse strains defines candidate phenotypic alleles, private variation, and homozygous truncating mutations |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other audiences |
Results and Impact | A talk by Dr. Anthony Doran at The Allied Genetics Conference 2016. |
Year(s) Of Engagement Activity | 2016 |
URL | http://www.genetics2016.org |
Description | Discovery, assembly, and annotation of subspecies specific haplotypes in classical and wild-derived mouse strains |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Third sector organisations |
Results and Impact | A talk at The Allied Genetics Conference 2016 |
Year(s) Of Engagement Activity | 2016 |
URL | http://www.genetics2016.org |
Description | Invited seminar at Jackson Labs |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Audience of ~400 leaders in the field, informed about our work and the progress made. Resulted in new users and collaborations. |
Year(s) Of Engagement Activity | 2014 |
Description | Multiple mouse reference genomes defines subspecies specific haplotypes and novel coding sequences |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Talk at international research conference |
Year(s) Of Engagement Activity | 2017 |
URL | http://imgs.org/ |
Description | Talk: Discovery, assembly, and annotation of subspecies specific haplotypes in classical and wild-derived mouse strains |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other audiences |
Results and Impact | A talk at The Allied Genetics Conference 2016 by Thomas Keane |
Year(s) Of Engagement Activity | 2016 |
URL | http://www.genetics2016.org/ |