Maximising the potential of wild-derived laboratory mouse strains for medical research

Lead Research Organisation: Wellcome Sanger Institute
Department Name: Computational Genomics

Abstract

Our key aim is to explore the relationship between genetic variation and medically relevant human disease phenotypes. One way to do this is to assess the genetic differences between long-established laboratory mouse strains. Wild-derived mouse strains display many important disease phenotypes, such as resistance to various forms of cancer (e.g. liver, lung, and skin cancer) and to bacterial and viral infection. However, studying the genetic differences between these strains requires accurate genome sequences. In this project, we will first generate genome sequences for the most commonly used wild-derived mouse strains, and then use these sequences and knowledge of the gene structures to determine the genetic causes of observed phenotypic differences between these strains. By combining sequence and phenotypic data, we will determine whether sequence variants are likely to contribute to disease susceptibility.

Technical Summary

The 1.5-2 million years of genetic divergence between M. m. domesticus and the wild-derived strains (SPRET/EiJ, PWK/PhJ, CAST/EiJ, and WSB/EiJ) makes it difficult to use standard methods of mapping back to the C57BL/6J reference. We have observed 4-8 times more SNPs, short indels, and larger structural variants (>100bp) in the wild-derived strains, so it is not appropriate to use the C57BL/6J reference genome as a comparator for these strains. If we consider protein-coding regions of the genome alone, just under 3,000 large structural variants occur in these regions in Mus spretus, and so are likely to result in significantly altered gene models. This poses a significant difficulty for any outbred-cross-based projects (such as the Collaborative Cross (CC) and the Diversity Outbred Cross (DO)) where a subset of the founder strains are wild-derived, and for interspecific crosses. Without near-contiguous sequence assemblies (at least in protein-coding regions), we currently cannot accurately predict gene models and hence cannot predict the function of polymorphisms.

The task of generating accurate whole-genome draft assemblies of eukaryotic genomes from second-generation sequencing data remains challenging. The key ingredients required to generate high-quality draft genome sequences of large eukaryotes are high sequencing depth from short-fragment (200-600bp) paired-end sequencing libraries, lower depth from a range of large-fragment mate-pair libraries (3-40Kbp), and an optical or physical map. This strategy has been used recently to produce draft genome sequences for the goat, hamster, and gorilla.

Planned Impact

The most obvious beneficiaries of the genome sequences and annotation generated will be the mouse genetics community involved in mapping complex disease-related traits, researchers mapping mutations in crosses involving the wild-derived strains, and researchers using crosses to identify modifiers of mutations. The wild-derived mouse strains are founder strains for two of the largest mouse recombinant inbred strain experiments, the Collaborative Cross and the Diversity Outbred Cross; the DO mice distributed by the Jackson Laboratory now number in the thousands. These lines have already been used successfully to investigate disease resistance and to identify novel QTLs for fungal and viral infection response, with many more disease resistance experiments expected in the coming years.

The groups described above will benefit from this research by having completely open access to the genome sequences of the wild-derived mouse strains and, perhaps most importantly, the strain-specific gene annotation. Groups mapping disease phenotypes in wild-derived mouse strains will therefore be able to identify the host resistance genes involved in the phenotype as the first step to elucidating pathway information.

Publications

 
Description PHENOMIN Scientific Advisory Board
Geographic Reach National 
Policy Influence Type Participation in a guidance/advisory committee
Impact Better animal care in a research setting
 
Description Maximising the potential of laboratory mice for understanding the genetic basis for disease
Amount £685,000 (GBP)
Funding ID MR/R017565/1 
Organisation Medical Research Council (MRC) 
Sector Public
Country United Kingdom
Start 04/2018 
End 04/2021
 
Title Mouse Genomes Project database 
Description A catalogue of mouse strain sequences 
Type Of Material Database/Collection of data 
Year Produced 2010 
Provided To Others? Yes  
Impact Thousands of users and hundreds of papers 
URL http://www.sanger.ac.uk/resources/mouse/genomes/
 
Title Mouse genomes in ensembl 
Description Genome sequences and genome annotation for sixteen laboratory mouse genomes for free use by the public. 
Type Of Material Database/Collection of data 
Year Produced 2016 
Provided To Others? Yes  
Impact Availability for the wider mouse genetics community of whole genome sequences to reduce the number of laboratory animals required for experiments. 
URL http://www.ensembl.org/Mus_musculus/Info/Strains?db=core
 
Title UCSC Genome Browser 
Description Mouse genomes in UCSC Genome Browser 
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
Impact Public availability of first draft genome sequences for 18 laboratory mouse strains 
URL https://genome.ucsc.edu/
 
Description Ensembl Genome Browser 
Organisation EMBL European Bioinformatics Institute (EMBL - EBI)
Country United Kingdom 
Sector Academic/University 
PI Contribution In this project, we produced genome assemblies for the most widely used laboratory mouse strains. These genomes are now available to the wider community via the Ensembl Genome Browser.
Collaborator Contribution Ensembl has provided services for hosting and presenting the genome sequences to the wider research community. This provides long-term sustainability for the data.
Impact Increased usage of the data. Long-term sustainability and availability of the data.
Start Year 2018
 
Description Jackson Laboratory 
Organisation The Jackson Laboratory
Country United States 
Sector Charity/Non Profit 
PI Contribution Genome sequencing, genome assembly, and gene prediction.
Collaborator Contribution Supply of key samples to complete the research. Collaboration on data analysis and interpretation.
Impact First draft genome sequences for laboratory mouse genomes, a key resource for all mouse genetics research.
Start Year 2014
 
Description UCSC genome annotation 
Organisation University of California, Santa Cruz
Country United States 
Sector Academic/University 
PI Contribution Genome sequencing and genome assemblies for sixteen inbred laboratory mouse genomes.
Collaborator Contribution Personnel and IT resources to complete whole-genome annotation of sixteen inbred laboratory mouse genomes.
Impact Whole-genome annotation of sixteen inbred laboratory mouse genomes.
Start Year 2015
 
Description Conference of the International Mammalian Genome Society 2015 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Third sector organisations
Results and Impact A research talk "Multiple mouse reference genomes and strain specific gene" describing the public resource being generated through this work.
Year(s) Of Engagement Activity 2015
URL http://www.imgc2015.jp/
 
Description Deep genome sequencing and variation analysis of 13 inbred mouse strains defines candidate phenotypic alleles, private variation, and homozygous truncating mutations 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact A talk by Dr. Anthony Doran at The Allied Genetics Conference 2016.
Year(s) Of Engagement Activity 2016
URL http://www.genetics2016.org
 
Description Discovery, assembly, and annotation of subspecies specific haplotypes in classical and wild-derived mouse strains 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Third sector organisations
Results and Impact A talk at The Allied Genetics Conference 2016
Year(s) Of Engagement Activity 2016
URL http://www.genetics2016.org
 
Description Invited seminar at Jackson Labs 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Audience of approximately 400 leaders in the field. The seminar informed them about our work and the progress made, and led to new users and collaborations.
Year(s) Of Engagement Activity 2014
 
Description Multiple mouse reference genomes defines subspecies specific haplotypes and novel coding sequences 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Talk at international research conference
Year(s) Of Engagement Activity 2017
URL http://imgs.org/
 
Description Talk: Discovery, assembly, and annotation of subspecies specific haplotypes in classical and wild-derived mouse strains 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact A talk at The Allied Genetics Conference 2016 by Thomas Keane
Year(s) Of Engagement Activity 2016
URL http://www.genetics2016.org/