Maximising the potential of laboratory mice for understanding the genetic basis for disease

Lead Research Organisation: European Bioinformatics Institute
Department Name: OMICs

Abstract

Many of the most important medical discoveries in the last century have been made by studying model organisms such as the laboratory mouse. Mice were key to discovering the antibiotic effects of penicillin and the basis for how the immune system recognises virus-infected cells, and they are used to develop and test almost every new drug. Laboratory mouse strains display many important disease phenotypes, such as resistance to various forms of cancer (e.g. liver, lung, and skin cancer) and to bacterial and viral infection, and are used as models for many human diseases. The foundation for studying the genetic differences in these strains is having accurate genome sequences. Currently, only one mouse strain (C57BL/6J) has a genome sequence with sufficient accuracy to study genes involved in disease and immunity. In this project, we will upgrade the genome sequences of 16 other laboratory mouse strains to reference quality using third-generation sequencing technologies. This will form the basis for studying the genes that are known to be involved in the differential response to disease and immune challenges. It will also enable researchers using these strains to combine sequence and phenotypic data to determine whether sequence variants are likely to contribute to disease susceptibility. All of the genome data generated by the project will be made freely available so that researchers worldwide can make maximum use of these genomes.

Technical Summary

Mouse models have long been used to gain a greater understanding of human disease genetics. In mouse genetics, decoding the genome of one strain (C57BL/6J) had a profound impact on our ability to relate sequence to function. There are more than two dozen different laboratory mouse strains used as models for human disease. In the Mouse Genomes Project, we have sequenced the genomes of 36 mouse strains to produce the first comprehensive public catalogs of genetic variation (single nucleotide variants, short insertions/deletions, and large multi-kilobase copy number changes or rearrangements), and studied the effect of this variation on phenotypes. This public resource has been invaluable to the mouse genetics community, and has been cited over 1,200 times since 2011. In 2016, we completed the first draft assembled genome sequences for these strains using second-generation short-read sequencing technologies and long-range libraries. We have identified over 5,144 loci in the mouse genomes where there is evidence that more complex strain-specific alleles or haplotypes are present. These loci are enriched for genes involved in immunity, sensory, and kin recognition functions. They are also enriched for highly identical repeat sequences, which likely promote recombination at these loci and therefore pose a particular challenge for genome assembly. We propose to use long-read technologies to create reference-quality assemblies and genome annotation for 16 mouse strains, and to develop and apply a set of quality control measures to assess the accuracy of the strain-specific alleles. We will use the genome sequences to study the genome structure of these complex loci that have already been associated with disease phenotypes in QTLs from genetic reference panels such as the Collaborative Cross and Diversity Outbred Cross. All data from this project will be freely available via the Ensembl and UCSC genome browsers and the Mouse Genome Informatics database at the Jackson Laboratory.

Planned Impact

Laboratory mice have been key to many of the most important medical discoveries, such as the antibiotic effects of penicillin and the basis for how the immune system recognises virus-infected cells, and they are used to develop and test almost every new drug. Laboratory mouse strains display many important disease response phenotypes, such as resistance to various forms of cancer (e.g. liver, lung, and skin cancer) and to bacterial and viral infection, and are used as models to study many human diseases. The foundation for studying the genetic differences in these strains is having accurate genome sequences. Currently, only one mouse strain (C57BL/6J) has a genome sequence with sufficient accuracy to study genes involved in disease and immunity. This project will create high-quality genome sequences for sixteen commonly used laboratory mouse strains. All of the raw and processed data will be deposited in appropriate public databases: the European Nucleotide Archive (raw reads), the European Variation Archive (genetic variation), and the Ensembl and UCSC genome browsers (visualisation of the assembled genomes and annotation). Therefore, the impact from this project will be felt immediately by the wider mouse genetics community, who are using these strains to study a wide range of human diseases. The genomes will provide the genetic basis for the pharmaceutical industry to understand differential responses to new drugs. Accurate genome sequences will reduce the numbers of mice currently required by mouse genetic laboratories.
 
Title Mouse genomes in Ensembl 
Description Genome sequences and genome annotation for sixteen laboratory mouse genomes for free use by the public. 
Type Of Material Database/Collection of data 
Year Produced 2016 
Provided To Others? Yes  
Impact Availability for the wider mouse genetics community of whole genome sequences to reduce the number of laboratory animals required for experiments. 
URL http://www.ensembl.org/Mus_musculus/Info/Strains?db=core
 
Description Ensembl Genome Browser 
Organisation Ensembl
Country United Kingdom 
Sector Academic/University 
PI Contribution We contributed the first draft genome sequences for 16 laboratory mouse genomes.
Collaborator Contribution The Ensembl group has made these genomes publicly available to the research community.
Impact Widespread availability of the genomes.
Start Year 2016
 
Description Jackson Laboratory 
Organisation The Jackson Laboratory
Country United States 
Sector Charity/Non Profit 
PI Contribution Genome sequencing, genome assembly, and gene prediction.
Collaborator Contribution Supply of key samples to complete the research. Collaboration on data analysis and interpretation.
Impact First draft genome sequences for laboratory mouse genomes, a key resource for all mouse genetics research.
Start Year 2008
 
Description Lactation 
Organisation Baylor College of Medicine
Country United States 
Sector Hospitals 
PI Contribution Bioinformatics and data science support.
Collaborator Contribution Partner has provided genome sequencing data for new strains of mouse, and collaboration on genome analysis and interpretation of the data.
Impact Poster presentation at international conference, International Mammalian Genome Society 2019.
Start Year 2017
 
Description UCSC Genome Browser 
Organisation University of California, Santa Cruz
Country United States 
Sector Academic/University 
PI Contribution Provided draft genome sequences and annotation.
Collaborator Contribution Hosting and public availability of the genome sequences to the wider scientific community.
Impact Genomes are available to the public.
Start Year 2016
 
Description ANU Talk 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Invited talk entitled: Sixteen diverse laboratory mouse reference genomes define strain-specific haplotypes and novel functional loci
Year(s) Of Engagement Activity 2019
 
Description Harwell Talk 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact Invited talk entitled: Sixteen diverse laboratory mouse reference genomes define strain-specific haplotypes and novel functional loci
Year(s) Of Engagement Activity 2019
 
Description IMGC 2019 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Third sector organisations
Results and Impact Presentation at international conference on Mammalian Genetics. Expected outcome will be increased re-use and sustainability of the genome data generated by this award.
Year(s) Of Engagement Activity 2019
URL https://imgc2019.sciencesconf.org/