Maximising the potential of laboratory mice for understanding the genetic basis for disease
Lead Research Organisation:
European Bioinformatics Institute
Department Name: OMICs
Abstract
Many of the most important medical discoveries of the last century were made by studying model organisms such as the laboratory mouse. Mice were key to discovering the antibiotic effects of penicillin and the basis for how the immune system recognises virus-infected cells, and they are used to develop and test almost every new drug. Laboratory mouse strains display many important disease phenotypes, such as resistance to various forms of cancer (e.g. liver, lung, and skin cancer) and to bacterial and viral infection, and are used as models for many human diseases. The foundation for studying the genetic differences between these strains is an accurate genome sequence for each. Currently, only one mouse strain (C57BL/6J) has a genome sequence accurate enough to study the genes involved in disease and immunity. In this project, we will upgrade the genome sequences of 16 other laboratory mouse strains to reference quality using third-generation sequencing technologies. This will form the basis for studying the genes known to be involved in differential responses to disease and immune challenges. It will also enable researchers using these strains to combine sequence and phenotypic data to determine whether sequence variants are likely to contribute to disease susceptibility. All genome data generated by the project will be made freely available so that researchers worldwide can make maximum use of these genomes.
Technical Summary
Mouse models have long been used to gain a greater understanding of human disease genetics. In mouse genetics, decoding the genome of one strain (C57BL/6J) had a profound impact on our ability to relate sequence to function. More than two dozen different laboratory mouse strains are used as models for human disease. In the Mouse Genomes Project, we have sequenced the genomes of 36 mouse strains to produce the first comprehensive public catalogs of genetic variation (single-nucleotide variants, short insertions/deletions, and large multi-kilobase copy-number changes or rearrangements) and have studied the effects of this variation on phenotypes. This public resource has been invaluable to the mouse genetics community and has been cited over 1,200 times since 2011. In 2016, we completed the first draft genome assemblies for these strains using second-generation short-read sequencing technologies and long-range libraries. We have identified over 5,144 loci in the mouse genomes where there is evidence of more complex strain-specific alleles or haplotypes. These loci are enriched for genes with immune, sensory, and kin-recognition functions. They are also enriched for highly identical repeat sequences, which likely promote recombination at these loci and make them particularly challenging to assemble. We propose to use long-read technologies to create reference-quality assemblies and genome annotation for 16 mouse strains, and to develop and apply a set of quality control measures to assess the accuracy of the strain-specific alleles. We will use the genome sequences to study the structure of those complex loci that have already been associated with disease phenotypes through QTLs mapped in genetic reference panels such as the Collaborative Cross and Diversity Outbred populations. All data from this project will be freely available via the Ensembl and UCSC genome browsers and the Mouse Genome Informatics database at the Jackson Laboratory.
Planned Impact
Laboratory mice have been key to many of the most important medical discoveries, such as the antibiotic effects of penicillin and the basis for how the immune system recognises virus-infected cells, and are used to develop and test almost every new drug. Laboratory mouse strains display many important disease response phenotypes, such as resistance to various forms of cancer (e.g. liver, lung, and skin cancer) and to bacterial and viral infection, and are used as models to study many human diseases. The foundation for studying the genetic differences between these strains is an accurate genome sequence for each. Currently, only one mouse strain (C57BL/6J) has a genome sequence accurate enough to study the genes involved in disease and immunity. This project will create high-quality genome sequences for sixteen commonly used laboratory mouse strains. All raw and processed data will be deposited in the appropriate public databases: the European Nucleotide Archive (raw reads), the European Variation Archive (genetic variation), and the Ensembl and UCSC genome browsers (visualisation of the assembled genomes and annotation). The impact of this project will therefore be felt immediately by the wider mouse genetics community, which uses these strains to study a wide range of human diseases. The genomes will also give the pharmaceutical industry a genetic basis for understanding differential responses to new drugs. Accurate genome sequences will reduce the number of mice currently required by mouse genetics laboratories.
People | ORCID iD |
Thomas Keane (Principal Investigator) | |
David Adams (Co-Investigator) | |
Publications
Fiddes I (2018). Comparative Annotation Toolkit (CAT)-simultaneous clade and personal genome annotation. Genome Research.
Kolmogorov M (2018). Chromosome assembly of large and complex genomes using multiple references. Genome Research.
Lilue J (2018). Sixteen diverse laboratory mouse reference genomes define strain-specific haplotypes and novel functional loci. Nature Genetics.
Lilue J (2019). Mouse protein coding diversity: What's left to discover? PLoS Genetics.
Lindsay SJ (2019). Similarities and differences in patterns of germline mutation between mice and humans. Nature Communications.
Sisu C (2020). Transcriptional activity and strain-specific history of mouse pseudogenes. Nature Communications.
Thybert D (2018). Repeat associated mechanisms of genome evolution and function revealed by the Mus caroli and Mus pahari genomes. Genome Research.
Title | Mouse genomes in Ensembl |
Description | Genome sequences and genome annotation for sixteen laboratory mouse genomes for free use by the public. |
Type Of Material | Database/Collection of data |
Year Produced | 2016 |
Provided To Others? | Yes |
Impact | Whole-genome sequences made available to the wider mouse genetics community, helping to reduce the number of laboratory animals required for experiments. |
URL | http://www.ensembl.org/Mus_musculus/Info/Strains?db=core |
Description | Ensembl Genome Browser |
Organisation | Ensembl |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | We contributed the first draft genome sequences for 16 laboratory mouse strains. |
Collaborator Contribution | The Ensembl group has made these genomes publicly available to the research community. |
Impact | Widespread availability of the genomes. |
Start Year | 2016 |
Description | Jackson Laboratory |
Organisation | The Jackson Laboratory |
Country | United States |
Sector | Charity/Non Profit |
PI Contribution | Genome sequencing, genome assembly, and gene prediction. |
Collaborator Contribution | Supply of key samples to complete the research. Collaboration on data analysis and interpretation. |
Impact | First draft genome sequences for laboratory mouse genomes, a key resource for all mouse genetics research. |
Start Year | 2008 |
Description | Lactation |
Organisation | Baylor College of Medicine |
Country | United States |
Sector | Hospitals |
PI Contribution | Bioinformatics and data science support. |
Collaborator Contribution | Partner has provided genome sequencing data for new strains of mouse, and collaboration on genome analysis and interpretation of the data. |
Impact | Poster presentation at an international conference (International Mammalian Genome Society, 2019). |
Start Year | 2017 |
Description | UCSC Genome Browser |
Organisation | University of California, Santa Cruz |
Country | United States |
Sector | Academic/University |
PI Contribution | Provided draft genome sequences and annotation. |
Collaborator Contribution | Hosting and public availability of the genome sequences to the wider scientific community. |
Impact | Genomes are available to the public. |
Start Year | 2016 |
Description | ANU Talk |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Invited talk entitled: Sixteen diverse laboratory mouse reference genomes define strain-specific haplotypes and novel functional loci |
Year(s) Of Engagement Activity | 2019 |
Description | Harwell Talk |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Professional Practitioners |
Results and Impact | Invited talk entitled: Sixteen diverse laboratory mouse reference genomes define strain-specific haplotypes and novel functional loci |
Year(s) Of Engagement Activity | 2019 |
Description | IMGC 2019 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Third sector organisations |
Results and Impact | Presentation at an international conference on mammalian genetics. The expected outcome is increased re-use and sustainability of the genome data generated by this award. |
Year(s) Of Engagement Activity | 2019 |
URL | https://imgc2019.sciencesconf.org/ |