A long term resource to maximise the potential of laboratory mouse strains for medical research

Lead Research Organisation: Wellcome Sanger Institute

Department Name: Computational Genomics

Abstract

Our key aim is to explore the relationship between genetic variation and medically relevant human disease phenotypes. One way to do this is to assess the genetic differences between long-established laboratory mouse strains. Laboratory mouse strains display many important disease phenotypes, such as resistance to various forms of cancer (e.g. liver, lung, and skin cancer) and to bacterial and viral infection, and are used as models for many human diseases. The foundation for studying the genetic differences between these strains is accurate genome sequences. In this project, we will first generate genome sequences for the most commonly used laboratory mouse strains and then use these sequences, together with knowledge of the gene structures, to determine the genetic causes of observed differences in disease response and behaviour between these strains. By combining sequence and phenotypic data, we will determine whether sequence variants are likely to contribute to disease susceptibility.

The main aim of this project is to correctly identify all the genes in the newly completed release of genome sequences of 12 laboratory mouse strains. This will be achieved by a combination of two strategies. Initially, the genes will be identified using state-of-the-art bioinformatic programs and pipelines. Genes are identified by matching known mouse proteins, other transcribed data such as mRNAs and ESTs, or conserved proteins from other species to the genome. Because this pipeline is automatic, there will be complex gene families that it cannot identify correctly and that require manual inspection. The HAVANA team have been involved in manual annotation of the human, mouse and zebrafish reference genomes and have developed specialist in-house tools to help identify genes accurately within different genomes. Since manual inspection is expensive and time-consuming, the manual effort will be targeted at complex gene families and genes of specific interest to the mouse research community. Engaging with the community will be essential, both to receive feedback about the targeting of annotation and to generate community participation in the manual inspection of genes of interest. Automatic annotation identifies around 70% of genes correctly; the aim is therefore to use bioinformatics analysis and feedback from researchers to target the remaining 30% of incorrectly annotated genes and improve them.

Technical Summary

The genome represents a complete description of an organism. However, to understand the functioning of the genes and regulatory elements, and to design molecular biological experiments to test hypotheses, the genome sequence must be related to the extant functional data for that organism. In particular the set of genes must be accurately annotated. The first chromosome sequences for the laboratory mouse strains are soon to be released by the Mouse Genomes Project at the Wellcome Trust Sanger Institute. The main aim of this proposal is to take the sequences and create strain-specific annotation and targeted manual annotation in regions where the automated processes fail.

We propose to create a comprehensive evidence-based set of gene annotations for twelve laboratory mouse strains. This will combine manual annotation at targeted loci with genome-wide automatic annotation. Manual annotation provides the most accurate annotation of a locus, generating every transcript for which there is evidence. Automatic annotation provides rapid genome-wide gene annotation. Together, they provide the most useful, cost-effective gene set for researchers.

Manual annotation will be targeted at loci chosen by the community as important for medical research, or where user feedback suggests automatic annotation has failed to generate good models. It will be performed using the established Otterlace/ZMap annotation tools.

An established process, used successfully in the ENCODE project, will merge the manual and automatic annotation for each Ensembl release. The gene set will be made available through the Ensembl website, via the other Ensembl access methods (BioMart data-mining interface, Perl API, flat file dumps, MySQL database), through MGI, and for use with Ensembl tools, e.g. the Variant Effect Predictor. In each release, the gene set will be further annotated by Ensembl's comparative genomics, variation and functional genomics pipelines.
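
To make the idea of combining the two annotation sources concrete, the following is a minimal sketch in Python. It is not the established merge process referred to above, whose details are not given here. It assumes a deliberately simple precedence rule (a manual gene model replaces any automatic model that overlaps it on the same strain, chromosome and strand), and the GeneModel record, the identifiers and the strain label are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GeneModel:
    """Hypothetical, simplified gene model record (illustration only)."""
    gene_id: str
    strain: str
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"
    source: str  # "manual" or "automatic"


def overlaps(a: GeneModel, b: GeneModel) -> bool:
    """True if both models lie on the same strain, chromosome and strand
    and their coordinate ranges share at least one base."""
    return (
        a.strain == b.strain
        and a.chrom == b.chrom
        and a.strand == b.strand
        and a.start <= b.end
        and b.start <= a.end
    )


def merge_annotations(manual: List[GeneModel],
                      automatic: List[GeneModel]) -> List[GeneModel]:
    """Assumed rule: keep every manual model, and keep an automatic model
    only when no manual model overlaps it."""
    merged = list(manual)
    for auto in automatic:
        if not any(overlaps(auto, m) for m in manual):
            merged.append(auto)
    # Sort by position so the merged set is reproducible.
    return sorted(merged, key=lambda g: (g.strain, g.chrom, g.start, g.end))


manual_models = [
    GeneModel("manual_1", "strain_A", "1", 1000, 5000, "+", "manual"),
]
automatic_models = [
    GeneModel("auto_1", "strain_A", "1", 1200, 4800, "+", "automatic"),
    GeneModel("auto_2", "strain_A", "1", 9000, 12000, "-", "automatic"),
]

for model in merge_annotations(manual_models, automatic_models):
    print(model.gene_id, model.source)
```

In this toy input the automatic model auto_1 is dropped because the manually annotated locus covers it, while auto_2 is kept because no manual model overlaps it. A production merge would be expected to work with transcript-level evidence rather than whole-gene coordinates, so the rule here should be read only as an illustration of manual annotation taking precedence at targeted loci.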

Planned Impact

The most obvious beneficiaries of the genome sequences and annotation generated will be the mouse genetics community involved in mapping complex disease-related traits, and researchers mapping mutations in crosses involving the wild-derived strains or in crosses designed to identify modifiers of mutations.

Complete genome sequence and annotation is needed to explore the relationship between genetic and phenotypic variation at a number of levels. First, it is a starting point for exploring how sequence and gene structure variation impinges on gene function. The new gene structures that this project will identify will provide a resource for examining sequence function, particularly in those regions, identified by the ENCODE project, that are either transcribed or implicated in gene regulation. Importantly, complete sequence will allow unambiguous assignment of function to specific nucleotide differences.

Second, the sequence will accelerate the identification of genes involved in the increasingly large number of phenotypes available for inbred strains. To date, more than 2,000 loci that contribute to quantitative variation have been identified, with only a small number characterized at a molecular level. The de novo assemblies and corresponding annotation data will obviate the need to re-sequence candidate genes identified in genetic analysis of complex traits.

Third, in combination with accumulating expression, proteomic and metabolomic data sets, accurate genome annotation of multiple mouse strains will markedly improve our ability to understand gene function. A systems biology approach will be possible, in which the integration of genetic and functional genomic data provides a path to inferring causal associations between genes and disease.

Funded Value:

£674,758

Funded Period:

Mar 15 - Feb 18

Funder:

BBSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

BB/M000281/1

Principal Investigator:

David Adams

Research Subject:

Genetics & development (40%)

Omic sciences & technologies (30%)

Tools, technologies & methods (20%)

Research Topic:

Bioinformatics (20%)

Functional genomics (15%)

Gene action & regulation (15%)

Genome organisation (25%)

Genomics (15%)

People
David Adams (Principal Investigator)
Jennifer Harrow (Co-Investigator)
Thomas Keane (Co-Investigator)

Publications

Adams DJ (2015) The Mouse Genomes Project: a repository of inbred laboratory mouse strain genomes. in Mammalian genome : official journal of the International Mammalian Genome Society

Balmus G (2019) ATM orchestrates the DNA-damage response to counter toxic non-homologous end-joining at broken replication forks. in Nature communications

Doran A (2016) Deep genome sequencing and variation analysis of 13 inbred mouse strains defines candidate phenotypic alleles, private variation, and homozygous truncating mutations

Doran A (2016) Additional file 2: of Deep genome sequencing and variation analysis of 13 inbred mouse strains defines candidate phenotypic alleles, private variation and homozygous truncating mutations

Doran A (2016) Additional file 1: of Deep genome sequencing and variation analysis of 13 inbred mouse strains defines candidate phenotypic alleles, private variation and homozygous truncating mutations

Doran AG (2016) Deep genome sequencing and variation analysis of 13 inbred mouse strains defines candidate phenotypic alleles, private variation and homozygous truncating mutations. in Genome biology

Dykes IM (2018) A Requirement for Zic2 in the Regulation of Nodal Expression Underlies the Establishment of Left-Sided Identity. in Scientific reports

Fiddes IT (2018) Comparative Annotation Toolkit (CAT)-simultaneous clade and personal genome annotation. in Genome research

Further Funding
Research Databases and Models
Collaboration
Engagement Activities


Description	Maximising the potential of laboratory mice for understanding the genetic basis for disease
Amount	£685,000 (GBP)
Funding ID	MR/R017565/1
Organisation	Medical Research Council (MRC)
Sector	Public
Country	United Kingdom
Start	03/2018
End	04/2021


Title	Additional file 2 of Identification and characterisation of spontaneous mutations causing deafness from a targeted knockout programme
Description	Additional file 2.
Type Of Material	Database/Collection of data
Year Produced	2022
Provided To Others?	Yes
URL	https://springernature.figshare.com/articles/dataset/Additional_file_2_of_Identification_and_charact...


Title	Additional file 3 of Identification and characterisation of spontaneous mutations causing deafness from a targeted knockout programme
Description	Additional file 3.
Type Of Material	Database/Collection of data
Year Produced	2022
Provided To Others?	Yes
URL	https://springernature.figshare.com/articles/dataset/Additional_file_3_of_Identification_and_charact...


Title	Additional file 3: of Deep genome sequencing and variation analysis of 13 inbred mouse strains defines candidate phenotypic alleles, private variation and homozygous truncating mutations
Description	Pairwise variant comparisons. The number of SNPs and indels shared between any two strains contained in the MGP variation catalogue. (XLSX 24 kb)
Type Of Material	Database/Collection of data
Year Produced	2016
Provided To Others?	Yes
URL	https://springernature.figshare.com/articles/dataset/Additional_file_3_of_Deep_genome_sequencing_and...


Title	Additional file 4: of Deep genome sequencing and variation analysis of 13 inbred mouse strains defines candidate phenotypic alleles, private variation and homozygous truncating mutations
Description	Per strain genotype concordance with HapMap. Total HapMap positions available for comparison with each strain and the corresponding number of concordant sites. (XLSX 4 kb)
Type Of Material	Database/Collection of data
Year Produced	2016
Provided To Others?	Yes
URL	https://springernature.figshare.com/articles/dataset/Additional_file_4_of_Deep_genome_sequencing_and...


Title	Additional file 5: of Deep genome sequencing and variation analysis of 13 inbred mouse strains defines candidate phenotypic alleles, private variation and homozygous truncating mutations
Description	VEP, GMS and SIFT annotations for SNPs and indels in each of the 13 strains. Total number of SNPs and indels annotated into each functional consequence predicted by VEP. Total number of SNPs with an estimated moderate (using GMS), radical (GMS), tolerated (using SIFT) and deleterious (SIFT) effect is also provided. (XLSX 8 kb)
Type Of Material	Database/Collection of data
Year Produced	2016
Provided To Others?	Yes
URL	https://springernature.figshare.com/articles/dataset/Additional_file_5_of_Deep_genome_sequencing_and...


Title	Additional file 6: of Deep genome sequencing and variation analysis of 13 inbred mouse strains defines candidate phenotypic alleles, private variation and homozygous truncating mutations
Description	Private SNPs with multiple VEP consequences. Private SNPs with multiple predicted VEP consequences for each strain. Consequence predictions are based on gene models obtained from Ensembl 78. (TXT 66669 kb)
Type Of Material	Database/Collection of data
Year Produced	2016
Provided To Others?	Yes
URL	https://springernature.figshare.com/articles/dataset/Additional_file_6_of_Deep_genome_sequencing_and...


Title	Additional file 7: of Deep genome sequencing and variation analysis of 13 inbred mouse strains defines candidate phenotypic alleles, private variation and homozygous truncating mutations
Description	Candidate genes identified in the RF/J analysis. All genes which contained at least one missense SNP only found in the RF/J strain. (XLSX 8 kb)
Type Of Material	Database/Collection of data
Year Produced	2016
Provided To Others?	Yes
URL	https://springernature.figshare.com/articles/dataset/Additional_file_7_of_Deep_genome_sequencing_and...


Title	Additional file 8: of Deep genome sequencing and variation analysis of 13 inbred mouse strains defines candidate phenotypic alleles, private variation and homozygous truncating mutations
Description	Per strain over-represented pathways using genes containing private missense SNPs. For each strain separately significantly over-represented pathways were identified using only the genes which contained at least one private missense variant. (XLSX 56 kb)
Type Of Material	Database/Collection of data
Year Produced	2016
Provided To Others?	Yes
URL	https://springernature.figshare.com/articles/dataset/Additional_file_8_of_Deep_genome_sequencing_and...


Title	Mouse Genomes Project database
Description	A catalogue of mouse strain sequences
Type Of Material	Database/Collection of data
Year Produced	2010
Provided To Others?	Yes
Impact	Thousands of users and hundreds of papers
URL	http://www.sanger.ac.uk/resources/mouse/genomes/


Title	Mouse genomes in ensembl
Description	Genome sequences and genome annotation for sixteen laboratory mouse genomes for free use by the public.
Type Of Material	Database/Collection of data
Year Produced	2016
Provided To Others?	Yes
Impact	Availability for the wider mouse genetics community of whole genome sequences to reduce the number of laboratory animals required for experiments.
URL	http://www.ensembl.org/Mus_musculus/Info/Strains?db=core


Title	UCSC Genome Browser
Description	Mouse genomes in UCSC Genome Browser
Type Of Material	Database/Collection of data
Year Produced	2018
Provided To Others?	Yes
Impact	Public availability of first draft genome sequences for 18 laboratory mouse strains
URL	https://genome.ucsc.edu/


Description	Ensembl Genome Browser
Organisation	EMBL European Bioinformatics Institute (EMBL - EBI)
Country	United Kingdom
Sector	Academic/University
PI Contribution	In this project, we produced genome assemblies for the most widely used laboratory mouse strains. These genomes are now available to the wider community via the Ensembl Genome Browser.
Collaborator Contribution	Ensembl has provided services for hosting and presenting the genome sequences to the wider research community. This provides ongoing long term sustainability for the data.
Impact	Increased usage of the data; long-term sustainability and availability of the data.
Start Year	2018


Description	Jackson Laboratory
Organisation	The Jackson Laboratory
Country	United States
Sector	Charity/Non Profit
PI Contribution	Genome sequencing, genome assembly, and gene prediction.
Collaborator Contribution	Supply of key samples to complete the research. Collaboration on data analysis and interpretation.
Impact	First draft genome sequences for laboratory mouse genomes, a key resource for all mouse genetics research.
Start Year	2014


Description	UCSC genome annotation
Organisation	University of California, Santa Cruz
Country	United States
Sector	Academic/University
PI Contribution	Genome sequencing, and genome assemblies for sixteen inbred laboratory mouse genomes.
Collaborator Contribution	Personnel and IT resources to complete whole-genome annotation of sixteen inbred laboratory mouse genomes.
Impact	Whole-genome annotation of sixteen inbred laboratory mouse genomes.
Start Year	2015


Description	Conference of the International Mammalian Genome Society 2015
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Third sector organisations
Results and Impact	A research talk "Multiple mouse reference genomes and strain specific gene" describing the public resource being generated through this work.
Year(s) Of Engagement Activity	2015
URL	http://www.imgc2015.jp/


Description	Deep genome sequencing and variation analysis of 13 inbred mouse strains defines candidate phenotypic alleles, private variation, and homozygous truncating mutations
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Other audiences
Results and Impact	A talk by Dr. Anthony Doran at The Allied Genetics Conference 2016.
Year(s) Of Engagement Activity	2016
URL	http://www.genetics2016.org


Description	Discovery, assembly, and annotation of subspecies specific haplotypes in classical and wild-derived mouse strains
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Third sector organisations
Results and Impact	A talk at The Allied Genetics Conference 2016
Year(s) Of Engagement Activity	2016
URL	http://www.genetics2016.org


Description	Multiple mouse reference genomes defines subspecies specific haplotypes and novel coding sequences
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Talk at international research conference
Year(s) Of Engagement Activity	2017
URL	http://imgs.org/


Description	Talk: Discovery, assembly, and annotation of subspecies specific haplotypes in classical and wild-derived mouse strains
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Other audiences
Results and Impact	A talk at The Allied Genetics Conference 2016 by Thomas Keane
Year(s) Of Engagement Activity	2016
URL	http://www.genetics2016.org/