Pig genome annotation and analysis

Lead Research Organisation: Wellcome Sanger Institute

Department Name: Bioinformatics Division

Abstract

We propose to provide state-of-the-art analysis and annotation of the pig genome sequence being generated by the International Pig Genome Sequencing Project. We will make the annotated genome sequence accessible on the Web through the Ensembl site at http://www.ensembl.org. The pig genome is the entire DNA sequence of the pig, which defines all the biological molecules that make up a pig. Acquiring, managing and annotating the pig genome sequence accelerates research in both pig biology and mammalian biology. Impact on pig biology: because of the extensive selective breeding that has occurred during domestication, there are a considerable number of breed- or line-specific features, ranging from fat/muscle ratio and litter size to skin colour. These features can be mapped genetically to broad regions of the genome, but the final identification of the responsible genes and the causal genetic variation is very complex. A well-annotated pig genome sequence with links to other data sources, especially phenotype data such as growth, carcass composition or responses to infectious disease, would provide a dramatic boost to the identification of these causative genes.

Technical Summary

The genome represents a complete description of an organism. However, to understand how its genes and regulatory elements function, and to design sensible molecular biology experiments to test hypotheses, the genome sequence must be related to the extant functional data for that organism. We propose to annotate and analyse the sequence being generated by the International Pig Genome Sequencing Project. We will use the well-established Ensembl system as the main tool for storage, management and dissemination of pig genome data. Pig genome sequencing is currently funded to 3-4x coverage from mapped clones, with two chromosomes at higher coverage. Experience with other low-coverage genomes, such as cow, rabbit and armadillo, is that this coverage will at minimum provide an effective representation of exons, which can then be assembled into genes using a guide genome. By definition, this approach cannot resolve lineage-specific expansions in the pig genome. However, the more clone-based strategy used for pig will create new opportunities for combining assembly and annotation strategies to extract more information from a 3x assembly. We will integrate the pig genome sequence with diverse pre-existing data sets, including SNPs, ESTs and quantitative trait loci (QTL). We will also integrate the sequence with maps (genetic, physical) and physical resources (clones, microarrays), providing a seamless route for interrogation and for the development of experimental tools. Finally, computational approaches that integrate these resources and leverage the comparative genomics potential of the mammalian clade will be used to analyse the genome and present it in a user-friendly format. An annotated pig genome sequence will dramatically accelerate research on the pig as an important animal for agriculture and human biology. Our aim is to make the pig genome sequence maximally useful by delivering an annotated sequence of the highest quality in a user-friendly manner.
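As a rough illustration of why 3-4x coverage is expected to capture most exons while still leaving gaps, the classic Lander-Waterman (Poisson) model estimates the fraction of bases covered by at least one read as 1 - e^(-c) at mean depth c. The sketch below is an idealised back-of-envelope calculation, assuming reads land uniformly and independently at random; it ignores repeats, cloning bias and the clone-based strategy described above, and it is not the project's own coverage calculation. The function name `expected_fraction_covered` is hypothetical.

```python
import math


def expected_fraction_covered(coverage: float) -> float:
    """Idealised Lander-Waterman (Poisson) estimate of the fraction of
    bases covered by at least one read at mean depth `coverage`."""
    # Under the Poisson model, P(a given base receives zero reads) = e^(-coverage)
    return 1.0 - math.exp(-coverage)


for depth in (3, 4):
    print(f"{depth}x coverage: {expected_fraction_covered(depth):.3f} of bases covered")
# prints:
# 3x coverage: 0.950 of bases covered
# 4x coverage: 0.982 of bases covered
```

On this idealised model roughly 5% of bases would remain unsequenced at 3x and about 2% at 4x, so most exons would be sampled but many multi-exon genes would still be split across gaps.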

Funded Value:

£473,268

Funded Period:

Jan 07 - Dec 09

Funder:

BBSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

BB/E011640/1

Principal Investigator:

Tim Hubbard

Research Subject:

Animal science (12%)

Omic sciences & technologies (37%)

Tools, technologies & methods (26%)

Research Topic:

Animal organisms (12%)

Bioinformatics (26%)

Genomics (37%)

Organisations

Wellcome Sanger Institute (Lead Research Organisation)

People	ORCID iD
Tim Hubbard (Principal Investigator)
Jane Rogers (Co-Investigator)

Publications

Aken B (2016) The Ensembl gene annotation system. in Database

Aken BL (2017) Ensembl 2017. in Nucleic acids research

Flicek P (2011) Ensembl 2011. in Nucleic acids research

Flicek P (2010) Ensembl's 10th year. in Nucleic acids research

Flicek P (2008) Ensembl 2008. in Nucleic acids research

Flicek P (2012) Ensembl 2012. in Nucleic acids research

Flicek P (2014) Ensembl 2014. in Nucleic acids research

Flicek P (2013) Ensembl 2013. in Nucleic acids research

Groenen MA (2012) Analyses of pig genomes provide insight into porcine demography and evolution. in Nature

Harrow JL (2014) The Vertebrate Genome Annotation browser 10 years on. in Nucleic acids research

Key Findings
Impact Summary
Research Databases and Models
Engagement Activities


Description	1. A clone path was generated by ordering the sequenced clones using the integrated physical map. The contigs within the clones were then ordered according to read-pair and end-sequence information and to overlaps between neighbouring clones. This resulted in a highly refined genome sequence assembly that can be further improved by closing the remaining gaps. The clone path generated during this grant is a public resource and was invaluable in the generation of a new pig assembly, Sscrofa10.2. This assembly formed the basis of the research reported in the Nature paper (Groenen MA, 2012), whose findings include:
   a. There was a deep phylogenetic split between European and Asian wild boars ~1 million years ago, and a selective sweep analysis indicates selection on genes involved in RNA processing and regulation.
   b. Genes associated with immune response and olfaction exhibit fast evolution. Pigs have the largest repertoire of functional olfactory receptor genes, reflecting the importance of smell in this scavenging animal.
   c. The pig genome sequence provides an important resource for further improvement of this important livestock species, and the identification of many putative disease-causing variants extends the potential of the pig as a biomedical model.
2. The Sscrofa9 assembly of the genome was annotated using Ensembl automatic gene prediction pipelines. A set of protein-coding genes was predicted based on pig cDNA and EST evidence and on alignments from other mammals. A set of non-coding RNA genes was also generated, predicted on the basis of alignments from Rfam and miRBase.
3. Comparative genomics alignments including pig have been generated. These include pairwise alignments to human and cow, and multiple alignments to other mammals and vertebrates. Other comparative resources include gene trees showing relationships between pig genes and those of 48 other species.
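The clone-path idea in item 1 can be illustrated with a toy sketch: clones are sorted by their position on the physical map, and the contigs within each clone are chained greedily by read-pair support. All names here (`Clone`, `map_position`, `order_contigs`, `clone_path`) and the greedy heuristic are hypothetical; the sketch ignores contig orientation, end-sequence evidence and overlaps between neighbouring clones, and it is not the procedure actually used to build the pig clone path.

```python
from dataclasses import dataclass, field


@dataclass
class Clone:
    name: str
    map_position: int  # hypothetical: clone start on the integrated physical map
    contigs: list[str] = field(default_factory=list)


def _reaches(succ: dict[str, str], start: str, target: str) -> bool:
    """True if following successor links from `start` arrives at `target`."""
    node = start
    while node in succ:
        node = succ[node]
        if node == target:
            return True
    return False


def order_contigs(contigs: list[str], links: dict[tuple[str, str], int]) -> list[str]:
    """Greedily chain a clone's contigs using read-pair support.

    links[(a, b)] = number of read pairs suggesting contig a lies before contig b.
    Strongest links are accepted first; each contig gets at most one predecessor
    and one successor, and links that would close a cycle are skipped.
    """
    members = set(contigs)
    succ: dict[str, str] = {}
    has_pred: set[str] = set()
    for (a, b), _support in sorted(links.items(), key=lambda kv: -kv[1]):
        if a == b or a not in members or b not in members:
            continue
        if a in succ or b in has_pred or _reaches(succ, b, a):
            continue
        succ[a] = b
        has_pred.add(b)
    ordered: list[str] = []
    for start in contigs:  # walk each chain from its head, in input order
        if start in has_pred:
            continue
        node = start
        ordered.append(node)
        while node in succ:
            node = succ[node]
            ordered.append(node)
    return ordered


def clone_path(clones: list[Clone], links: dict[tuple[str, str], int]) -> list[str]:
    """Order clones by physical-map position, then contigs within each clone."""
    path: list[str] = []
    for clone in sorted(clones, key=lambda c: c.map_position):
        path.extend(order_contigs(clone.contigs, links))
    return path


if __name__ == "__main__":
    clones = [Clone("B", 200, ["b2", "b1"]), Clone("A", 100, ["a1", "a2", "a3"])]
    links = {("a3", "a1"): 4, ("a1", "a2"): 2, ("b1", "b2"): 6}
    print(clone_path(clones, links))  # ['a3', 'a1', 'a2', 'b1', 'b2']
```

Contigs with no read-pair links simply stay in their input order as single-contig chains, which mirrors the point in item 1 that the resulting path still contains gaps to be closed.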
Exploitation Route	The genome sequence, associated annotation and genome browser tools generated provide a resource that underpins pig genomics research. No genome sequence (not even human) is entirely complete, but these resources document how the sequence was generated and allow it to be improved by additional sequencing. The most recent version of the genome is the Sscrofa10.2 assembly, produced in August 2011 by the Swine Genome Sequencing Consortium (SGSC). This grant led to a successful application for a follow-on grant (BBSRC: Ensembl and enabling genetics and genomics research in farmed animal species, BB/I025360), which supported updates to the pig annotation as reported by EBI (see report of outcomes). The most recent version of the annotation was released in May 2012, with minor updates carried out in February 2014.
Sectors	Agriculture, Food and Drink, Education, Environment
URL	http://www.ensembl.org/Sus_scrofa/Info/Index


Description	The genome sequence and associated annotation are made accessible through the Ensembl genome browser. The browser is widely used by pig researchers to integrate data they have collected independently, to design specific experiments, and so on. Handling large genomes, generating annotation and providing tools to use these data require substantial IT and software infrastructure. By generating sequence centrally through the Sanger Institute's sequencing facilities, and annotation and bioinformatics services through the Sanger/EBI Ensembl project software, the productivity of pig researchers is greatly increased: they can share data on a common platform and avoid each investing substantially in duplicate bioinformatics analysis.
First Year Of Impact	2006
Sector	Agriculture, Food and Drink, Healthcare, Pharmaceuticals and Medical Biotechnology
Impact Types	Economic


Title	Addition of Ensembl-Havana gene set to Pig (Ensembl 69)
Description	An Ensembl-Havana gene set was added to the annotation, incorporating the VEGA manual annotation that had been generated through a community effort.
Type Of Material	Database/Collection of data
Year Produced	2012
Provided To Others?	Yes
Impact	The annotated reference genome sequences have been delivered through a series of Ensembl releases.
URL	http://oct2012.archive.ensembl.org/Sus_scrofa/Info/Index


Title	Ensembl release 74
Description	Orthologues to new human and mouse genes.
Type Of Material	Database/Collection of data
Year Produced	2013
Provided To Others?	Yes
Impact	Secondary structures of non-coding RNAs are now shown on the gene summary page, using the R2R package.
URL	http://dec2013.archive.ensembl.org/index.html


Title	Updated Pig Ensembl Website (Ensembl 77)
Description	Secondary structures of non-coding RNAs are now shown on the gene summary page, using the R2R package.
Type Of Material	Database/Collection of data
Year Produced	2013
Provided To Others?	Yes
Impact	The annotated reference genome sequences have been delivered through a series of Ensembl releases.
URL	http://www.ensembl.org/Sus_scrofa/Info/WhatsNew?db=core


Description	Ensembl Genebuild Workshop by invitation from Yiqiang Zhao of CAU
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Undergraduate students
Results and Impact	The workshop covered an introduction to Ensembl and the Human Genome Project; an introduction to gene building; outreach resources (Youku etc.); a practical session on running the Ensembl gene annotation pipeline; and a practical session on running the Ensembl RNA-seq pipeline.
Year(s) Of Engagement Activity	2014
URL	http://www.ebi.ac.uk/training/workshop/ensembl-genebuild-workshop