The Animal Functional Genomics Resource

Lead Research Organisation: European Bioinformatics Institute
Department Name: Vertebrate Genomics

Abstract

Research on domesticated animals has important socio-economic impacts, including underpinning and accelerating improvements in the animal sector of agriculture (animal breeding and animal health), contributing to human and veterinary medicine by providing animal models, and improving animal health and welfare. The chicken also serves as a model for all other avian species, so is important in the fields of embryology and development, neurobiology and behaviour, and the ecology and evolution of natural populations.

The genome is the entire DNA content of an organism. For the genome sequence to be useful, it needs to be annotated with the location of genes and their regulatory elements along the DNA sequence. Information on the location of at least coding genes (that is, genes which make proteins) is now available for many economically important farmed and companion animals, and efforts are underway in many parts of the world to increase our knowledge of non-coding regions (i.e. non-coding RNAs and regulatory elements). In addition, the advent of new DNA sequencing technologies now means that it is possible to sequence many individuals, compare differences in their DNA sequence and gene content, and relate these to their physiological differences. It is also possible to generate 'functional' data by sequencing ("assay by sequencing"). Functional sequence data tells us which genes are active, which genes code for proteins and which genes code for regulatory RNAs. It can also tell us about other features within the DNA sequence that are responsible for genes being switched on or off, for example, in specific tissues or in response to signaling molecules. Functional sequence data is therefore very important in informing us about how differences in DNA sequence between individuals can affect gene activity and are therefore likely to affect phenotypes, such as production or disease resistance traits. Some DNA databases contain functional data for farmed and companion animals, often in its raw form (e.g. sequence reads); however, this data is most useful when it has been checked for quality and processed further, for example by assigning it to specific genes and transcripts. It would also be preferable if more data were submitted to these databases from the research community around the world.

Our proposed research aims to look at the data that is available for these animals in the public DNA databases, and check it for quality. We will also work to ensure that future datasets that are submitted to the databases have as much useful information associated with them as possible, for example, breed, sex and tissue type, etc. We will also define quality standards for such data and improve data discoverability by drawing together datasets from disparate projects into a cohesive collection that is accessible both programmatically and via a website.

Technical Summary

The major goal of the proposed Animal Functional Genomics Resource (AFGR) is to maximize the usefulness of publicly archived functional genomic data for farmed and companion animal species in DNA databases.

The AFGR will index all the animal functional data available in the EMBL-EBI's European Nucleotide Archive (ENA). Experiments that meet our metadata standards will be processed through our standard analysis pipelines for RNA-Seq, ChIP-Seq and Methylation analysis. AFGR will use these pipelines to analyse and perform quality control (QC) on the data. Results that pass our QC metrics will then be distributed to the community and where appropriate passed to Ensembl to improve the gene and regulatory annotation of genomes.

We are involved in the FAANG metadata and data sharing committee (MDS), which is developing standards for sample, experimental and analysis metadata in GitHub (which records versions), to ensure the minimal metadata needed for data analysis are recorded in a well-structured manner. These standards use ontologies such as Uberon, the Cell Type Ontology and the Animal Trait Ontology for Livestock to provide specific descriptions for different sample and experimental attributes.
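As a rough illustration of what checking a sample record against a minimal metadata standard could look like, the sketch below tests for required fields and for ontology-style term identifiers. The field names, the rule table and the `validate_sample` helper are hypothetical and are not the FAANG rule set; only the idea of required fields plus ontology terms comes from the text above.

```python
import re

# Hypothetical minimal standard: field name -> allowed ontology prefixes.
# None means free text is accepted. Illustrative only, not the FAANG rules.
REQUIRED_FIELDS = {
    "organism": None,
    "sex": None,
    "breed": None,
    "tissue": ("UBERON", "CL"),
}

# Matches identifiers written as PREFIX:digits or PREFIX_digits.
TERM_PATTERN = re.compile(r"^([A-Za-z]+)[:_](\d+)$")


def validate_sample(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field, prefixes in REQUIRED_FIELDS.items():
        value = record.get(field)
        if not value:
            problems.append(f"missing required field: {field}")
            continue
        if prefixes is not None:
            match = TERM_PATTERN.match(value)
            if match is None or match.group(1).upper() not in prefixes:
                allowed = "/".join(prefixes)
                problems.append(f"{field} is not a {allowed} term: {value}")
    return problems
```

A record with a free-text tissue such as "liver" would be reported as a problem, which is the kind of feedback a submitter needs in order to fix their metadata before submission.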

To ensure that the data being collected are useful to downstream analysis, we will establish stringent quality metrics to filter out anomalous datasets. These quality metrics will be generated as part of our standard analysis pipelines and presented to the community through both the FTP site and data portal, so that users can browse and filter data based on different QC criteria.
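The filtering step described above might be sketched as follows. The metric names (`mapped_fraction`, `duplication_rate`), the threshold values and the accession labels are assumptions chosen for illustration; the actual QC metrics are to be defined by the project.

```python
from dataclasses import dataclass


@dataclass
class QCResult:
    accession: str           # placeholder identifier for a dataset
    assay: str               # e.g. "RNA-Seq" or "ChIP-Seq"
    mapped_fraction: float   # fraction of reads mapped (hypothetical metric)
    duplication_rate: float  # fraction of duplicate reads (hypothetical metric)


# Hypothetical thresholds; real values would come from the agreed standards.
THRESHOLDS = {"min_mapped_fraction": 0.7, "max_duplication_rate": 0.5}


def passes_qc(result, thresholds=THRESHOLDS):
    """True when every metric is on the acceptable side of its threshold."""
    return (
        result.mapped_fraction >= thresholds["min_mapped_fraction"]
        and result.duplication_rate <= thresholds["max_duplication_rate"]
    )


def filter_datasets(results, thresholds=THRESHOLDS):
    """Keep only the datasets that pass QC, preserving input order."""
    return [r for r in results if passes_qc(r, thresholds)]
```

Passing the thresholds as an argument lets a portal user apply stricter or looser criteria than the defaults without changing the stored metrics.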

All the metadata stored will be fully indexed to allow for complete searching. We will also build views on the data, ensuring that users can easily browse the raw data and analysis files. This browser will also have a RESTful API, giving programmatic access to the same metadata, enabling bulk queries and letting other groups build services on top of our data.
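To give a sense of what programmatic access could look like, the sketch below builds a query URL for a metadata search. The base URL, the resource name and the parameter names are placeholders invented for this example and do not describe the resource's actual API.

```python
from urllib.parse import urlencode

# Placeholder base URL; the real API location is not specified in this text.
BASE_URL = "https://example.org/api"


def build_search_url(resource, filters, size=10):
    """Build a GET URL that asks for `size` records matching `filters`.

    Filters are sorted by name so the same query always yields the same URL.
    """
    params = [("size", size)] + sorted(filters.items())
    return f"{BASE_URL}/{resource}?{urlencode(params)}"
```

For example, `build_search_url("sample", {"species": "Gallus gallus"}, size=100)` produces a single URL that a script could request repeatedly with different filters to run bulk queries.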

Planned Impact

WHO WILL BENEFIT FROM THIS RESEARCH?
The immediate and direct beneficiaries will include scientists engaged in the international collaborative efforts to characterise the genomes of domesticated animals, in particular members of the Functional Annotation of Animal Genomes (FAANG), Genome10K and numerous target species (chicken, pig, sheep, cattle, etc.) genome consortia. The integrated, functional genome resources will also be useful to a wide range of scientists engaged in research on domesticated animals in agricultural, biomedical or animal health and welfare contexts. The resources will also benefit scientists engaged in characterising the human and other genomes by providing access to high quality functional genomics datasets for a number of vertebrates and associated software tools. The animal breeding and animal health sectors will also benefit from the project outputs through increased knowledge and access to new tools and resources. More generally, the research will benefit scientists concerned with understanding the regulation of gene expression and the genetic determinants of phenotypes.

These data will also be of value to other international consortia including the ENCODE, Epigenome Roadmap, FANTOM, International Human Epigenome and Blueprint Consortia for comparative analyses.

HOW WILL THEY BENEFIT FROM THIS RESEARCH?
The project will benefit the international FAANG and numerous target species (chicken, pig, sheep, cattle, etc.) genome consortia by developing a set of computational tools and resources to access functional genomics datasets. In particular, the project and its resources will meet the FAANG project's need for a Data Coordination Centre. These new tools and resources will facilitate the detailed annotation of these genomes, such as defining transcripts, genes and regulatory regions. These annotations in turn will allow researchers and industry to annotate genome variation within specific populations of target species to uncover causal relationships from sequence to molecular phenotype to macro-phenotype.

Computational solutions and resources will provide: (i) access to available functional genomics datasets generated on farmed and companion animal species; (ii) high quality metadata to facilitate further analysis of these data; (iii) quality metrics for these data, to enable and encourage researchers to reuse these datasets to answer new biological questions; and (iv) a single resource assembling diverse functional genomics datasets, available either by direct download (raw data, metadata and QC metrics) from dedicated FTP sites or by using a data portal to search for specific subsets of data.

This project will also provide training within and between collaborating labs to improve skills and foster new ways of working between biologists and computational scientists. Training and outreach will be core activities within the proposed project thus facilitating the delivery of impact. We will develop training materials and organise training workshops for postgraduate students, early stage and experienced researchers. Our outreach activities will encompass not only scientific conferences, but also events targeted at industry, e.g. through the Knowledge Transfer Network and directly with our industrial collaborators.

Publications

Description The Animal Epigenome database requires well-structured data attached to biological samples ('metadata') to ensure that the collected sample and experimental information is meaningful in downstream analyses, that data produced by different groups is comparable, and that data users can identify the datasets most effective for their research questions. Achieving this requires FAANG consortium members to follow the defined FAANG metadata standards (https://github.com/faang/faang-metadata) when submitting samples, experiments and analyses. To assist and ensure this, we provide templates to collect the data and a recently updated validation tool to confirm a submitter has met the required standards and assist them in fixing and improving their metadata (https://data.faang.org/validation/samples). This tool has been completely redeveloped to better serve the community and now includes submission brokering to INSDC archives to further simplify the submission process. We have also defined a legacy metadata standard to allow the Animal Epigenome database to import existing data and new data generated outside the FAANG consortium into our databases. This legacy standard requires less comprehensive metadata, but does include a set of minimum fields required to use the data in Ensembl's gene annotation methods. This has enabled us to import existing data and expose it to our community. Legacy data is clearly labelled in the FAANG portal and API, to make it easy for users to select or ignore this data as required for their analyses. We are actively supporting FAANG sample and experiment submitters providing sample and sequence data for the FAANG initiative. We provide online documentation (https://dcc-documentation.readthedocs.io/en/latest/) and an email helpdesk (faang-dcc@ebi.ac.uk).
We have delivered a number of Metadata and Data sharing workshops at major farm animal conferences, and virtually during the COVID-19 pandemic, to provide training for FAANG community members in producing high quality metadata and submitting to public archives through the new FAANG brokered submission system. Each of these has been followed by an increase in high quality data submissions. Summary statistics are provided for FAANG datasets as a whole (https://data.faang.org/summary/organisms). With the FAANG data portal representing a collection of funded data producing projects we have provided project specific data slices to collate the data specific to each project (https://data.faang.org/projects). Our newly developed publication browser makes it easy to quickly download all data associated with a particular FAANG publication (https://data.faang.org/article). We are actively engaged with different FAANG groups from around the world, attend major farmed animal conferences and communicate through FAANG working groups and steering committee.
Exploitation Route Our tools can be used by the community to submit useful, annotated data to the worldwide archives and thus ensure that it is available for them and the research community to use. The data portal provides a high quality collection of data for use in genotype to phenotype research, and in particular is of use for larger comparative studies such as Genotype-Tissue Expression Studies.
Sectors Agriculture, Food and Drink

URL https://data.faang.org/
 
Description The EuroFAANG H2020-funded FAANG projects (AQUA-FAANG, BovReg and GENE-SWitCH) that started in 2019 all include industrial partners, such as the multi-species breeding company Hendrix Genetics. The FAANG Data Portal infrastructure is thus now being used in research with direct economic benefit for agriculture in Europe, as the genotype to phenotype datasets have industrial applications in disease resistance and feed efficiency.
First Year Of Impact 2019
Sector Agriculture, Food and Drink
Impact Types Economic

 
Description Ensembl - adding value to animal genomes through high quality annotation
Amount £378,425 (GBP)
Funding ID BB/S020152/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 08/2019 
End 07/2022
 
Description Ensembl in a new era - deep genome annotation of domesticated animal species and breeds
Amount £419,170 (GBP)
Funding ID BB/W019108/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 10/2022 
End 10/2025
 
Description H2020-SFS-2018-2
Amount € 5,994,309 (EUR)
Funding ID 815668 (BovReg) 
Organisation European Commission 
Sector Public
Country European Union (EU)
Start 05/2019 
End 04/2023
 
Description H2020-SFS-2018-2
Amount € 5,999,886 (EUR)
Funding ID 817998 (GENESWITCH) 
Organisation European Commission 
Sector Public
Country European Union (EU)
Start 05/2019 
End 04/2023
 
Description H2020-SFS-2018-2
Amount € 6,000,000 (EUR)
Funding ID 817923 (AQUA-FAANG) 
Organisation European Commission 
Sector Public
Country European Union (EU)
Start 05/2019 
End 04/2023
 
Title Animal Epigenomes data portal website 
Description Enabling the community to discover the samples and data we have indexed is vital to the success of the Animal Epigenome Resource project. In January we launched the data portal website, which allows users to explore the samples submitted for cow, sheep, goat, chicken, pig and horse. As more data is added to the Animal Epigenome database, the functionality of this website will be extended to allow users to explore the experiment and analysis data collected and view the data in Ensembl via TrackHub technology. 
Type Of Material Data handling & control 
Year Produced 2017 
Provided To Others? Yes  
Impact Focal point for FAANG research, containing all FAANG generated datasets, freely available for search and download by the community. 
URL http://data.faang.org
 
Title The Animal Epigenome database 
Description The Animal Epigenome database imports samples and will import experiments and analyses from the EMBL-EBI data archives to provide query optimised representation of this data. This database currently includes all the samples submitted for the FAANG project and we are working to extend this to both add legacy sample records (the experimental data which has previously been submitted to the archive) as well as that which is now starting to be submitted by FAANG consortium members. 
Type Of Material Database/Collection of data 
Year Produced 2016 
Provided To Others? Yes  
Impact This database will enable the community to discover what samples and experimental data exist; it will also support the FAANG data portal. 
 
Description EMBL-EBI collaboration with the Functional Annotation of ANimal Genomes (FAANG) Consortium 
Organisation Functional Annotation of ANimal Genomes (FAANG)
Country Global 
Sector Charity/Non Profit 
PI Contribution We participate in conference calls on the analysis of data. Peter Harrison co-chairs the Metadata and Data Sharing Committee and is a member of the FAANG steering committee. The objective of the Metadata and Data Sharing committee is to recommend standard methods to record information for all samples, experiments and analyses carried out by FAANG consortium members; recommend best practice for data archiving; and define data sharing methodologies that encourage sharing within the FAANG consortium and rapid public release of raw data and analysis results.
Collaborator Contribution There are numerous partners that are part of this collaboration. They contribute data, tools, and other expertise.
Impact The collaboration is still in the early stages, and we are aiming to get funding for this work. FAANG aims to: standardize core assays and experimental protocols; coordinate and facilitate data sharing; establish an infrastructure for analysis of these data; and provide high quality functional annotation of animal genomes.
Start Year 2014
 
Description FAANG COST Chip-Seq Training School: What to do with your data 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact FAANG summer school for PhD students and postdocs on "ChIP-seq (wet lab) and basic functional animal genome analysis", held from June 25-29, 2018 in Wageningen. Peter Harrison gave a talk, "What to do with your data", discussing open science, FAIR data standards, FAANG data access policy and FAANG metadata standards. Peter Harrison then led a workshop on how to submit FAANG data to the public archives.
Year(s) Of Engagement Activity 2018,2019
 
Description FAANG COST Chip-Seq Training School: What to do with your data 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact FAANG summer school for PhD students and postdocs on "ChIP-seq (wet lab) and basic functional animal genome analysis", held from June 25-29, 2018 in Wageningen. Peter Harrison gave a talk, "What to do with your data", discussing open science, FAIR data standards, FAANG data access policy and FAANG metadata standards. Peter Harrison then led a workshop on how to submit FAANG data to the public archives.
Year(s) Of Engagement Activity 2018
 
Description FAANG Data Coordination Centre: Submitting and Retrieving Rich Datasets 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Peter Harrison attended PAG 2019. Attendees at the conference include both academics and those working in industry. Peter Harrison presented a talk on the project and attended FAANG strategic meetings relating to training and analysis platforms, since we are collaborators with the FAANG consortium. Peter Harrison was part of the panel discussion at the end of the FAANG workshop.
Year(s) Of Engagement Activity 2019
URL https://plan.core-apps.com/pag_2019/event/78c676db36256e3b99278c706b830ccd
 
Description FAANG Shared Workshop: Foundations for Future 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Workshop including sessions on data portal, metadata improvement and incorporation of analysis pipelines into FAANG DCC.
Year(s) Of Engagement Activity 2020
 
Description FAANG bioinformatics training workshop & FAANG DCC talk 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Peter Harrison attended PAG 2020. Attendees at the conference include both academics and those working in industry. Peter Harrison gave a bioinformatics training workshop, two talks on the FAANG DCC, and one on the vision for the future of Bioinformatics in FAANG. He was also part of a panel discussion on the future of FAANG.
Year(s) Of Engagement Activity 2020
 
Description FAANG: Establishing Metadata Standards, Validation and the FAANG Data Portal 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Peter Harrison attended PAG 2018. Attendees at the conference include both academics and those working in industry. Peter presented a talk on the project, attended FAANG strategic meetings during the conference, and was on a FAANG discussion panel.
Year(s) Of Engagement Activity 2018
 
Description FAANG: Hands on metadata validation and data submission training 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This event was a four-hour training session in the use of the tools developed for validation of metadata and data submission to BioSamples (https://www.ebi.ac.uk/biosamples/), the FAANG Data Portal (http://data.faang.org/home) and ENA (https://www.ebi.ac.uk/ena).
Year(s) Of Engagement Activity 2019
URL https://plan.core-apps.com/pag_2019/event/c3eb8177e7ac2f211aa9202c49285cf6
 
Description Poster and attendance at FAANG workshop at PAG 2017 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Peter Harrison and Laura Clarke presented a poster on the aims of the project and the work of the FAANG consortium, in order to raise awareness of the project. They also attended a workshop run by the FAANG consortium; the aim was to discuss advances and opportunities to bring in data for FAANG.
Year(s) Of Engagement Activity 2017
URL http://www.intlpag.org/2017/program/program-overview-xxv
 
Description Presentation on data submission to BioSamples, FAANG Data Portal and ENA during the FAANG workshop during the 7th International Symposium on Animal Functional Genomics (ISAFG) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The purpose of the FAANG workshop during the 7th International Symposium on Animal Functional Genomics (ISAFG) and in particular the presentation on data submission was i) to train researchers in the systems for submitting data to support functional annotation and the aims of the Functional Annotation of Animal Genomes (FAANG) initiative; ii) to encourage participation in FAANG and iii) to encourage timely submission of data to the public data repositories and the FAANG Data Portal.
Year(s) Of Engagement Activity 2018
URL http://www.isafg2018.com/program.html
 
Description Salmonid community communication 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact We have been working with the Salmonid community to raise awareness of FAANG metadata and data sharing best practice and help them establish similar standards themselves. This should ensure any data generated by this community is interoperable with FAANG-created data and can be accessed through the same routes.
Year(s) Of Engagement Activity 2016
 
Description Sharing Perspectives Workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Peter Harrison was an invited attendee at the GenRes Bridge Conference, Tuusula, Finland. This meeting aimed to build bridges between different genetic resources communities by bringing together a range of experts, stakeholders and end-users in the field of management of crop, forest and animal genetic resources. He also contributed to the satellite workshop on 'Enhancing and Linking Information Systems'.
The most important impact of this activity was the contribution to the generation of an integrated strategy for genetic resources.
Year(s) Of Engagement Activity 2019
 
Description Talk, Project Management/ Coordination 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Study participants or study members
Results and Impact Peter Harrison attended the FAANG COST Management and Final conference held in Prague, Czech Republic. He gave a talk on Project Management and Coordination of the FAANG Data Coordination Centre.
Year(s) Of Engagement Activity 2020
URL http://www.genresbridge.eu/about-us/events/event/sharing-perspectives-workshop/
 
Description The FAANG Data Coordination Centre: How to find and provide FAANG Data 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Peter Harrison attended International Plant & Animal Genome XXVI and gave a presentation titled 'The FAANG Data Coordination Centre: How to find and provide FAANG Data'. The conference attendees range from academics to members of industry and policymakers.
Year(s) Of Engagement Activity 2018
URL https://www.ebi.ac.uk/about/events/2018/pag-xxvi-plant-animal-genome-conference-2018
 
Description The Functional Annotation of Animal Genomes (FAANG) Project: Metadata, data sharing and the FAANG data portal 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Peter Harrison and Laura Clarke attended ISAG 2017. Attendees at the conference include both academics and those working in industry. Laura presented a talk on the project, and both attended FAANG strategic meetings during the conference, since we are collaborators with the FAANG consortium.
Year(s) Of Engagement Activity 2017
 
Description The Functional Annotation of Animal Genomes Project (FAANG): Data validation, submission and retrieval 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact EMBL-EBI hosted the Livestock Genomics Conference at Hinxton, UK. Peter Harrison and Paul Flicek hosted the event. Peter Harrison and Paul Flicek (keynote slot) both gave talks at the event. FAANG featured strongly in the event that included two days of talks and a poster session.
Year(s) Of Engagement Activity 2018
 
Description Virtual training session 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Training session on GENE-SWitCH terrestrial livestock metadata and FAANG data submission.
Year(s) Of Engagement Activity 2020
 
Description Virtual training session 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Training session on AQUA-FAANG aquaculture focussed metadata and FAANG data submission.
Year(s) Of Engagement Activity 2020