Co-ordination, integration and distribution of sequence and structural family data

Lead Research Organisation: University College London
Department Name: Unlisted

Abstract

Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.

Publications

Andreeva A (2008) Data growth and its impact on the SCOP database: new developments. in Nucleic acids research

Bateman A (2002) The Pfam protein families database. in Nucleic acids research

Bateman A (2004) The Pfam protein families database. in Nucleic acids research

Finn R (2003) Identifying protein domains with the Pfam database. in Current protocols in bioinformatics

Finn RD (2008) The Pfam protein families database. in Nucleic acids research

Finn RD (2006) Pfam: clans, web tools and services. in Nucleic acids research

Finn RD (2007) ProServer: a simple, extensible Perl DAS server. in Bioinformatics (Oxford, England)

 
Title Pfam data 
Description A variety of newly defined and curated data have been added to the member databases. Data are available from the following websites:
InterPro: http://www.ebi.ac.uk/interpro/
MSD: http://www.ebi.ac.uk/msd/
SCOP: http://scop.mrc-lmb.cam.ac.uk/scop/
CATH: http://www.cathdb.info/
Pfam: http://www.sanger.ac.uk/Software/Pfam/
DAS registry: http://www.dasregistry.org/
Sisyphus: http://sisyphus.mrc-cpe.cam.ac.uk/sisyphus/
SIFTS: http://www.ebi.ac.uk/msd-srv/docs/sifts/
The data are distributed under a variety of licensing arrangements depending on each partner database. However, these are all freely available. The partners have produced an XML schema for exchanging protein family data that is available from: http://www.efamily.org.uk/xml/efamily/documentation/efamily.shtml
Type Of Material Database/Collection of Data/Biological Samples 
Year Produced 2006 
Provided To Others? Yes  
Impact The DAS exchange protocol has been extended to include information on protein alignments and protein interactions (see Finn et al. 2007, Bioinformatics).
URL http://www.ebi.ac.uk/interpro/
 
Description Co-applicants on G0100305 
Organisation EMBL European Bioinformatics Institute (EMBL - EBI)
Country United Kingdom 
Sector Academic/University 
PI Contribution The work was designed to add value to the core functions of the nucleic acid sequence, protein sequence, protein motif and structure databases. By building on these resources and integrating data from other sources, this proposal will also provide an essential portal for the exploitation of genomic and proteomic data by researchers in the UK and world-wide, in both the academic and industrial domains.
Collaborator Contribution Co-applicant on this work
Impact The achievements of the eFamily project have arisen from basic research. The major achievements have been to build important new data sources, write new software, and make existing data available in new ways to the scientific community. Notable achievements are listed below:
1) Setting up of the DAS registry (http://www.dasregistry.org/): 267 servers are registered from 41 institutions in 16 countries.
2) We have made data available from each database via DAS.
3) The iPfam database of protein-protein interactions of known structure was set up.
4) The Sisyphus database of structural alignments of non-trivial relationships provides a high-quality resource of alignments of difficult-to-align families.
5) The eFamily project has contributed to the DAS specifications (http://www.biodas.org/wiki/Main_Page) for protein interactions and multiple sequence alignments.
6) The eFamily project has provided software such as the SPICE DAS client and modifications to ProServer.
7) The E-MSD developed a residue-by-residue mapping of protein structure to protein sequence.
 
Description Co-applicants on G0100305 
Organisation Medical Research Council (MRC)
Department MRC Centre for Protein Engineering
Country United Kingdom 
Sector Academic/University 
PI Contribution The work was designed to add value to the core functions of the nucleic acid sequence, protein sequence, protein motif and structure databases. By building on these resources and integrating data from other sources, this proposal will also provide an essential portal for the exploitation of genomic and proteomic data by researchers in the UK and world-wide, in both the academic and industrial domains.
Collaborator Contribution Co-applicant on this work
Impact The achievements of the eFamily project have arisen from basic research. The major achievements have been to build important new data sources, write new software, and make existing data available in new ways to the scientific community. Notable achievements are listed below:
1) Setting up of the DAS registry (http://www.dasregistry.org/): 267 servers are registered from 41 institutions in 16 countries.
2) We have made data available from each database via DAS.
3) The iPfam database of protein-protein interactions of known structure was set up.
4) The Sisyphus database of structural alignments of non-trivial relationships provides a high-quality resource of alignments of difficult-to-align families.
5) The eFamily project has contributed to the DAS specifications (http://www.biodas.org/wiki/Main_Page) for protein interactions and multiple sequence alignments.
6) The eFamily project has provided software such as the SPICE DAS client and modifications to ProServer.
7) The E-MSD developed a residue-by-residue mapping of protein structure to protein sequence.
 
Description Co-applicants on G0100305 
Organisation The Wellcome Trust Sanger Institute
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution The work was designed to add value to the core functions of the nucleic acid sequence, protein sequence, protein motif and structure databases. By building on these resources and integrating data from other sources, this proposal will also provide an essential portal for the exploitation of genomic and proteomic data by researchers in the UK and world-wide, in both the academic and industrial domains.
Collaborator Contribution Co-applicant on this work
Impact The achievements of the eFamily project have arisen from basic research. The major achievements have been to build important new data sources, write new software, and make existing data available in new ways to the scientific community. Notable achievements are listed below:
1) Setting up of the DAS registry (http://www.dasregistry.org/): 267 servers are registered from 41 institutions in 16 countries.
2) We have made data available from each database via DAS.
3) The iPfam database of protein-protein interactions of known structure was set up.
4) The Sisyphus database of structural alignments of non-trivial relationships provides a high-quality resource of alignments of difficult-to-align families.
5) The eFamily project has contributed to the DAS specifications (http://www.biodas.org/wiki/Main_Page) for protein interactions and multiple sequence alignments.
6) The eFamily project has provided software such as the SPICE DAS client and modifications to ProServer.
7) The E-MSD developed a residue-by-residue mapping of protein structure to protein sequence.
 
Description Integration of Pfam domain family data with CATH structural family data in the Gene3D resource established at UCL
Organisation National Institutes of Health (NIH)
Department National Institute of General Medical Sciences (NIGMS)
Country United States 
Sector Charity/Non Profit 
PI Contribution The integration of Pfam domain family data with CATH structural family data in the Gene3D resource established at UCL enabled a collaboration between the Orengo Group and the NIH-funded Structural Genomics Initiatives in the United States (the Protein Structure Initiative, PSI).
Collaborator Contribution CATH structural family in the Gene3D resource established at UCL. Contribution to the US Protein Structure Initiative (PSI).
Impact By combining these resources, domain family coverage of sequences in 240 completed genomes was considerably extended, providing a more comprehensive integrated resource. This enabled comparative genome analysis to identify domain families that are highly populated in the genomes but for which there are currently no known structural representatives. Using these approaches, several hundred of the largest structurally uncharacterised Pfam families were selected automatically for structure determination. In addition, some putative new Pfam domain families were identified by Orengo at UCL and other bioinformatics groups working on target selection for PSI. These families were subsequently validated by Bateman at Sanger before being accepted for structure determination. The contribution to the PSI of Pfam and new family domains identified by the Gene3D analysis has significantly helped in targeting new areas of structure space, and the determination of these structures will substantially increase our knowledge of domain families and the evolution of structures and functions within them.
 
Description Integration of Pfam domain family data with CATH structural family data in the Gene3D resource established at UCL
Organisation University College London
Department Department of Statistical Science
Country United Kingdom 
Sector Academic/University 
PI Contribution The integration of Pfam domain family data with CATH structural family data in the Gene3D resource established at UCL enabled a collaboration between the Orengo Group and the NIH-funded Structural Genomics Initiatives in the United States (the Protein Structure Initiative, PSI).
Collaborator Contribution CATH structural family in the Gene3D resource established at UCL. Contribution to the US Protein Structure Initiative (PSI).
Impact By combining these resources, domain family coverage of sequences in 240 completed genomes was considerably extended, providing a more comprehensive integrated resource. This enabled comparative genome analysis to identify domain families that are highly populated in the genomes but for which there are currently no known structural representatives. Using these approaches, several hundred of the largest structurally uncharacterised Pfam families were selected automatically for structure determination. In addition, some putative new Pfam domain families were identified by Orengo at UCL and other bioinformatics groups working on target selection for PSI. These families were subsequently validated by Bateman at Sanger before being accepted for structure determination. The contribution to the PSI of Pfam and new family domains identified by the Gene3D analysis has significantly helped in targeting new areas of structure space, and the determination of these structures will substantially increase our knowledge of domain families and the evolution of structures and functions within them.
 
Title Software for Protein Sequence Analysis 
Description See products section 
IP Reference  
Protection Copyrighted (e.g. software)
Year Protection Granted
Licensed Yes
Impact See products section
 
Title Software for Protein Sequence Analysis 
Description New software, including ProServer and DAS-lite, is available via CPAN: http://www.cpan.org/
Type Support Tool - For Fundamental Research
Current Stage Of Development Wide-scale adoption
Year Development Stage Completed 2007
Development Status Under active development/distribution
Impact The results of this grant have an immediate impact because of the high-profile nature of the database partners involved. Data, software and services have been released throughout the lifetime of the grant and are already in widespread use in the research community.
URL http://www.cpan.org/
 
Description 2006 Keck Center Annual Research Conference 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Primary Audience Public/other audiences
Results and Impact eFamily was presented as part of an invited talk by Dr. Henrick at the 2006 Keck Center Annual Research Conference, "Extraction and Integration of Data in Biosystems".
http://cohesion.rice.edu/centersandinst/gcc/keck_about.cfm?doc_id=10277
Sponsored by the UK Science and Technology, British Consulate-General, Houston and the Keck Center training programs.


Year(s) Of Engagement Activity 2006