New computational methods for protein function prediction using structural, binding and sequence data

Lead Research Organisation: EMBL - European Bioinformatics Institute
Department Name: Thornton Group

Abstract

One aspect of my work is to develop statistical methods to analyse data from the UK Collaborative Trial of Ovarian Cancer Screening (UKCTOCS). Data from this trial consists of up to six annual sets of questionnaires from each participant. The questionnaires assess anxiety, sexual function and acceptability of the screening process. I will model how these three processes change over time, and how they depend on one another. There are several approaches to analysing this type of data, and I will explore the pros and cons of three different methods. The challenge in this work lies in the complexity of the UKCTOCS data-set. I will also use some of these methods to analyse data from the MRC Cognitive Function and Ageing Study.

A second aspect of my work is to develop methods for use in meta-analysis. Meta-analysis refers to the pooling of data from different studies, and is used to provide an overview of the available evidence. I will investigate methods for the meta-analysis of time-to-event data. I will also explore the meta-analysis of treatment 'networks', investigating the relative effects of a set of treatments for a given condition.
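The pooling idea can be made concrete with a minimal sketch of the standard inverse-variance fixed-effect approach applied to per-study log hazard ratios. It is paired with a Bucher-style adjusted indirect comparison, the simplest building block of a treatment network. The abstract does not say which models will actually be used. The function names and numbers below are illustrative assumptions, and a real analysis would also need to address between-study heterogeneity (random effects) and consistency across the network.

```python
import math

def pool_fixed_effect(log_hrs, std_errs):
    """Inverse-variance fixed-effect pooling of per-study log hazard ratios.

    Each study is weighted by 1 / SE^2; returns (pooled log HR, its SE).
    """
    weights = [1.0 / se ** 2 for se in std_errs]
    pooled = sum(w * y for w, y in zip(weights, log_hrs)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

def indirect_comparison(d_ab, se_ab, d_ac, se_ac):
    """Bucher-style indirect estimate of C versus B through a common comparator A.

    d_ab: log HR of B versus A; d_ac: log HR of C versus A.
    Since HR(C/B) = HR(C/A) / HR(B/A), the log HRs subtract and the variances add.
    """
    d_bc = d_ac - d_ab
    se_bc = math.sqrt(se_ab ** 2 + se_ac ** 2)
    return d_bc, se_bc


# Two hypothetical trials of the same comparison
pooled, pooled_se = pool_fixed_effect([-0.2, -0.4], [0.1, 0.2])
print(round(math.exp(pooled), 3))  # pooled hazard ratio on the natural scale
```

Working on the log scale keeps the estimates approximately normal. `math.exp` converts the pooled value back to a hazard ratio for reporting.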

Technical Summary

Structural Genomics projects determine a large number of protein structures that have little or no functional information associated with them. Therefore, there is an increasing need for tools to analyse and characterise the possible functional attributes of such structures. The main objective of the proposed research is to investigate, implement and validate new methods for protein function prediction exploiting the fast-growing volume of publicly available structural, binding and sequence data.

A full suite of protein function prediction tools (ProFunc), based on principles such as similarity of sequence, fold or binding sites, has recently been developed at the European Bioinformatics Institute (EBI). Experience shows that no single method, either sequence-based or structure-based, provides a high proportion of correct predictions in all cases. Furthermore, methods exploiting binding data are not currently incorporated into this holistic effort. This is a notable gap given the imminent availability of StARLite, a medicinal chemistry database containing vast amounts of bioactivity data, which constitutes an extremely valuable resource for the development of such methods.
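The summary does not describe how ProFunc combines its individual methods. The sketch below illustrates why a consensus can help when no single method is reliable on its own: it takes a hypothetical weighted vote over the function terms proposed by each method. The method names, scores and weights are invented for illustration and do not represent ProFunc's actual protocol.

```python
from collections import defaultdict

def consensus(predictions, weights):
    """Rank function terms by a weighted average of per-method scores.

    predictions: {method: {function_term: score in [0, 1]}}
    weights:     {method: relative trust placed in that method}
    A method that does not propose a term contributes 0 for it.
    """
    norm = sum(weights[m] for m in predictions)
    totals = defaultdict(float)
    for method, terms in predictions.items():
        for term, score in terms.items():
            totals[term] += weights[method] * score
    # Highest combined score first; ties broken alphabetically for determinism
    return sorted(((t, s / norm) for t, s in totals.items()),
                  key=lambda item: (-item[1], item[0]))


# Hypothetical outputs of three independent predictors for one new structure
predictions = {
    "sequence": {"kinase": 0.9},
    "fold": {"kinase": 0.6, "phosphatase": 0.5},
    "binding_site": {"phosphatase": 0.8},
}
weights = {"sequence": 1.0, "fold": 1.0, "binding_site": 1.0}
ranking = consensus(predictions, weights)
# → kinase (0.5) ranked above phosphatase (about 0.433)
```

In practice, the weights could be calibrated on structures of known function, so that methods with a better track record count for more.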

In order to fill this gap, a new method for protein function prediction exploiting StARLite binding data will be investigated and ultimately integrated with ProFunc using an automatic consensus protocol. This class of methods is based on the principle that proteins binding similar sets of molecules are likely to have similar biochemical function, which means that their effectiveness is limited by our ability to predict protein-ligand binding via docking techniques. The candidate has recently developed a very promising machine-learning-based improvement to docking. The massive volume of structural and binding data that will be used for machine-learning training, together with the availability of CREDO's detailed characterisation of intermolecular interactions, means that the methodology will become even more accurate in the course of the fellowship.
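The guiding principle here is that proteins which bind similar sets of molecules tend to share biochemical function. A nearest-neighbour sketch can illustrate it. Each protein is represented by the set of compound identifiers it is reported to bind, for example as might be extracted from a bioactivity database such as StARLite. The identifiers, labels, similarity threshold and the choice of the Jaccard (set Tanimoto) coefficient are all assumptions for illustration. This is not the method the proposal will ultimately develop.

```python
def jaccard(a, b):
    """Jaccard (Tanimoto) similarity between two sets of ligand identifiers."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def predict_function(query_ligands, annotated, min_similarity=0.3):
    """Transfer the function of the annotated protein with the most similar ligand set.

    annotated: {protein_id: (set_of_ligand_ids, function_label)}
    Returns (protein_id, function_label, similarity), or None when no protein
    reaches min_similarity (a hypothetical confidence threshold).
    """
    best_id, best_sim = None, -1.0
    for protein_id, (ligands, _) in annotated.items():
        sim = jaccard(query_ligands, ligands)
        if sim > best_sim:
            best_id, best_sim = protein_id, sim
    if best_id is None or best_sim < min_similarity:
        return None
    return best_id, annotated[best_id][1], best_sim


# Hypothetical annotated proteins and their known binders
annotated = {
    "P1": ({"L1", "L2", "L3"}, "serine protease"),
    "P2": ({"L4", "L5"}, "protein kinase"),
}
result = predict_function({"L1", "L2", "L6"}, annotated)
# → ('P1', 'serine protease', 0.5)
```

Such an approach only works when the query protein's binders are known or predicted. This is why the summary links its effectiveness to the accuracy of docking.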

The successful outcome of the proposed research will represent a major advance in our ability to predict the biochemical function of new structures from Structural Genomics projects. This investigation is also expected to increase our understanding of which combination of structural, binding and sequence features confers biochemical function. Furthermore, the proposed chemogenomics studies will improve drug target validation and the polypharmacological profiling of drug leads. In addition, improved docking will permit more effective identification of biologically active molecules, reducing reliance on expensive and slow empirical High-Throughput Screening. Overall, the widespread application of these methodologies would strongly contribute to the understanding and exploitation of biological systems (e.g. pharmaceutical discovery).

Publications

Ain QU (2015) Machine-learning scoring functions to improve structure-based binding affinity prediction and virtual screening. in Wiley Interdisciplinary Reviews: Computational Molecular Science

Ballester PJ (2011) Ultrafast shape recognition: method and applications. in Future Medicinal Chemistry

 
Description Royal Society - Nature consultation
Geographic Reach Multiple continents/international 
Policy Influence Type Participation in a national consultation
Impact Tomorrow's Giants was a major one-day conference on the future of science, co-hosted by the Royal Society and Nature. It brought together scientists and policy makers to develop a vision of the next 50 years in science and to discuss what would be needed to enable academic achievement of the highest quality in the future. Participation was by application, and over 200 scientists were selected to attend and discuss their future alongside a series of eminent speakers and leading decision-makers. More information at http://royalsociety.org/Tomorrows-giants
URL http://royalsociety.org/Tomorrows-giants
 
Description JRF Research Allowance 2011-12
Amount £600 (GBP)
Organisation University of Oxford 
Department Wolfson College
Sector Charity/Non Profit
Country United Kingdom
Start 10/2011 
End 09/2012
 
Description JRF Research Allowance 2012-13
Amount £600 (GBP)
Organisation University of Oxford 
Department Wolfson College
Sector Charity/Non Profit
Country United Kingdom
Start 10/2012 
End 09/2013
 
Description JRF Research Allowance 2013-14
Amount £700 (GBP)
Organisation University of Oxford 
Department Wolfson College
Sector Charity/Non Profit
Country United Kingdom
Start 10/2013 
End 09/2014
 
Description Travel award for EMBO practical course
Amount £700 (GBP)
Organisation European Molecular Biology Organisation 
Sector Charity/Non Profit
Country Germany
Start 09/2010 
End 09/2010
 
Description Travel expenses for invited talk at GSK (Tres Cantos, Madrid, Spain)
Amount £500 (GBP)
Organisation GlaxoSmithKline (GSK) 
Sector Private
Country Global
Start 12/2010 
End 12/2010
 
Title idock web server 
Description This is a multithreaded virtual screening tool for flexible ligand docking. 
Type Of Material Improvements to research infrastructure 
Year Produced 2012 
Provided To Others? Yes  
Impact Researchers can use this technology without the need for expensive computers, software installation and maintenance, or specialist technical knowledge. 
URL http://istar.cse.cuhk.edu.hk/idock/
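idock itself is a compiled program. The sketch below only illustrates the screening pattern described: it scores every ligand in a library concurrently and keeps the best-ranked hits. `score_fn` is a hypothetical stand-in for a docking run. Docking tools of this kind typically report a predicted binding free energy, where lower is better. The toy scorer in the example is a placeholder with no chemical meaning. In CPython, threads only speed this up when the scoring work releases the GIL, for example by calling an external docking program. Pure-Python scoring would need processes instead.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def screen_library(ligands, score_fn, top_k=10, n_threads=4):
    """Score ligands concurrently and return the top_k lowest-scoring (best) hits.

    score_fn: callable mapping a ligand to a docking score (lower is better).
    Returns a list of (score, ligand) pairs sorted from best to worst.
    """
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # pool.map preserves input order, so scores line up with ligands
        scores = list(pool.map(score_fn, ligands))
    return heapq.nsmallest(top_k, zip(scores, ligands))


def toy_score(smiles):
    # Placeholder only: a real workflow would dock the ligand into the target
    return -float(len(smiles))

hits = screen_library(["CCO", "c1ccccc1O", "CC(=O)O"], toy_score, top_k=2)
# → [(-9.0, 'c1ccccc1O'), (-7.0, 'CC(=O)O')]
```

Keeping only the top_k hits with a heap avoids sorting the whole library. This matters when screening millions of compounds.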
 
Title RF-Score software 
Description RF-Score is the first scoring function for molecular docking based on nonparametric machine learning. In a recent study (http://www.ncbi.nlm.nih.gov/pubmed/21591735), RF-Score has been shown to outperform 16 widely used scoring functions. 
Type Of Material Improvements to research infrastructure 
Year Produced 2010 
Provided To Others? Yes  
Impact The release of the RF-Score software is improving the dissemination of these results. 
URL http://www.ebi.ac.uk/%7Epedrob/docs/RF-Score.zip
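As we understand the original RF-Score description, its descriptors count protein-ligand atom pairs, grouped by element, that lie within a 12 Å cutoff. There are 4 protein element types and 9 ligand element types, giving 36 features. These features are then fed to a random forest regressor trained on complexes with measured binding affinities. The sketch below computes features of this kind from plain (element, x, y, z) tuples. The element lists and cutoff follow our reading of the published method and should be checked against the released RF-Score code before reuse.

```python
from itertools import product

# Element types as we understand the original RF-Score descriptor (assumption)
PROTEIN_TYPES = ("C", "N", "O", "S")
LIGAND_TYPES = ("C", "N", "O", "F", "P", "S", "Cl", "Br", "I")
FEATURE_INDEX = {pair: i for i, pair in enumerate(product(PROTEIN_TYPES, LIGAND_TYPES))}

def rf_score_features(protein_atoms, ligand_atoms, cutoff=12.0):
    """Count protein-ligand element pairs closer than `cutoff` (in angstroms).

    Atoms are (element, x, y, z) tuples. Pairs involving any other element
    (e.g. hydrogen) are ignored. Returns 36 counts ordered by
    (protein element, ligand element).
    """
    counts = [0] * len(FEATURE_INDEX)
    cutoff_sq = cutoff * cutoff
    for p_el, px, py, pz in protein_atoms:
        for l_el, lx, ly, lz in ligand_atoms:
            idx = FEATURE_INDEX.get((p_el, l_el))
            if idx is None:
                continue
            dist_sq = (px - lx) ** 2 + (py - ly) ** 2 + (pz - lz) ** 2
            if dist_sq < cutoff_sq:
                counts[idx] += 1
    return counts


# Toy complex: only the C-C and N-C pairs fall inside the cutoff
features = rf_score_features(
    [("C", 0, 0, 0), ("N", 0, 0, 5), ("S", 30, 0, 0)],
    [("C", 0, 0, 1), ("O", 0, 0, 20), ("H", 0, 0, 2)],
)
```

Training could then use, for example, scikit-learn's `RandomForestRegressor`, calling `fit(X, y)` with one feature row per complex and the measured affinities as `y`. That step is omitted to keep the sketch dependency-free.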
 
Description Dr Jochen Blumberger (University College London, UK) 
Organisation University College London
Department Department of Physics & Astronomy
Country United Kingdom 
Sector Academic/University 
PI Contribution I led this multi-disciplinary collaboration (I am corresponding author), carried out most of the computational work and wrote the manuscript.
Collaborator Contribution Dr Blumberger modelled the interactions between putative small-molecule binders and the X-ray structural model of the drug target.
Impact A joint publication: http://rsif.royalsocietypublishing.org/content/early/2012/08/23/rsif.2012.0569.abstract This was a multi-disciplinary collaboration. The disciplines involved were: computational chemistry, structural bioinformatics and drug discovery.
Start Year 2009
 
Description Dr John Mitchell (University of St Andrews, UK) 
Organisation University of St Andrews
Department School of Chemistry St Andrews
Country United Kingdom 
Sector Academic/University 
PI Contribution A subproject within my MRC fellowship is to improve a new class of generic scoring functions for in silico docking that my collaborator and I recently introduced. This collaboration led to a joint publication introducing some important considerations for both the correct understanding of these scoring functions and their experimental validation.
Collaborator Contribution Dr J.B.O.M. helped me to write a paper on this topic.
Impact Since the start of my MRC fellowship, this collaboration has resulted in a joint publication: http://www.ncbi.nlm.nih.gov/pubmed/21591735 A first paper was also published between the offer and the official start of my MRC fellowship: http://www.ncbi.nlm.nih.gov/pubmed/20236947 This collaboration involves three disciplines: machine learning, chemical informatics and structural bioinformatics.
Start Year 2009
 
Description Dr Jose Luis Rossello (Universidad de las Islas Baleares, Spain) 
Organisation University of the Balearic Islands
Country Spain 
Sector Academic/University 
PI Contribution My role is limited to advising on how this new technology could be applied to drug lead identification.
Collaborator Contribution I am co-investigator in a recently awarded Spanish National Plan grant (ref.TEC2011-23113, amount: €55,400). The grant will fund a PhD student to investigate the implementation of faster vector similarity calculations using hardware-implemented stochastic-based processing (conventionally this is done via software). In the context of my MRC research, this new technology could eventually be applied to run molecular similarity techniques much faster.
Impact I am co-investigator in a recently awarded Spanish National Plan grant (ref.TEC2011-23113, amount: €55,400). It is a multi-disciplinary collaboration involving electronic engineering and computer science as well as chemical informatics and computational drug design.
Start Year 2011
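The entry does not describe the hardware design. As a software illustration of the underlying idea only: in (unipolar) stochastic computing, a value in [0, 1] is encoded as a random bitstream whose fraction of 1s equals that value. A single AND gate then multiplies two independent streams. The sketch below simulates this to estimate a Tanimoto similarity between two descriptor vectors scaled to [0, 1]. The stream length, the seed and the continuous Tanimoto formula are our assumptions, not details of the funded project.

```python
import random

def to_bitstream(p, n_bits, rng):
    """Encode p in [0, 1] as n_bits independent Bernoulli(p) bits."""
    return [1 if rng.random() < p else 0 for _ in range(n_bits)]

def stochastic_dot(a, b, n_bits, rng):
    """Estimate sum(a_i * b_i) by AND-ing independent bitstreams (one AND gate per term)."""
    total = 0.0
    for x, y in zip(a, b):
        sx = to_bitstream(x, n_bits, rng)
        sy = to_bitstream(y, n_bits, rng)
        # P(bit of sx & sy == 1) = x * y because the two streams are independent
        total += sum(u & v for u, v in zip(sx, sy)) / n_bits
    return total

def stochastic_tanimoto(a, b, n_bits=20000, seed=0):
    """Approximate a.b / (a.a + b.b - a.b) for vectors with entries in [0, 1]."""
    rng = random.Random(seed)
    ab = stochastic_dot(a, b, n_bits, rng)
    aa = stochastic_dot(a, a, n_bits, rng)
    bb = stochastic_dot(b, b, n_bits, rng)
    return ab / (aa + bb - ab)

def exact_tanimoto(a, b):
    """Reference value computed with ordinary floating-point arithmetic."""
    ab = sum(x * y for x, y in zip(a, b))
    return ab / (sum(x * x for x in a) + sum(y * y for y in b) - ab)


a, b = [0.5, 0.8, 0.2], [0.4, 0.9, 0.1]
estimate, exact = stochastic_tanimoto(a, b), exact_tanimoto(a, b)
```

The estimate's error shrinks roughly with the square root of the stream length. In hardware, this trades precision for extremely cheap, highly parallel arithmetic.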
 
Description Dr Julio Saez-Rodriguez (EMBL-EBI, UK) 
Organisation EMBL European Bioinformatics Institute (EMBL - EBI)
Country United Kingdom 
Sector Academic/University 
PI Contribution I led the methodological part of this collaboration intended to model cancer pharmacogenomics in order to understand how drug efficacy depends on the genome of an individual.
Collaborator Contribution My collaborators contribute the pre-processing of phenotypic, chemical and genomic data as well as the validation and interpretation of the resulting models.
Impact This is a multi-disciplinary collaboration involving machine learning modelling, chemical informatics, genomics and drug design. There is already a joint publication: http://dx.plos.org/10.1371/journal.pone.0061318 A second study on drug repositioning is under way. Prospective validation is currently taking place.
Start Year 2012
 
Description Dr Maja Koehn (EMBL-Heidelberg, Germany) 
Organisation European Molecular Biology Laboratory
Department European Molecular Biology Laboratory Heidelberg
Country Germany 
Sector Academic/University 
PI Contribution I have applied the developed ligand-based and structure-based virtual screening tools to discover new drug leads for PRL-3, a research target for colon cancer metastasis.
Collaborator Contribution This lab has expertise in phosphatases and cancer biology; they carry out experimental validations and medicinal chemistry optimisation of the leads.
Impact This is a multi-disciplinary collaboration involving chemical biology, medicinal chemistry and computational drug design. A number of PRL-3 inhibitors have been identified by the developed methods. These are currently being used as proof-of-concept to raise additional funding and the plan is to patent them soon.
Start Year 2010
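The ligand-based screening mentioned here builds on shape-similarity methods such as Ultrafast Shape Recognition (USR, listed under Publications). The sketch below follows the general USR scheme as we understand it. Distances are computed from every atom to four reference points: the molecular centroid, the atom closest to it, the atom farthest from it, and the atom farthest from that one. Each distance distribution is summarised by three moments. The similarity score is 1 / (1 + mean absolute descriptor difference). The normalisation of the second and third moments is an assumption here: we use the standard deviation and the signed cube root of the third central moment, so that all twelve numbers are in distance units. Implementations differ, so treat this as illustrative rather than a reference implementation.

```python
import math

def _three_moments(distances):
    """Mean, standard deviation and signed cube root of the third central moment."""
    n = len(distances)
    mean = sum(distances) / n
    var = sum((d - mean) ** 2 for d in distances) / n
    third = sum((d - mean) ** 3 for d in distances) / n
    return [mean, math.sqrt(var), math.copysign(abs(third) ** (1.0 / 3.0), third)]

def usr_descriptor(coords):
    """12 USR-style shape descriptors from a list of (x, y, z) atom coordinates."""
    n = len(coords)
    ctd = tuple(sum(c[i] for c in coords) / n for i in range(3))  # centroid
    d_ctd = [math.dist(c, ctd) for c in coords]
    cst = coords[d_ctd.index(min(d_ctd))]  # atom closest to the centroid
    fct = coords[d_ctd.index(max(d_ctd))]  # atom farthest from the centroid
    d_fct = [math.dist(c, fct) for c in coords]
    ftf = coords[d_fct.index(max(d_fct))]  # atom farthest from fct
    descriptor = []
    for ref in (ctd, cst, fct, ftf):
        descriptor.extend(_three_moments([math.dist(c, ref) for c in coords]))
    return descriptor

def usr_similarity(d1, d2):
    """1 / (1 + mean absolute difference over the 12 descriptors); 1.0 means identical."""
    return 1.0 / (1.0 + sum(abs(a - b) for a, b in zip(d1, d2)) / 12.0)


coords = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 3), (1, 1, 1)]
shifted = [(x + 5, y - 2, z + 3) for x, y, z in coords]
same_shape = usr_similarity(usr_descriptor(coords), usr_descriptor(shifted))  # ~1.0
```

Because the descriptor uses only interatomic distances, it is invariant to translation and rotation. No alignment step is needed, which is what makes USR-style screening of large libraries fast.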
 
Description Prof Chris Abell (University of Cambridge, UK) 
Organisation University of Cambridge
Department Department of Chemistry
Country United Kingdom 
Sector Academic/University 
PI Contribution I led this multi-disciplinary collaboration, carried out most of the computational work and wrote the manuscript.
Collaborator Contribution Prof Abell's group carried out in vitro assays to validate the predictions of our computational models. This yielded a high proportion of new inhibitors for the anti-TB drug target under study.
Impact A joint publication: http://rsif.royalsocietypublishing.org/content/early/2012/08/23/rsif.2012.0569.abstract This was a multi-disciplinary collaboration. The disciplines involved were: computational chemistry, structural bioinformatics and drug discovery.
Start Year 2009
 
Description Prof Dame Janet Thornton (EMBL-EBI, UK) 
Organisation EMBL European Bioinformatics Institute (EMBL - EBI)
Country United Kingdom 
Sector Academic/University 
PI Contribution I am applying the methods I have developed to predict the strength with which molecules bind to proteins involved in aging using the co-crystallised structures as input.
Collaborator Contribution My collaborators have compiled all these structures, including homologues from model organisms, and use the predicted binding strength as an additional criterion to assess how likely a drug is to affect the lifespan of C. elegans and Drosophila.
Impact This is a recent multi-disciplinary collaboration involving structural bioinformatics, aging biology and machine learning.
Start Year 2013
 
Description Prof Kwong-Sak Leung (CUHK, China) 
Organisation Chinese University of Hong Kong
Department Department of Computer Science and Engineering
Country Hong Kong 
Sector Academic/University 
PI Contribution Providing the RF-Score docking scoring function and designing the validation of the docking tool that uses it (idock).
Collaborator Contribution They have developed idock and the web server that runs it for prospective virtual screening of any target with a structural model.
Impact This is a multi-disciplinary collaboration involving computer science, computational chemistry and structural bioinformatics.
Start Year 2012
 
Description Prof Sir Tom Blundell (University of Cambridge, UK) 
Organisation University of Cambridge
Department Department of Biochemistry
Country United Kingdom 
Sector Academic/University 
PI Contribution Built and tested machine-learning predictors of binding, mainly for structure-based drug design.
Collaborator Contribution Providing detailed chemical descriptions of the protein-ligand structural environments needed for the modelling above.
Impact This is a multi-disciplinary collaboration involving machine learning, structural and drug discovery informatics. Main outputs are a joint paper in review and freely available software (see URL above).
Start Year 2011
 
Description Full-page interview in a Spanish newspaper (the leading newspaper in the Balearic Islands, a region with a population of 1.2 million) 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact I explained my research to a lay audience.
Link to the online edition of the newspaper: http://www.diariodemallorca.es/2012/09/05

People were really interested in my research and felt that they now had an idea of what I am doing and what impact this kind of research has on their lives.
Year(s) Of Engagement Activity 2012
 
Description Press release about a study recently published in the Journal of the Royal Society Interface 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Media (as a channel to the public)
Results and Impact I prepared a press release with a journalist at my institute.

http://www.ebi.ac.uk/Information/News/press-releases/research-highlight-30082012-Ballester_Interfaces.html

So far it has been viewed by many academics in other fields via Twitter and my Cambridge college Facebook page. I contacted the MRC press office, but found it difficult to turn their interest into a concrete promotional activity.
Year(s) Of Engagement Activity 2012