From text to pathways: text mining techniques for reconstructing signalling pathways

Lead Research Organisation: University of Manchester
Department Name: Computer Science

Abstract

Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.

 
Description Linking pathway models to textual evidence bridges curated knowledge and the literature. Pathways (models) represent biological knowledge, but they are constructed manually and cannot easily be updated. Most of the information about reactions and their modifications can be found in the literature, and text mining systems support model updates by finding textual evidence for individual reactions.
Exploitation Route The Korea Institute of Science and Technology Information (KISTI) used our PathText system and our results to obtain follow-up funding: http://www.nactem.ac.uk/kisti/

We are also part of a follow-up grant funded by DARPA on cancer pathways: http://www.nactem.ac.uk/big_mechanism/
Sectors Healthcare, Manufacturing, including Industrial Biotechnology, Pharmaceuticals and Medical Biotechnology

URL http://www.nactem.ac.uk/pathtext/
 
Description New software has been developed: PathText2, which links pathway reactions to evidence from the literature. PathText2 reads SBML models and CellDesigner semantic types and maps reactions to textual biomolecular events, using the results of automatic event extraction by EventMine, the best-performing system for pathway curation in the BioNLP Shared Task 2013. PathText2 is freely available at http://www.nactem.ac.uk/pathtext2 and supports manual curation of pathway models by automatically returning relevant literature. Tools: EventMine for pathway curation, a domain-adaptable event extraction text mining tool, and annotated training data (the Pathway Curation shared task organised at BioNLP, ACL 2013). As part of the Garuda alliance led by Prof. Kitano (http://www.garuda-alliance.org), together with other leading research groups, we support researchers with expertise in simulations, pathway visualisation, databases and analysis. The Garuda platform and gadgets have been presented by Prof. Kitano (leader of the Japanese team) at exhibitions and open days internationally. A follow-up project funded by DARPA (Cancer Mechanisms, http://www.nactem.ac.uk/big_mechanism/) consolidated this research.
First Year Of Impact 2015
Sector Healthcare, Pharmaceuticals and Medical Biotechnology
Impact Types Societal, Economic
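The reaction-to-event mapping that PathText2 performs can be illustrated with a minimal sketch. It assumes a hypothetical lookup table from CellDesigner reaction types to BioNLP-ST 2013 Pathway Curation event types, and hypothetical `Reaction` and `Event` records holding already-normalised entity names; it is not PathText2's actual mapping or code. A reaction is treated as supported by an extracted event when the mapped event type matches and the event's arguments cover every reaction participant.

```python
from dataclasses import dataclass

# Hypothetical, illustrative mapping from CellDesigner reaction types to
# BioNLP-ST 2013 Pathway Curation event types (not PathText2's real table).
REACTION_TO_EVENT = {
    "HETERODIMER_ASSOCIATION": "Binding",
    "DISSOCIATION": "Dissociation",
    "TRANSPORT": "Transport",
    "TRANSCRIPTION": "Transcription",
    "TRANSLATION": "Translation",
    "STATE_TRANSITION": "Conversion",
}


@dataclass(frozen=True)
class Reaction:
    reaction_type: str        # CellDesigner reaction type from the model
    participants: frozenset   # normalised species names (assumed)


@dataclass(frozen=True)
class Event:
    event_type: str           # event type produced by an extractor
    arguments: frozenset      # normalised entity names (assumed)
    doc_id: str               # placeholder document identifier
    sentence: str             # supporting sentence from the text


def evidence_for(reaction, events):
    """Return extracted events whose type matches the reaction's mapped
    event type and whose arguments include every reaction participant."""
    target = REACTION_TO_EVENT.get(reaction.reaction_type)
    if target is None:
        return []  # reaction type not covered by the mapping
    return [
        e for e in events
        if e.event_type == target and reaction.participants <= e.arguments
    ]
```

For example, a `HETERODIMER_ASSOCIATION` between two proteins would only pick up `Binding` events that mention both of them; a real system would also need entity normalisation and partial-match scoring rather than this all-or-nothing subset test.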

 
Description Big Science Mechanism
Amount £678,153 (GBP)
Funding ID W911NF-14-1-0333 
Organisation Defense Advanced Research Projects Agency (DARPA) 
Sector Public
Country United States
Start 11/2014 
End 05/2017
 
Description Methods for real-time intelligence on graphene enterprise development and commercialization
Amount £45,000 (GBP)
Funding ID N/A 
Organisation Nesta 
Sector Charity/Non Profit
Country United Kingdom
Start 05/2013 
End 07/2014
 
Description Pathway project
Amount £130,000 (GBP)
Organisation Korea Institute of Science and Technology Information (KISTI) 
Sector Academic/University
Country Korea, Republic of
Start 03/2012 
End 07/2014
 
Title Argo for Biodiversity 
Description Argo is an interoperable infrastructure for building and running text-analysis solutions. It facilitates the development of custom text mining workflows from a selection of text mining components. We have augmented Argo to include biodiversity text mining tools. 
Type Of Material Improvements to research infrastructure 
Year Produced 2017 
Provided To Others? Yes  
Impact Supports the curation of databases and user collaboration, includes numerous processing components (including third-party ones), allows the creation of text mining workflows and includes text mining tools for biodiversity.
URL http://argo.nactem.ac.uk
 
Title EventMine 
Description EventMine is a machine learning-based pipeline system, which extracts events from documents that already contain named entity annotations (e.g., genes/proteins, etc.). Given appropriate training data, it can be trained to extract many different types and structures of events. 
Type Of Material Improvements to research infrastructure 
Year Produced 2012 
Provided To Others? Yes  
Impact Used in community shared tasks, where other research teams have used it to improve their results; customised to different domains and application areas; part of the Argo text mining platform (http://argo.nactem.ac.uk).
URL http://www.nactem.ac.uk/EventMine/
 
Title Mining indirect associations, FACTA+ 
Description Software for mining direct and indirect associations from the literature to support hypothesis generation
Type Of Material Improvements to research infrastructure 
Year Produced 2011 
Provided To Others? Yes  
Impact Supports pathway citation and the ranking of interactions from the literature; supports model development
URL http://www.nactem.ac.uk/facta/
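The idea of an indirect association can be made concrete with a small sketch in the style of Swanson's ABC model: concepts A and C never co-occur, but both co-occur with intermediate concepts B. The scoring below (summing the minimum of the two co-occurrence counts over intermediates) is an illustrative assumption, not FACTA+'s actual measure, and the toy documents are invented for the example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs):
    """Count how many documents mention each unordered pair of concepts."""
    counts = Counter()
    for concepts in docs:
        for a, b in combinations(sorted(set(concepts)), 2):
            counts[(a, b)] += 1
    return counts


def indirect_associations(docs, source):
    """Rank concepts linked to `source` only through intermediates.

    Illustrative score: sum over intermediates b of
    min(count(source, b), count(b, c)), for concepts c that never
    co-occur directly with `source`.
    """
    counts = cooccurrence(docs)
    neighbours = {}
    for (a, b), n in counts.items():
        neighbours.setdefault(a, {})[b] = n
        neighbours.setdefault(b, {})[a] = n
    direct = neighbours.get(source, {})
    scores = Counter()
    for b, n_ab in direct.items():
        for c, n_bc in neighbours[b].items():
            if c != source and c not in direct:
                scores[c] += min(n_ab, n_bc)
    return scores.most_common()


# Toy documents, each reduced to the set of concepts it mentions.
docs = [
    {"fish oil", "blood viscosity"},
    {"blood viscosity", "Raynaud"},
    {"fish oil", "platelet aggregation"},
    {"platelet aggregation", "Raynaud"},
]
print(indirect_associations(docs, "fish oil"))  # [('Raynaud', 2)]
```

Using the minimum of the two counts means an indirect link is only as strong as its weaker half; other measures (e.g. based on mutual information) are equally plausible choices.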
 
Title Anatomical entity mention recognition AnatEM
Description The extended Anatomical Entity Mention corpus (AnatEM) consists of 1212 documents (approx. 250,000 words) manually annotated to identify over 13,000 mentions of anatomical entities. Each annotation is assigned one of 12 granularity-based types such as Cellular component, Tissue and Organ, defined with reference to the Common Anatomy Reference Ontology. The corpus builds in part on two previously introduced resources, AnEM and MLEE. The corpus annotations were created using the brat annotation tool. 
Type Of Material Database/Collection of data 
Year Produced 2015 
Provided To Others? Yes  
Impact Embedded in Europe PubMed Central. Includes lexical resources, AnatomyTagger and UIMA components.
URL http://nactem.ac.uk/anatomytagger/
 
Title BioNLP Shared Task Resources 2013 
Description The BioNLP Shared Task (BioNLP-ST) series represents a community-wide trend in text-mining for biology toward fine-grained information extraction (IE). The Pathway Curation (PC) task is a main task of the BioNLP Shared Task 2013. The PC task aims to evaluate the applicability of event extraction systems to support the curation, evaluation and maintenance of biomolecular pathway models and to encourage the further development of methods for these tasks. The Cancer Genetics (CG) task is an information extraction task organized as part of the BioNLP Shared Task 2013. The CG task aims to advance the automatic extraction of information from statements on the biological processes relating to the development and progression of cancer. 
Type Of Material Database/Collection of data 
Year Produced 2013 
Provided To Others? Yes  
Impact The BioNLP Shared Task series has been instrumental in encouraging the development of methods and resources for the automatic extraction of bio-processes from text, but efforts within this framework have focused almost exclusively on molecular and sub-cellular level entities and events. To be relevant to cancer biology, event extraction technology must be generalized to address physical entities and processes at higher levels of biological organization, such as cell proliferation, apoptosis, blood vessel development and organ growth. The CG task aims to advance the development of such event extraction methods and the capacity for automatic analysis of texts on cancer biology.

Despite more than a decade of work in biomedical text mining on tasks under headings such as "automatic pathway extraction", natural language processing and information extraction methods have not been widely embraced by biomedical pathway curation communities. Until recently, biomedical domain IE efforts concentrated on simple representations (e.g. physical entity pairs) that were not sufficiently expressive to address pathway curation, and most work also used semantics that differed from those applied in curation efforts. We believe that the structured event representation applied in the BioNLP Shared Task main tasks offers many opportunities to make a significant contribution to practical pathway curation efforts. The PC task is proposed as a step toward realizing these opportunities. To ensure that the task and its data are relevant to the needs of pathway curation efforts, the PC task defines its extraction targets and their semantics with reference to the physical entity and reaction types applied in pathway model standardization efforts and in relevant ontologies such as the Systems Biology Ontology (SBO).
Further, the corpus texts are selected on the basis of their relevance to a selection of pathway models from the PANTHER Pathway DB and BioModels, covering both signaling and metabolic pathways. The texts include both PubMed abstracts and extracts from PMC Open Access full-text papers.
URL http://2013.bionlp-st.org
 
Title Clinical trials recommender 
Description Search system for clinical trial development and systematic reviews based on the automatic creation of eligibility criteria 
Type Of Material Computer model/algorithm 
Year Produced 2012 
Provided To Others? Yes  
Impact Attracted interest from clinical trial teams working on experimental therapeutics; serves as a model for developing systematic reviews and public health reviews
URL http://www.nactem.ac.uk/ClinicalTrialProtocols/
 
Title Metaknowledge corpus 
Description A corpus of 1000 MEDLINE abstracts manually annotated with events (based on the GENIA ontology) and enriched with scientific discourse information. 
Type Of Material Database/Collection of data 
Year Produced 2011 
Provided To Others? Yes  
Impact The annotation of scientific discourse attracted interest from publishers and improved search in the Europe PubMed Central system.
URL http://www.nactem.ac.uk/meta-knowledge/
 
Description KISTI Pathway 
Organisation Korea Institute of Science and Technology Information (KISTI)
Country Korea, Republic of 
Sector Academic/University 
PI Contribution The construction of detailed, machine-readable models of biomolecular pathways is a major goal of systems biology, and hundreds of models capturing the physical entities and reactions involved in various pathways are already available from repositories such as the BioModels Database and the PANTHER Pathway repository. We support biologists by providing biomedical text mining systems, which are increasingly capable of automatically extracting rich structured representations of information from the literature. Such systems open many opportunities for supporting the curation, validation and updating of pathway models. Building on PathText, our text mining integration technology for pathways, on text mining systems such as MEDIE and on event extraction tools such as EventMine, we are developing methods for identifying literature relevant to specific reactions in pathway models and for automatically analysing documents to extract event structures that capture the full semantics of pathway reactions.
Collaborator Contribution Joint proposal of the BioNLP 2013 shared task; biologists from KISTI annotated reactions in a variety of signalling and metabolic pathways.
Impact Organisation of shared tasks (http://2013.bionlp-st.org), with resources made available to the community
Start Year 2012
 
Description PathText 
Organisation The Systems Biology Institute
Country Japan 
Sector Charity/Non Profit 
PI Contribution Providing text mining infrastructure to systems biologists
Collaborator Contribution Supplied pathway editor for our text mining platform
Impact Workshops, training events, tutorials, software and publications; members of the Garuda alliance (http://www.garuda-alliance.org/alliancemembers)
Start Year 2010
 
Title Acromine Disambiguation 
Description Automatically disambiguates acronyms into their expanded long forms from text. 
Type Of Technology Webtool/Application 
Year Produced 2010 
Impact Improved search services by refining query expansion 
URL http://www.nactem.ac.uk/software/acromine_disambiguation/
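Acronym disambiguation of this kind can be sketched as choosing, among the known long forms of an acronym, the one whose typical context is most similar to the context in which the acronym appears. The sketch below uses bag-of-words cosine similarity against hypothetical hand-written sense profiles; this is an assumed, simplified technique, not the classifier or the sense inventory used by Acromine Disambiguation.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lower-case, strip simple punctuation and count tokens."""
    tokens = (w.lower().strip(".,;:()") for w in text.split())
    return Counter(t for t in tokens if t)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def expand(context, sense_profiles):
    """Return the long form whose profile is most similar to `context`."""
    ctx = bag_of_words(context)
    return max(sense_profiles,
               key=lambda lf: cosine(ctx, bag_of_words(sense_profiles[lf])))


# Hypothetical sense profiles for the acronym "ER".
profiles = {
    "estrogen receptor":
        "hormone binding breast cancer nuclear receptor estrogen signalling",
    "endoplasmic reticulum":
        "protein folding stress membrane organelle lumen chaperone",
}
print(expand("Misfolded proteins accumulate in the ER lumen under stress.",
             profiles))  # endoplasmic reticulum
```

In practice the sense profiles would be learned from large numbers of contexts in which each long form is spelled out, rather than written by hand.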
 
Title Argo - collaborative text mining workbench 
Description Argo is a workbench for building and running text-analysis solutions. It facilitates the development of custom workflows from a selection of elementary analytics. 
Type Of Technology Webtool/Application 
Year Produced 2012 
Impact Curation of databases and pathways through workflow design. The web interface allows users to create complex processing workflows composed of processing components, with multiple branching and merging points. User-interactive components, such as the Manual Annotation Editor, pause the workflow and wait for input from the user. Further features include remote processing and user collaboration. Top-performing system in the BioCreative IV user-interactive task.
URL http://argo.nactem.ac.uk
 
Title EventMine 
Description EventMine is a machine learning-based pipeline system, which extracts events from documents that already contain named entity annotations (e.g., genes/proteins, etc.). Given appropriate training data, it can be trained to extract many different types and structures of events. 
Type Of Technology Webtool/Application 
Year Produced 2012 
Impact EventMine has been trained on a number of different corpora, and corresponding web services are available. EventMine outperformed competing systems in several community shared tasks, including BioNLP 2011 and 2013. It can be adapted to any domain.
URL http://www.nactem.ac.uk/EventMine/
 
Title PathText 
Description A novel method for associating pathway model reactions with relevant publications. Our approach extracts the reactions directly from the models and then turns them into queries for three text mining-based MEDLINE literature search systems. These queries are executed, and the resulting documents are combined and ranked according to their relevance to the reactions of interest. We manually annotate document-reaction pairs with the relevance of the document to the reaction and use this annotation to study several ranking methods, using various heuristic and machine-learning approaches. 
Type Of Technology Webtool/Application 
Year Produced 2013 
Impact Our evaluation shows that the annotated document-reaction pairs can be used to create a rule-based document ranking system, and that machine learning can be used to rank documents by their relevance to pathway reactions. We find that a Support Vector Machine-based system outperforms several baselines and matches the performance of the rule-based system. Given their success, the query extraction and ranking methods have been used to update our existing pathway search system, PathText.
URL http://www.nactem.ac.uk/pathtext2/
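The first two steps of the PathText method (turning a reaction into a query, then combining the result lists returned by several search systems) can be sketched as follows. The query format is a hypothetical assumption, and the lists are merged with reciprocal rank fusion, a simple, well-known baseline swapped in here for illustration; the PathText ranking itself uses the heuristic and Support Vector Machine-based methods described above.

```python
from collections import defaultdict


def reaction_to_query(participants, reaction_type):
    """Hypothetical query builder: quoted participant names AND-ed
    together with a keyword for the reaction type."""
    terms = [f'"{p}"' for p in sorted(participants)] + [reaction_type]
    return " AND ".join(terms)


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge ranked lists of document IDs from several search systems.

    score(d) = sum over lists of 1 / (k + rank of d in that list);
    documents missing from a list contribute nothing for that list.
    Ties are broken by document ID so the output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


print(reaction_to_query({"MEK", "ERK"}, "phosphorylation"))
# "ERK" AND "MEK" AND phosphorylation

# Result lists from three (hypothetical) literature search systems.
results = [["d1", "d2", "d3"], ["d2", "d1"], ["d2", "d4"]]
print(reciprocal_rank_fusion(results))  # ['d2', 'd1', 'd4', 'd3']
```

Rank fusion only uses positions, so it needs no training data; a learned ranker such as the SVM-based system can additionally exploit features of each document-reaction pair.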
 
Description Invited Speaker, Heidelberg, Scientific Computing for the Improved Diagnosis and Therapy of Sepsis (SCIDATOS) 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This kick-off workshop gathered an international audience of experts working on the diagnosis of sepsis. I was the only text mining expert at this workshop.
Year(s) Of Engagement Activity 2016
URL http://www.uni-heidelberg.de/einrichtungen/iwh/bock2016.html
 
Description Keynote speaker 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Industry/Business
Results and Impact Evotec's mission https://www.evotec.com/en is to discover and develop highly effective therapeutics and make them globally available to the patients who need them. My talk was related to these aims.
Year(s) Of Engagement Activity 2022