From text to pathways: text mining techniques for reconstructing signalling pathways
Lead Research Organisation:
University of Manchester
Department Name: Computer Science
Publications
Ananiadou S (2011) Named entity recognition for bacterial Type IV secretion systems. in PloS one
Ananiadou S (2010) Event extraction for systems biology by text mining the literature. in Trends in biotechnology
Kano Y (2009) U-Compare: share and compare text mining tools with UIMA. in Bioinformatics (Oxford, England)
Kano Y (2011) U-Compare bio-event meta-service: compatible BioNLP event extraction services. in BMC bioinformatics
Kano Y (2010) Text mining meets workflow: linking U-Compare with Taverna. in Bioinformatics (Oxford, England)
Kemper B (2010) PathText: a text mining integrator for biological pathway visualizations. in Bioinformatics (Oxford, England)
Kolluru B (2011) Using workflows to explore and optimise named entity recognition for chemistry. in PloS one
Kontonasios G (2011) Adding text mining workflows as web services to the BioCatalogue
Miwa M (2013) A method for integrating and ranking the evidence for biochemical pathways by mining reactions from text. in Bioinformatics (Oxford, England)
Miwa M (2013) NaCTeM EventMine for BioNLP 2013 CG and PC tasks in Proceedings of the Annual Meeting of the Association for Computational Linguistics
Description | Linking pathways with textual evidence bridges knowledge with text. Pathway models represent biological knowledge, but they are constructed manually and cannot be updated easily. Most of the information about reactions and their modifications is found in the literature. Text mining systems help by finding textual evidence that supports the update of these pathways. |
Exploitation Route | The Korea Institute of Science and Technology Information (KISTI) used our PathText system and our results for follow-up funding (http://www.nactem.ac.uk/kisti/). We are also part of a follow-up grant from DARPA on cancer pathways (http://www.nactem.ac.uk/big_mechanism/). |
Sectors | Healthcare; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology |
URL | http://www.nactem.ac.uk/pathtext/ |
Description | New software has been developed: PathText2, which links pathway reactions to evidence from the literature. PathText2 reads SBML models and CellDesigner semantic types and maps reactions to textual biomolecular events. It uses the results of automatic event extraction by EventMine. Best-performing system for pathway curation (BioNLP Shared Task, 2013). PathText2 is freely available at http://www.nactem.ac.uk/pathtext2 and supports manual curation of pathway models by automatically returning relevant literature. Tools: EventMine for pathway curation, a domain-adaptable event extraction text mining tool; annotated data for training (Pathway Curation shared task organised at BioNLP, ACL 2013). As part of the Garuda alliance led by Prof. Kitano (http://www.garuda-alliance.org), together with other leading research groups, we support researchers with expertise in simulations, pathway visualisation, databases and analysis. The Garuda platform and gadgets have been presented by Prof. Kitano (leader of the Japanese team) in exhibitions and open days internationally. A follow-up application funded by DARPA (Cancer Mechanisms, http://www.nactem.ac.uk/big_mechanism/) consolidated this research. |
First Year Of Impact | 2015 |
Sector | Healthcare; Pharmaceuticals and Medical Biotechnology |
Impact Types | Societal, Economic |
Description | Big Science Mechanism |
Amount | £678,153 (GBP) |
Funding ID | W911NF-14-1-0333 |
Organisation | Defense Advanced Research Projects Agency (DARPA) |
Sector | Public |
Country | United States |
Start | 11/2014 |
End | 05/2017 |
Description | Methods for real-time intelligence on graphene enterprise development and commercialization |
Amount | £45,000 (GBP) |
Funding ID | N/A |
Organisation | Nesta |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 04/2013 |
End | 07/2014 |
Description | Pathway project |
Amount | £130,000 (GBP) |
Organisation | Korea Institute of Science and Technology Information (KISTI) |
Sector | Academic/University |
Country | Korea, Republic of |
Start | 03/2012 |
End | 07/2014 |
Title | Argo for Biodiversity |
Description | Argo is an interoperable infrastructure for building and running text-analysis solutions. It facilitates the development of custom text mining workflows from a selection of text mining components. We have augmented Argo to include biodiversity text mining tools. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2017 |
Provided To Others? | Yes |
Impact | Supports the curation of databases and user collaboration; includes numerous processing components (some from third parties); allows the creation of text mining workflows. Includes text mining tools for biodiversity. |
URL | http://argo.nactem.ac.uk |
Title | EventMine |
Description | EventMine is a machine learning-based pipeline system that extracts events from documents that already contain named entity annotations (e.g., genes/proteins). Given appropriate training data, it can be trained to extract many different types and structures of events. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2012 |
Provided To Others? | Yes |
Impact | Community shared tasks; other research teams improved results. Customised to different domains and application areas. Part of the Argo text mining platform (http://argo.nactem.ac.uk). |
URL | http://www.nactem.ac.uk/EventMine/ |
Title | Mining indirect associations, FACTA+ |
Description | Software for mining direct and indirect associations from the literature for hypothesis generation. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2011 |
Provided To Others? | Yes |
Impact | Supported pathway citation and ranking of interactions from the literature; supports model development |
URL | http://www.nactem.ac.uk/facta/ |
Title | Anatomical entity mention recognition AnatEM |
Description | The extended Anatomical Entity Mention corpus (AnatEM) consists of 1212 documents (approx. 250,000 words) manually annotated to identify over 13,000 mentions of anatomical entities. Each annotation is assigned one of 12 granularity-based types such as Cellular component, Tissue and Organ, defined with reference to the Common Anatomy Reference Ontology. The corpus builds in part on two previously introduced resources, AnEM and MLEE. The corpus annotations were created using the brat annotation tool. |
Type Of Material | Database/Collection of data |
Year Produced | 2015 |
Provided To Others? | Yes |
Impact | Embedded in Europe PubMed Central. Includes lexical resources, AnatomyTagger and UIMA components. |
URL | http://nactem.ac.uk/anatomytagger/ |
Title | BioNLP Shared Task Resources 2013 |
Description | The BioNLP Shared Task (BioNLP-ST) series represents a community-wide trend in text-mining for biology toward fine-grained information extraction (IE). The Pathway Curation (PC) task is a main task of the BioNLP Shared Task 2013. The PC task aims to evaluate the applicability of event extraction systems to support the curation, evaluation and maintenance of biomolecular pathway models and to encourage the further development of methods for these tasks. The Cancer Genetics (CG) task is an information extraction task organized as part of the BioNLP Shared Task 2013. The CG task aims to advance the automatic extraction of information from statements on the biological processes relating to the development and progression of cancer. |
Type Of Material | Database/Collection of data |
Year Produced | 2013 |
Provided To Others? | Yes |
Impact | The BioNLP Shared Task series has been instrumental in encouraging the development of methods and resources for the automatic extraction of bio-processes from text, but efforts within this framework have been almost exclusively focused on molecular and sub-cellular level entities and events. To be relevant to cancer biology, event extraction technology must be generalized to be able to address physical entities and processes at higher levels of biological organization, such as cell proliferation, apoptosis, blood vessel development, and organ growth. The CG task aims to advance the development of such event extraction methods and the capacity of automatic analysis of texts on cancer biology. Despite more than a decade of work in biomedical text mining on tasks under headings such as "automatic pathway extraction", natural language processing and information extraction methods have not been widely embraced by biomedical pathway curation communities. Until recently, biomedical domain IE efforts concentrated on simple representations (e.g. physical entity pairs) that were not sufficiently expressive to address pathway curation, and most work also involved different semantics from those applied in curation efforts. We believe that the structured event representation applied in BioNLP Shared Task main tasks offers many opportunities to make a significant contribution to practical pathway curation efforts. The PC task is proposed as a step toward realizing these opportunities. To ensure that the task and its data are relevant to the needs of pathway curation efforts, the PC task defines its extraction targets and their semantics with reference to physical entity and reaction types applied in pathway model standardization efforts and relevant ontologies such as the Systems Biology Ontology (SBO).
Further, the corpus texts are selected on the basis of relevance to a selection of pathway models from Panther Pathway DB and BioModels, covering both signaling and metabolic pathways. The texts involve both PubMed publication abstracts and PMC Open Access full-text paper extracts. |
URL | http://2013.bionlp-st.org |
Title | Clinical trials recommender |
Description | Search system for clinical trial development and systematic reviews based on the automatic creation of eligibility criteria |
Type Of Material | Computer model/algorithm |
Year Produced | 2012 |
Provided To Others? | Yes |
Impact | Interest from clinical trials teams for experimental therapeutics, model for systematic reviews and public health reviews development |
URL | http://www.nactem.ac.uk/ClinicalTrialProtocols/ |
Title | Metaknowledge corpus |
Description | A corpus of 1000 MEDLINE abstracts manually annotated with events (based on the GENIA ontology) and enriched with scientific discourse information. |
Type Of Material | Database/Collection of data |
Year Produced | 2011 |
Provided To Others? | Yes |
Impact | Annotation of scientific discourse attracted interest from publishers. Improved search in the Europe PubMed Central system. |
URL | http://www.nactem.ac.uk/meta-knowledge/ |
Description | KISTI Pathway |
Organisation | Korea Institute of Science and Technology Information (KISTI) |
Country | Korea, Republic of |
Sector | Academic/University |
PI Contribution | The construction of detailed, machine-readable models of biomolecular pathways is a major goal of systems biology, and hundreds of models capturing the physical entities and reactions involved in various pathways are already available from repositories such as the BioModels Database and the PANTHER Pathway repository. We support biologists by providing biomedical text mining systems, which are increasingly capable of creating rich structured representations of information automatically extracted from the literature. Such text mining systems open many opportunities for supporting the curation, validation, and updating of pathway models. Building on the PathText text mining integration technology for pathways, text mining systems such as MEDIE, and event extraction tools such as EventMine, we are developing methods for identifying literature relevant to specific reactions in pathway models and for automatically analysing documents to extract event structures that capture the full semantics of pathway reactions. |
Collaborator Contribution | Joint proposal of the BioNLP 2013 shared task; biologists from KISTI annotated reactions in a variety of signalling and metabolic pathways. |
Impact | Organisation of shared tasks, with resources made available to the community (http://2013.bionlp-st.org). |
Start Year | 2012 |
Description | PathText |
Organisation | The Systems Biology Institute |
Country | Japan |
Sector | Charity/Non Profit |
PI Contribution | Providing text mining infrastructure to systems biologists |
Collaborator Contribution | Supplied pathway editor for our text mining platform |
Impact | Workshops, training events, tutorials, software, publications. Members of the Garuda alliance: http://www.garuda-alliance.org/alliancemembers |
Start Year | 2010 |
Title | Acromine Disambiguation |
Description | Automatically disambiguates acronyms found in text, resolving them to their expanded long forms. |
Type Of Technology | Webtool/Application |
Year Produced | 2010 |
Impact | Improved search services by refining query expansion |
URL | http://www.nactem.ac.uk/software/acromine_disambiguation/ |
Title | Argo - collaborative text mining workbench |
Description | Argo is a workbench for building and running text-analysis solutions. It facilitates the development of custom workflows from a selection of elementary analytics. |
Type Of Technology | Webtool/Application |
Year Produced | 2012 |
Impact | Curation of databases and pathways through workflow design. The web interface allows the user to create complex processing workflows composed of processing components and multiple branching and merging points. User-interactive components, such as the Manual Annotation Editor, pause the processing of a workflow and wait for input from the user. Other features: processing components, remote processing, user collaboration. Top-performing system in the BioCreative IV user interactive task. |
URL | http://argo.nactem.ac.uk |
Title | EventMine |
Description | EventMine is a machine learning-based pipeline system that extracts events from documents that already contain named entity annotations (e.g., genes/proteins). Given appropriate training data, it can be trained to extract many different types and structures of events. |
Type Of Technology | Webtool/Application |
Year Produced | 2012 |
Impact | EventMine has been trained on a number of different corpora, and corresponding web services are available. EventMine outperformed other systems on a number of community shared tasks (BioNLP 2011 and 2013). It is adaptable to any domain. |
URL | http://www.nactem.ac.uk/EventMine/ |
Title | PathText |
Description | A novel method for associating pathway model reactions with relevant publications. Our approach extracts the reactions directly from the models and then turns them into queries for three text mining-based MEDLINE literature search systems. These queries are executed, and the resulting documents are combined and ranked according to their relevance to the reactions of interest. We manually annotate document-reaction pairs with the relevance of the document to the reaction and use this annotation to study several ranking methods, using various heuristic and machine-learning approaches. |
Type Of Technology | Webtool/Application |
Year Produced | 2013 |
Impact | Our evaluation shows that the annotated document-reaction pairs can be used to create a rule-based document ranking system, and that machine learning can be used to rank documents by their relevance to pathway reactions. We find that a Support Vector Machine-based system outperforms several baselines and matches the performance of the rule-based system. The successful query extraction and ranking methods are used to update our existing pathway search system, PathText. |
URL | http://www.nactem.ac.uk/pathtext2/ |
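The PathText record above describes three steps: reactions are extracted from a pathway model, turned into literature queries, and the documents returned by several search systems are combined and ranked. The following is a rough Python sketch of that flow under stated assumptions, not the PathText implementation. The function names `reaction_queries` and `fuse_rankings`, the `AND` query syntax, and the sample model are hypothetical, and reciprocal rank fusion is swapped in as a simple stand-in for the rule-based and SVM rankers mentioned in the record.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical toy SBML model used only for illustration.
SAMPLE_SBML = """<sbml xmlns="http://www.sbml.org/sbml/level2/version4">
  <model>
    <listOfSpecies>
      <species id="s1" name="MEK"/>
      <species id="s2" name="ERK"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="r1">
        <listOfReactants><speciesReference species="s2"/></listOfReactants>
        <listOfModifiers><modifierSpeciesReference species="s1"/></listOfModifiers>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""


def _local(tag):
    """Strip the XML namespace so any SBML level/version is accepted."""
    return tag.rsplit("}", 1)[-1]


def reaction_queries(sbml_text):
    """Map each reaction id to a query built from its participant names.

    The 'A AND B' syntax is an assumption; a real search system
    would define its own query language.
    """
    root = ET.fromstring(sbml_text)
    names = {
        el.get("id"): el.get("name") or el.get("id")
        for el in root.iter()
        if _local(el.tag) == "species"
    }
    queries = {}
    for reaction in root.iter():
        if _local(reaction.tag) != "reaction":
            continue
        terms = []
        for ref in reaction.iter():
            if _local(ref.tag) in ("speciesReference", "modifierSpeciesReference"):
                name = names.get(ref.get("species"), ref.get("species"))
                if name not in terms:  # keep document order, drop repeats
                    terms.append(name)
        queries[reaction.get("id")] = " AND ".join(terms)
    return queries


def fuse_rankings(rankings, k=60):
    """Combine ranked document lists with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in.
    Ties are broken by document id so the output is deterministic.
    """
    scores = defaultdict(float)
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))
```

In use, each query from `reaction_queries(SAMPLE_SBML)` would be sent to the search systems, and the ranked lists of document identifiers they return would be passed to `fuse_rankings`. Rank fusion needs no training data, which is why it is used here; the record itself reports that a rule-based ranker and a Support Vector Machine ranker trained on annotated document-reaction pairs performed best.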
Description | Invited Speaker, Heidelberg, Scientific Computing for the Improved Diagnosis and Therapy of Sepsis (SCIDATOS) |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | This kick-off workshop gathered an international audience of experts working on the diagnosis of sepsis. I was the only text mining expert at this workshop. |
Year(s) Of Engagement Activity | 2016 |
URL | http://www.uni-heidelberg.de/einrichtungen/iwh/bock2016.html |
Description | Keynote speaker |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Industry/Business |
Results and Impact | Evotec's mission (https://www.evotec.com/en) is to discover and develop highly effective therapeutics and make them globally available to the patients who need them. My talk was related to these aims. |
Year(s) Of Engagement Activity | 2022 |