Japan Partnering Award. Text mining and bioinformatics platforms for metabolic pathway modelling.

Lead Research Organisation: University of Manchester
Department Name: Computer Science

 
Description We have brought together text mining platforms (Argo and other resources from OpenMinted) and applied them to data curation and metabolic modelling.
We use text mining methods to support the curation of EzCatDB (Japan), an enzyme reaction database containing a reaction classification (Reaction, Ligand, Catalysis and Protein active-site) that is important for metabolic modelling.
We trained Japanese researchers in using the text mining and bioinformatics platforms.
Exploitation Route The integration of text mining platforms and workflows is crucial for the curation of databases and will be used by other teams (e.g. RIKEN, AIST). The development of a semantic search system over the whole of PubMed supports discovery, and the development of an annotation environment supports model development.
Sectors Chemicals, Manufacturing, including Industrial Biotechnology, Pharmaceuticals and Medical Biotechnology

URL http://nactem-copious.man.ac.uk/Thalia/thalia.html
 
Description Our findings have been used to populate the EzCatDB database, using a novel annotation environment for model training. The database is being developed at the Artificial Intelligence Research Centre (AIRC) in Japan.
First Year Of Impact 2018
Sector Chemicals, Manufacturing, including Industrial Biotechnology, Pharmaceuticals and Medical Biotechnology
Impact Types Economic

 
Title APLenty: annotation tool for creating high-quality datasets using active and proactive learning 
Description APLenty is an annotation tool for creating high-quality sequence labeling datasets using active and proactive learning. A major innovation of this tool is the integration of automatic annotation with active learning and proactive learning. This makes the task of creating labeled datasets easier and less time-consuming, and reduces the human effort required. APLenty is highly flexible and can be adapted to various tasks such as database curation and information extraction.
Type Of Material Improvements to research infrastructure 
Year Produced 2018 
Provided To Others? Yes  
Impact Several groups are using it to create labelled data for training 
URL http://www.nactem.ac.uk/aplenty/
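
The combination of active and proactive learning described for APLenty can be illustrated with a minimal sketch. This is not APLenty's implementation: it assumes a generic least-confidence selection strategy over whole items (real sequence labeling would score token sequences), and the function names, item identifiers and annotator reliability scores are all hypothetical.

```python
def least_confident(pool_probs, k):
    """Active learning step: pick the k unlabeled items whose top predicted
    class probability is lowest, i.e. where the current model is least sure.

    pool_probs maps an item id to the model's class probabilities for it.
    """
    scored = [(1.0 - max(probs), item) for item, probs in pool_probs.items()]
    # Highest uncertainty first; ties broken by item id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item for _, item in scored[:k]]


def pick_annotator(reliability):
    """Proactive learning step: route the selected items to the annotator
    with the highest estimated reliability (scores are assumed inputs)."""
    return max(sorted(reliability), key=lambda name: reliability[name])


# Hypothetical pool of three sentences with two-class model outputs.
pool = {"s1": [0.9, 0.1], "s2": [0.5, 0.5], "s3": [0.6, 0.4]}
to_annotate = least_confident(pool, 2)
annotator = pick_annotator({"ann_a": 0.7, "ann_b": 0.9})
```

In this sketch the model's least certain items are sent for manual labeling first, which is the general reason active learning can reduce annotation effort.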
 
Title Corpus annotations for pharmacovigilance 
Description The PHAEDRA corpus is a semantically annotated corpus for pharmacovigilance (PV), consisting of 597 MEDLINE abstracts. Its fine-grained, multiple levels of annotation, added by domain experts, make it a unique resource within the field, and aim to encourage the development/adaptation of novel machine learning tools for extracting PV-related information from text. It is intended that such tools will lead to novel means of supporting curators to efficiently increase the coverage, consistency and completeness of the information in PV resources.
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
Impact A unique resource within the field of pharmacovigilance which encourages the development/adaptation of novel machine learning tools for extracting PV-related information from text. It is currently used to extract information for the development of the enzyme database in Japan (AIRC).
URL http://www.nactem.ac.uk/PHAEDRA/
 
Title APLenty is an annotation tool developed at NaCTeM for creating high-quality sequence labeling datasets using active and proactive learning. 
Description A major innovation of our tool is the integration of automatic annotation with active learning and proactive learning. This makes the task of creating labeled datasets easier and less time-consuming, and reduces the human effort required. APLenty is highly flexible and can be adapted to various other tasks.
Type Of Technology Webtool/Application 
Year Produced 2018 
Open Source License? Yes  
Impact Cutting annotation costs for developing gold standards; supporting crowdsourcing efforts by selecting the most reliable annotator 
URL http://www.nactem.ac.uk/aplenty/about
 
Title Thalia: a faceted semantic search system 
Description The main purpose of Thalia is to enable semantic search in the context of biomedical literature by leveraging previous named entity (NE) annotation efforts. The key strategy for achieving semantic behaviour is to normalise NEs, i.e., to link entities to concepts in an openly available ontology, which effectively allows a concept to be mapped to its multiple word forms. Thalia covers the entire PubMed, which at the point of this challenge contained about 27 million references. Thalia includes annotations of several types (Chemicals, Diseases, Drugs, Genes, Metabolites, Proteins, Species and Anatomic entities).
Type Of Technology Webtool/Application 
Year Produced 2017 
Open Source License? Yes  
Impact The semantic search system has been used to support a precision medicine challenge, retrieving documents containing potential treatments and clinical trials for specific patient characteristics. 
URL http://nactem-copious.man.ac.uk/Thalia/thalia.html
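
The normalisation strategy described for Thalia, in which different word forms are linked to one ontology concept so that a query on any form retrieves the same documents, can be sketched as follows. This is a toy illustration and not Thalia's implementation: the synonym table, the concept identifiers and the function names are hypothetical, and a real system would rely on trained named entity recognisers and a full ontology rather than a lookup table.

```python
from collections import defaultdict

# Hypothetical synonym table: lower-cased surface form -> concept identifier.
SYNONYMS = {
    "aspirin": "CONCEPT:0001",
    "acetylsalicylic acid": "CONCEPT:0001",
    "asa": "CONCEPT:0001",
    "tp53": "CONCEPT:0002",
    "p53": "CONCEPT:0002",
}


def normalise(mention):
    """Map a surface form to its concept id, or None if it is unknown."""
    return SYNONYMS.get(mention.strip().lower())


def build_index(docs):
    """Index documents by concept id rather than by surface string.

    docs maps a document id to the entity mentions found in it.
    """
    index = defaultdict(set)
    for doc_id, mentions in docs.items():
        for mention in mentions:
            concept = normalise(mention)
            if concept is not None:
                index[concept].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents mentioning the query's concept."""
    concept = normalise(query)
    if concept is None:
        return []
    return sorted(index.get(concept, set()))


# Hypothetical mini-collection of annotated documents.
docs = {
    "d1": ["Aspirin"],
    "d2": ["acetylsalicylic acid", "p53"],
    "d3": ["TP53"],
}
index = build_index(docs)
hits = search(index, "ASA")
```

Because the index is keyed on concepts, a search for "ASA" also finds documents that only mention "Aspirin" or "acetylsalicylic acid".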
 
Description 10th International Biocuration Conference, Stanford University 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Co-organiser of the workshop focusing on recent advances in the development of integrated systems to capture mechanisms for biological systems, including machine reading of journal articles, (semi-)automated assembly of signaling pathway models, and machine-aided analysis of these models for tasks such as drug repurposing and explaining drugs' effects. This workshop consisted of invited speakers and contributed talks and/or panel discussions from experts in biocuration, machine reading, and biological modeling.
Year(s) Of Engagement Activity 2017
URL https://f1000research.com/slides/6-482
 
Description Invited Talk Artificial Intelligence Research Centre, AIST Japan 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Presentation of the text mining infrastructure to support the automatic reconstruction of pathways and their curation.
Year(s) Of Engagement Activity 2017
 
Description Invited talk at Aberystwyth University 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Postgraduate students
Results and Impact 50 PGR students attended the talk, which sparked questions about the role of AI in pathways.
Year(s) Of Engagement Activity 2019
URL https://www.eventbrite.co.uk/e/enriching-pathway-models-using-text-mining-mid-wales-branch-registrat...
 
Description Keynote at the International Symposium on Information Management and Big Data (SIMBig 2019), Lima, Peru, 21st - 23rd August 2019. Prof. Ananiadou's talk was entitled "Text Mining for Biomedical Applications".
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Dissemination of text mining research supporting systematic reviews to a wider audience
Year(s) Of Engagement Activity 2019
URL https://simbig.org/SIMBig2019/index.html
 
Description Keynote Speaker University of York 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Postgraduate students
Results and Impact This event celebrated the launch of the University's new high performance compute (HPC) cluster, 'Viking', which promises to empower researchers at York in achieving new heights of research excellence. My talk discussed how text mining needs HPC clusters.
Year(s) Of Engagement Activity 2019
URL https://www.york.ac.uk/it-services/research-computing/vikingclusterlaunchevent/
 
Description Keynote speaker CLEF eHealth 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The keynote concerned how text mining can link cancer pathway models with textual evidence to automate science for drug discovery in cancer research. Text mining techniques are being employed to construct, update and verify information in relevant models, to ensure that the information used for hypothesis generation is as accurate as possible. Complex information from the literature (semantic events) is automatically extracted and mapped/compared to reactions in existing pathway models.

These comparisons allow the existing models to be verified or updated in several ways. Information from the literature can act as corroborative evidence of the validity of these reactions in a model or help to extend it. In addition, by taking into account textual context (uncertainty, negation), we can provide a confidence measure for linking and ranking evidence from the literature for model curation and experimental design.
Year(s) Of Engagement Activity 2018
URL https://sites.google.com/view/clef-ehealth-2018/home
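
The idea in the CLEF eHealth keynote summary of using textual context (uncertainty, negation) to derive a confidence measure for ranking evidence can be illustrated with a minimal sketch. The scoring scheme here is an assumption made purely for illustration, not the method presented in the keynote: the `Evidence` fields, the penalty value and the function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """One extracted event linked to a reaction in a pathway model."""
    sentence_id: str
    negated: bool      # e.g. "X does not phosphorylate Y"
    speculated: bool   # e.g. "X may phosphorylate Y"


# Hypothetical weight, chosen only for illustration.
SPECULATION_PENALTY = 0.5


def confidence(ev):
    """Score how strongly a piece of evidence supports the linked reaction."""
    if ev.negated:
        return 0.0  # a negated statement does not corroborate the reaction
    return SPECULATION_PENALTY if ev.speculated else 1.0


def rank(evidence):
    """Order evidence from most to least confident (ties by sentence id)."""
    return sorted(evidence, key=lambda ev: (-confidence(ev), ev.sentence_id))


# Hypothetical evidence sentences for a single reaction.
evidence = [
    Evidence("e1", negated=True, speculated=False),
    Evidence("e2", negated=False, speculated=True),
    Evidence("e3", negated=False, speculated=False),
]
ranked_ids = [ev.sentence_id for ev in rank(evidence)]
```

A curator working through the ranked list would see plainly asserted evidence first, hedged statements next and negated statements last.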
 
Description Keynote talk at ISMB/ECCB session on Text Mining for Biology and Healthcare
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact ISMB/ECCB 2019 is the largest and most high-profile annual meeting of scientists working in computational biology and provides an intense multidisciplinary forum for disseminating the latest developments in computational tools for data-driven biological research.
Year(s) Of Engagement Activity 2019
URL https://www.iscb.org/ismbeccb2019-program/special-sessions#sst01
 
Description Machine reading for cancer biology at the Global Pharma R&D Informatics Congress 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This congress looks at new methods and new technologies that get the best out of the information available and strategies to integrate internal and external systems so that all teams get the information they need to accelerate the drug development pipeline. Attracting experts working in all areas of pharmaceutical R&D IT and discovery informatics, the event focused on innovations and strategies in these 4 key topic areas:
• Complex Data Analytics
• System Integration
• AI and Machine Learning
• Data Storage and Management
Year(s) Of Engagement Activity 2017
URL http://www.global-engage.com/wp-content/uploads/2017/07/Global-Pharma-RD-Informatics-Congress-Europe...
 
Description Speaker Elsevier forum 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Engagement with Elsevier about text mining for enriching their content and discussions on collaboration
Year(s) Of Engagement Activity 2019
 
Description Speaker Google London 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Collaboration with Google on biomedical text mining. Discussions are ongoing.
Year(s) Of Engagement Activity 2019
 
Description Workshop on Biomedical Information Management, Hamburg Germany 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The workshop gathered interested professionals from across Europe working in the information or medical domain, such as medical researchers, medical doctors and entrepreneurs building their business around biomedical ICT. The goal of the workshop was to obtain answers to the following questions:
• What information and knowledge management solutions are actively used in the community?
• What are the limitations of the current solutions?
• What important problems in biomedical information management are not addressed or automated at all by any solution?
Year(s) Of Engagement Activity 2018
URL http://www.bimdanube.eu/ws1/