Enriching Metabolic PATHwaY models with evidence from the literature (EMPATHY)

Lead Research Organisation: University of Manchester

Department Name: Computer Science

Abstract

In order to understand living systems, biologists have taken to generating predictive models of the system, allowing them to run computational experiments that reduce the number of more traditional, lab-based experiments that would previously be necessary to gain such an understanding. This approach follows that which is now commonplace in engineering, in which, for instance, aeronautical engineers will develop sophisticated models of aircraft and test safety aspects of the proposed design in a computer, long before developing the aircraft itself (or even putting it in a wind tunnel).

This biological modelling approach is named "systems biology" and has been employed successfully in a number of areas. The focus of this proposal is in modelling metabolism. Metabolism is the collection of interconnected chemical reactions that allow cells to extract energy and material from the nutrients that they consume and to grow. All free-living organisms necessarily have such metabolic systems. Thus, modelling human metabolism will allow us to understand the human body's healthy state, for instance as a function of ageing, and aid in the design of chemicals (whether nutrients or drugs) that can maintain human health.

In a similar vein, metabolic modelling is also being used in the development of cell factories, which can produce industrially relevant chemicals that are commonly produced by the chemical industry through more traditional means, often with oil as a feedstock. This approach (known as fermentation or "industrial biotechnology") is not new - we have been fermenting yeast cells to produce alcohol for thousands of years - but traditional fermentation improvements, lasting decades in the case of penicillins, involved random mutation and selection, often coupled to the incorporation of harmful 'passenger' mutations. However, recent research has shown that metabolic network modelling methods provide a rational approach, both for mature fermentations and for new ones such as bio-isoprene for sustainable car tyre production. Thus, these methods have great value for the sustainable bioproduction of important substances, such as biofuels and fine chemicals.

Metabolic modelling therefore has much promise for health and environmental sustainability in this coming century. However, much of the information necessary for the building of these models is held in textbooks, patents and scientific journals, and large teams of researchers are required to search for, judge and extract this information before including it in the models. Thus, the traditional development of such models currently follows (and requires) a time-consuming and expensive manual process. Modern methods allow this to be automated.

This process of extracting information from the literature can be greatly facilitated by the application of the methods of text mining. Text mining applies sophisticated algorithms to recognise relevant terms and sentences buried in text, and can be trained to recognise those passages of text within a large number of documents that may be relevant to a given application.

In this work, we will utilise text mining to extract information necessary for the construction of metabolic network models from the large number of scientific articles that are published daily. The results of these analyses will be presented to model developers, who will judge and extract this information to develop existing metabolic models further. A specific easy-to-use web application will be developed in order to allow multiple users to contribute towards this model building process, irrespective of their background and previous experience of computational model building.

The results of this work will be more complete metabolic models, which will allow researchers to improve understanding of metabolism in a range of organisms, and therefore use this increased knowledge in applications of health and environmental sustainability.

Technical Summary

Recently, the development of genome-scale metabolic models, and their analysis through constraint-based modelling, has increased dramatically, and has been applied to research in human health, drug discovery and biotechnology. These models provide computational and mathematical representations of metabolism in a wide range of organisms, allowing in silico predictions of metabolic processes.

Our recent work has automated the construction of draft models for over 2600 organisms from pathway databases. While providing a valuable starting point, these draft models require further manual curation, as current databases lack the coverage of metabolism required to produce detailed, predictive models. The curation process continues to be a time consuming and expensive affair, driven by the need to extract manually the missing details of metabolic processes from literature. Recent reconstruction efforts that have led to high quality models - such as those that we undertook in the development of yeast and human consensus metabolic networks - are heavily reliant on manual mining of literature.

This project attempts to reduce greatly the time and expense devoted to manual literature mining by developing infrastructure to support literature-driven model construction. This will be achieved by the introduction of an integrated model development environment to enable users to undertake this process.

Crucial to this proposal is the integration of bespoke text mining approaches, which will extract relevant passages from publications, and present them to developers as they expand and refine models. In addition to supporting the model development process itself, users will also be able to provide feedback on text mining suggestions, which will be used to improve their relevance. The task of generating large-scale models for any given organism, including the extraction of biochemical knowledge from literature, will thereby become closer to one which we can fully automate.

Planned Impact

Although a variety of genome-scale metabolic networks exist, even the most mature are very far from being complete. This project will facilitate the development of genome-scale metabolic reconstructions through the use of advanced text mining approaches. It requires software to support the activity, which will also be created in this project.

Who will benefit?

The beneficiaries of this research are scientists and teams of scientists that use the computational modelling of metabolism as part of their research. This includes academic researchers, research students, and scientists from industries such as pharmaceuticals, biotechnology, agriculture, cosmetics, health, and fermentation and industrial biotechnology generally. The outcome of the work, enhanced metabolic network reconstructions of a host of organisms, will benefit research in a range of fields, including human health and ageing, and biotechnological approaches to the development of biofuels and high-value chemicals.

How will they benefit?

The benefits from the outputs of this research will impact the way in which the beneficiaries carry out modelling of metabolic networks to perform in silico experiments. This is an important part of all systems approaches. The actual reconstruction of human metabolism will benefit pharma, as it will help in identifying targets for new drugs. Since we will have developed an improved map of human metabolism, with specificity to various cell types, this will be an invaluable resource for the replacement, refinement and reduction of research using animals (3Rs), where computational modelling can be carried out relative to human cells rather than laboratory animals - this is extremely important to the cosmetic industry given the EU-wide ban that is now in place. The public at large will also benefit from the enhanced metabolic reconstructions, as they will provide an avenue to develop personalised nutrition, exercise and other aspects of a healthy life.

While many 'chassis' organisms are being developed for various aspects of industrial biotechnology, because of the tools available Saccharomyces cerevisiae and Escherichia coli will continue to play major roles. We thus intend to ensure that we drive developments in these organisms in particular.

In addition, the software that will be developed will allow researchers to create metabolic reconstructions of any organism of interest. The software will allow for distributed (online) network reconstruction jamborees, and may be used to coordinate community model construction efforts for those organisms. UK industries adopting the metabolic reconstructions (such as the IPA partner Unilever) will increase their research effectiveness, which will contribute to their competitiveness. We note that this resource will be of special interest to the household products and cosmetic industries since they are now banned from using laboratory animals to develop their products and to test their effectiveness and potential toxicity. Improved, accurate metabolic networks of a range of organisms are an invaluable resource for that purpose.

Funded Value:

£593,909

Funded Period:

Apr 15 - Mar 19

Funder:

BBSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

BB/M006891/1

Principal Investigator:

Sophia Ananiadou

Pedro Mendes

Research Subject:

Biomolecules & biochemistry (33%)

Info. & commun. Technol. (66%)

Research Topic:

Artificial Intelligence (66%)

Biochemistry & physiology (33%)

Organisations

People	ORCID iD
Sophia Ananiadou (Principal Investigator)	http://orcid.org/0000-0002-4097-9191
Pedro Mendes (Principal Investigator)
Douglas Kell (Co-Investigator)
Rafal Rak (Researcher Co-Investigator)
Neil Swainston (Researcher Co-Investigator)

Publications


Ananiadou S (2017) Data Analytics and Management in Data Intensive Domains

Batista-Navarro R (2016) Argo: enabling the development of bespoke workflows and services for disease annotation. in Database : the journal of biological databases and curation

Cao J (2021) GenerativeRE: Incorporating a Novel Copy Mechanism and Pretrained Model for Joint Entity and Relation Extraction

Christopoulou F (2020) Adverse drug events and medication relation extraction in electronic health records with ensemble deep learning methods. in Journal of the American Medical Informatics Association : JAMIA

Christopoulou F (2018) A Walk-based Model on Entity Graphs for Relation Extraction

Darbani B (2018) Energetic evolution of cellular Transportomes in BMC Genomics

Dobson P (2017) A Metabolic Reaction Balancing Web Service for Computational Systems Biology

Ju M (2020) An ensemble of neural models for nested adverse drug events and medication extraction with subwords. in Journal of the American Medical Informatics Association : JAMIA

Ju, M. (2018) Conference Paper

Mendes P (2015) Fitting Transporter Activities to Cellular Drug Concentrations and Fluxes: Why the Bumblebee Can Fly. in Trends in pharmacological sciences

Nghiem, M.Q (2018) APLenty: annotation tool for creating high-quality datasets using active and proactive learning

Przybyla P (2016) Text mining resources for the life sciences. in Database : the journal of biological databases and curation

Shardlow M (2018) Identification of research hypotheses and new knowledge from scientific literature. in BMC medical informatics and decision making

Shardlow, M. (2018) A New Corpus to Support Text Mining for the Curation of Metabolites in the ChEBI Database

Soto AJ (2018) LitPathExplorer: a confidence-based visual text analytics tool for exploring literature-enriched pathway models. in Bioinformatics (Oxford, England)

Soto AJ (2019) Thalia: semantic search engine for biomedical abstracts. in Bioinformatics (Oxford, England)

Swainston N (2017) biochem4j: Integrated and extensible biochemical knowledge through graph databases. in PloS one

Swainston N (2018) STRENDA DB: enabling the validation and sharing of enzyme kinetics data. in The FEBS journal

Swainston N (2016) Recon 2.2: from reconstruction to model of human metabolism. in Metabolomics : Official journal of the Metabolomic Society

Swainston N (2016) libChEBI: an API for accessing the ChEBI database. in Journal of cheminformatics

Thompson P (2018) Annotation and detection of drug effects in text for pharmacovigilance. in Journal of cheminformatics

Thompson P (2017) Handbook of Linguistic Annotation

Trieu HL (2020) DeepEventMine: end-to-end neural nested event extraction from biomedical texts. in Bioinformatics (Oxford, England)

Zerva C (2017) Using uncertainty to link and rank evidence from biomedical literature for model curation. in Bioinformatics (Oxford, England)

Key Findings
Impact Summary
Further Funding
Research Databases and Models
Research Tools and Methods
Collaboration
Software and Technical Products
Engagement Activities


Description	We produced tools and databases to support genome-scale reconstruction of human metabolism. EMPATHY tools were used to standardise gene associations, which in earlier versions were jumbled and often meaningless, so that they point to official HGNC-approved symbols. This enables better analyses using omics data. Other EMPATHY contributions include the incorporation of inputs from laboratories across the world and the improved representation of energy metabolism such that, for the first time, the reconstruction functions as a constraint-based model in terms of simulating ATP yield on different carbon sources. The resolution of this longstanding error represents a major advance. Text mining tools automatically extract evidence from massive textual data to support pathway reconstruction using novel platforms. We have now linked text mining tools that automatically extract relations between reactants and products, not only within sentences but also across sentences. We developed new deep learning tools to do this and linked our text evidence with the biochem4j database.
Exploitation Route	The search system is linked with Recon 2.2, which now represents the most predictive model of human metabolism to date, as demonstrated in the Recon 2.2 publication (Swainston N, 2016). Extensive manual curation has increased the reconstruction size to 5324 metabolites, 7785 reactions and 1675 associated genes, which are now mapped to a single standard. The focus upon mass and charge balancing of all reactions, along with better representation of energy generation, has produced a flux model that correctly predicts ATP yield on different carbon sources. The research methods have been used by two subsequent projects: EPHOR (H2020, exposome for health and occupational research) https://cordis.europa.eu/project/id/874703 and one funded by the Japan Agency for Medical Research and Development.
Sectors	Chemicals, Environment, Healthcare, Manufacturing, including Industrial Biotechnology, Pharmaceuticals and Medical Biotechnology
URL	http://biochem4j.org/browser/
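
The mass and charge balancing mentioned in the Exploitation Route above can be illustrated with a minimal sketch. This is not the EMPATHY balancing code: the function names are hypothetical, and the hexokinase reaction with its formulae and charges is an assumed illustrative example rather than data taken from Recon 2.2.

```python
import re
from collections import defaultdict


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C6H12O6' (no brackets or isotopes)."""
    counts = defaultdict(int)
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return dict(counts)


def imbalance(reaction):
    """Return (atom imbalance, charge imbalance) for a reaction.

    reaction is a list of (coefficient, formula, charge) tuples, with negative
    coefficients for reactants and positive coefficients for products.
    """
    atoms = defaultdict(int)
    charge = 0
    for coefficient, formula, species_charge in reaction:
        for element, count in parse_formula(formula).items():
            atoms[element] += coefficient * count
        charge += coefficient * species_charge
    unbalanced = {element: n for element, n in atoms.items() if n != 0}
    return unbalanced, charge


# Assumed example: glucose + ATP -> glucose 6-phosphate + ADP + H+
hexokinase = [
    (-1, "C6H12O6", 0),
    (-1, "C10H12N5O13P3", -4),
    (1, "C6H11O9P", -2),
    (1, "C10H12N5O10P2", -3),
    (1, "H", 1),
]

balanced_result = imbalance(hexokinase)
# Dropping the proton leaves one hydrogen and one unit of charge unaccounted for.
missing_proton_result = imbalance(hexokinase[:-1])
```

An empty atom dictionary together with a zero charge difference indicates a balanced reaction; anything else points a curator at the element or charge that needs attention.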


Description	EMPATHY is a project about making it easier for biochemistry experts to build metabolic models. The challenge lies in automating information discovery and integration so that experts can focus upon checking that the model correctly represents biochemistry. We discover metabolic information from two sources: our metabolic knowledge base Biochem4j (http://biochem4j.org/browser/) and the scientific literature, through text mining. Biochem4j is a graph database (Neo4j) that integrates multiple metabolic and associated databases. Text mining uses relation extraction to retrieve the elements of a reaction, together with their roles, from unstructured text. The knowledge base derived from text mining (deep learning) is linked with Biochem4j. Our findings are used by Unilever (our partner) and other teams working on synthetic biology. In addition, our findings are now used by the Artificial Intelligence Research Centre in Japan for database curation and drug discovery. Subsequent funding was obtained from the Japan Agency for Medical Research and Development and the H2020 EPHOR project.
First Year Of Impact	2017
Sector	Chemicals, Healthcare, Manufacturing, including Industrial Biotechnology, Pharmaceuticals and Medical Biotechnology
Impact Types	Economic


Description	Japan Partnering Award. Text mining and bioinformatics platforms for metabolic pathway modelling.
Amount	£39,000 (GBP)
Funding ID	BB/P025684/1
Organisation	Biotechnology and Biological Sciences Research Council (BBSRC)
Sector	Public
Country	United Kingdom
Start	03/2017
End	03/2020


Title	APLenty: annotation tool for creating high-quality datasets using active and proactive learning
Description	APLenty is an annotation tool for creating high-quality sequence labeling datasets using active and proactive learning. A major innovation of this tool is the integration of automatic annotation with active learning and proactive learning. This makes the task of creating labeled datasets easier, less time-consuming and less demanding of human effort. APLenty is highly flexible and can be adapted to various tasks such as database curation and information extraction.
Type Of Material	Improvements to research infrastructure
Year Produced	2018
Provided To Others?	Yes
Impact	Several groups are using it to create labelled data for training
URL	http://www.nactem.ac.uk/aplenty/
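
The active learning idea referred to in the APLenty description above can be sketched as follows. The record does not say which query strategies APLenty implements, so this example uses least-confidence sampling, a common generic strategy; the function name, item identifiers, labels and probabilities are all hypothetical.

```python
def least_confident(predictions, k):
    """Return the ids of the k items whose top predicted label has the lowest probability.

    predictions maps an item id to a dict of label -> probability.
    """
    ranked = sorted(predictions, key=lambda item: max(predictions[item].values()))
    return ranked[:k]


# Hypothetical model outputs for three unlabelled sentences.
predictions = {
    "s1": {"METABOLITE": 0.98, "O": 0.02},
    "s2": {"METABOLITE": 0.55, "O": 0.45},
    "s3": {"METABOLITE": 0.70, "O": 0.30},
}

# The two sentences the model is least sure about are sent to the annotator first.
to_annotate = least_confident(predictions, k=2)
```

Items the model already labels confidently are left for automatic annotation, which is where the saving in human effort comes from.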


Title	Adverse drug events and medication relation extraction in electronic health records with ensemble deep learning methods.
Description	We proposed an ensemble approach for relation extraction and classification between drugs and medication-related entities. We incorporated state-of-the-art named-entity recognition (NER) models based on bidirectional long short-term memory (BiLSTM) networks and conditional random fields (CRF) for end-to-end extraction. We additionally developed separate models for intra- and inter-sentence relation extraction and combined them using an ensemble method. The intra-sentence models rely on bidirectional long short-term memory networks and attention mechanisms and are able to capture dependencies between multiple related pairs in the same sentence. For the inter-sentence relations, we adopted a neural architecture that utilizes the Transformer network to improve performance in longer sequences.
Type Of Material	Improvements to research infrastructure
Year Produced	2020
Provided To Others?	Yes
Impact	We proposed a relation extraction system to identify relations between drugs and medication-related entities. The proposed approach is independent of external syntactic tools. Analysis showed that by using latent Drug-Drug interactions we were able to significantly improve the performance of non-Drug-Drug pairs in EHRs. Research Output: Christopoulou, F., Tran, T.T., Sahu, S., Miwa, M. and S. Ananiadou (2020) Adverse Drug Events and Medication Relation Extraction in EHRs with Ensemble Deep Learning Methods, Journal of the American Medical Informatics Association, 27(1), 39-46.
URL	http://europepmc.org/article/MED/31390003


Title	An ensemble of neural models for adverse drug events and medication extraction
Description	We designed a neural model to tackle both nested entities (entities embedded in other entities) and polysemous entities (entities annotated with multiple semantic types) based on MIMIC III discharge summaries. To better represent rare and unknown words in entities, we further tokenized the MIMIC III data set by splitting the words into finer-grained subwords. We finally combined all the models to boost the performance. Additionally, we implemented a feature-based conditional random field model and created an ensemble to combine its predictions with those of the neural model. Our method achieved a lenient micro F1-score of 92.78%, with 95.99% lenient precision and 89.79% lenient recall. Experimental results showed that combining the predictions of either multiple models, or of a single model with different settings, can improve performance.
Type Of Material	Improvements to research infrastructure
Year Produced	2020
Provided To Others?	Yes
Impact	Further funding, research output Ju, M., Nguyen, N.T.H, Miwa, M., and S. Ananiadou (2020) An Ensemble of Neural Models for Nested Adverse Drug Events and Medication Extraction with Subwords, Journal of the American Medical Informatics Association, 27(1), 22-30
URL	https://europepmc.org/article/med/31197355#free-full-text
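
As a quick consistency check on the figures quoted in the description above, the F1-score is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded precision and recall; the function and variable names are illustrative only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported, in percent.
recomputed_f1 = f1_score(95.99, 89.79)

# Gap between the recomputed value and the reported 92.78%.
difference = abs(recomputed_f1 - 92.78)
```

The recomputed value agrees with the reported score to within 0.01 percentage points, a gap that is plausibly explained by the precision and recall having been rounded before publication.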


Title	Metabolic pathways Jamboree Web App
Description	A web app for multiple users to collaborate on a metabolic map ("Google Docs for metabolic models"). It has a microservice architecture with a Neo4j backend. Discovery and checking workflows allow the biochemist user to focus upon curation. The interface is list-driven and modelled upon SBML.
Type Of Material	Improvements to research infrastructure
Year Produced	2017
Provided To Others?	Yes
Impact	Used by the Centre for Synthetic Biology of Fine and Speciality Chemicals
URL	https://github.com/dbkgroup/reaction-balancer


Title	Biochem4j graph database
Description	Biologists and biochemists have at their disposal a number of excellent, publicly available data resources such as UniProt, KEGG, and NCBI Taxonomy, which catalogue biological entities. Despite the usefulness of these resources, they remain fundamentally unconnected. While links may appear between entries across these databases, users are typically only able to follow such links by manual browsing or through specialised workflows. Although many of the resources provide web-service interfaces for computational access, performing federated queries across databases remains a non-trivial but essential activity in interdisciplinary systems and synthetic biology programmes. With a particular focus on metabolic engineering as an illustrative application domain, biochem4j, freely available at http://db.synbiochem.co.uk, is introduced to provide an integrated, queryable database that warehouses chemical, reaction, enzyme and taxonomic data from a range of reliable resources. The biochem4j framework establishes a starting point for the flexible integration and exploitation of an ever-wider range of biological data sources, from public databases to laboratory-specific experimental datasets, for the benefit of systems biologists, systems engineers and the wider community of molecular biologists and biological chemists.
Type Of Material	Database/Collection of data
Year Produced	2016
Provided To Others?	Yes
Impact	This resource is needed to integrate repositories to catalogue both biological entities and - crucially - the relationships between them. Our resource is extensible, such that newly discovered relationships - for example, those between novel, synthetic enzymes and non-natural products - can be added over time. With the introduction of our graph database, the barrier to the rapid generation, extension and querying of such a resource has been lowered considerably.
URL	http://biochem4j.org/browser/


Title	Corpus annotations for pharmacovigilance
Description	The PHAEDRA corpus is a semantically annotated corpus for pharmacovigilance (PV), consisting of 597 MEDLINE abstracts. Its fine-grained, multiple levels of annotation, added by domain experts, make it a unique resource within the field, and aim to encourage the development/adaptation of novel machine learning tools for extracting PV-related information from text. It is intended that such tools will lead to novel means of supporting curators to efficiently increase the coverage, consistency and completeness of the information in PV resources.
Type Of Material	Database/Collection of data
Year Produced	2018
Provided To Others?	Yes
Impact	A unique resource within the field of pharmacovigilance which encourages the development/adaption of novel machine learning tools for extracting PV-related information from text. It is currently used to extract information for the development of the Enzyme database in Japan (AIRC)
URL	http://www.nactem.ac.uk/PHAEDRA/


Description	Artificial Intelligence Research Centre Japan, Drug Discovery and Enzyme Pathways
Organisation	National Institute of Advanced Industrial Science and Technology
Department	Artificial Intelligence Research Centre
Country	Japan
Sector	Public
PI Contribution	This is a collaborative award between the Artificial Intelligence Research Centre in Japan and the National Centre for Text Mining. We use our text mining platform Argo as a federated machine reading system in order to generate annotations over full papers to generate a knowledge base. We represent meanings of mentions based on: (1) the contexts in which they appear in Europe PMC articles, and (2) domain knowledge from external domain specific resources, e.g., ChEBI, KEGG, MetaCyc, GO, UniProt, GenBank, OMIM, etc. Based on these two types of information, joint learning generates a deep neural network model to represent word meaning, which will in turn enable us to connect semantically similar mentions and thus construct the BKG.
Collaborator Contribution	Computational infrastructure to annotate full papers using Deep Learning.
Impact	Research Outputs: 1. Tran, T.T., Miwa, M., and S. Ananiadou (2020) Syntactically-informed Word Representations from Graph Neural Network, Neurocomputing, doi.org/10.1016/j.neucom.2020.06.070. 2. Trieu, H-L., Tran, T.T., Duong, K.N.A., Miwa, M. and Ananiadou, S. (2020) DeepEventMine: End-to-end Neural Nested Event Extraction from Biomedical Texts, Bioinformatics. 3. Christopoulou, F., Tran, T.T., Sahu, S., Miwa, M. and S. Ananiadou (2020) Adverse Drug Events and Medication Relation Extraction in EHRs with Ensemble Deep Learning Methods, Journal of the American Medical Informatics Association, 27(1), 39-46. 4. Ju, M., Nguyen, N.T.H, Miwa, M., and S. Ananiadou (2020) An Ensemble of Neural Models for Nested Adverse Drug Events and Medication Extraction with Subwords, Journal of the American Medical Informatics Association, 27(1), 22-30. 5. Li, M., Takamura, H. and S. Ananiadou (2020) A Neural Model for Aggregating Coreference Annotation in Crowdsourcing, 28th International Conference on Computational Linguistics, COLING 2020 (outstanding paper).
Start Year	2017


Title	APLenty is an annotation tool developed at NaCTeM for creating high-quality sequence labeling datasets using active and proactive learning.
Description	A major innovation of our tool is the integration of automatic annotation with active learning and proactive learning. This makes the task of creating labeled datasets easier, less time-consuming and requiring less human effort. APLenty is highly flexible and can be adapted to various other tasks.
Type Of Technology	Webtool/Application
Year Produced	2018
Open Source License?	Yes
Impact	Cutting annotation costs for developing gold standards; supporting crowdsourcing efforts by selecting the most reliable annotator
URL	http://www.nactem.ac.uk/aplenty/about


Title	EMPATHY Web app
Description	The code implements a web app for realtime and collaborative (like Google Docs) metabolic network reconstruction. Building a reconstruction involves discovering, curating and integrating missing molecules and reactions. This requires informatics, computational systems biology and data management that can prove difficult for biochemists expert in metabolism. The app automates discovery and integration so biochemists can focus upon curation. The app is, in effect, a virtual jamboree platform. The repository contains nearly 14000 lines of code in three languages.
Type Of Technology	Webtool/Application
Year Produced	2016
Impact	The app has been deployed on the Cloud using Google Compute Engine and is available at https://metabolicjamboree.co.uk/
URL	https://github.com/porld/empathyApp


Title	Thalia: a faceted semantic search system
Description	The main purpose of Thalia is to enable semantic search in the context of biomedical literature by leveraging previous named entity (NE) annotation efforts. The key strategy for achieving semantic behaviour is to normalise NEs, i.e., to link entities to concepts in an openly available ontology, which effectively allows a concept to be mapped to its multiple word forms. Thalia covers the entire PubMed, which at the point of this challenge contains about 27 million references. Thalia includes annotations of several types (Chemicals, Diseases, Drugs, Genes, Metabolites, Proteins, Species and Anatomic entities).
Type Of Technology	Webtool/Application
Year Produced	2017
Open Source License?	Yes
Impact	The semantic search system has been used to support a precision medicine challenge, retrieving documents containing potential treatments and clinical trials for specific patient characteristics.
URL	http://nactem-copious.man.ac.uk/Thalia/thalia.html
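
The normalisation strategy described for Thalia above (linking different word forms to one concept so that a search for any form finds them all) can be sketched with a toy lexicon. This is an illustration under stated assumptions, not Thalia's implementation: the lexicon, the `CONCEPT:` identifiers, the function names and the documents are all hypothetical.

```python
# Hypothetical lexicon: lower-cased surface form -> placeholder concept identifier.
LEXICON = {
    "glucose": "CONCEPT:001",
    "d-glucose": "CONCEPT:001",
    "dextrose": "CONCEPT:001",
    "atp": "CONCEPT:002",
    "adenosine triphosphate": "CONCEPT:002",
}


def normalise(mention):
    """Map an entity mention to its concept identifier, or None if it is unknown."""
    return LEXICON.get(mention.strip().lower())


def search(query, documents):
    """Return ids of documents containing a mention of the same concept as the query."""
    concept = normalise(query)
    if concept is None:
        return []
    return [
        doc_id
        for doc_id, mentions in documents.items()
        if any(normalise(mention) == concept for mention in mentions)
    ]


# Hypothetical documents, each reduced to its annotated entity mentions.
documents = {
    "doc1": ["Dextrose"],
    "doc2": ["ATP"],
    "doc3": ["D-glucose", "adenosine triphosphate"],
}

hits = search("glucose", documents)
```

Because matching happens on concept identifiers rather than strings, a query for "glucose" retrieves documents that only mention "Dextrose" or "D-glucose".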


Description	10th International Biocuration Conference, Stanford University
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Co-organiser of the workshop focusing on recent advances in the development of integrated systems to capture mechanisms of biological systems, including machine reading of journal articles, (semi-)automated assembly of signaling pathway models, and machine-aided analysis of these models for tasks such as drug repurposing and explaining drugs' effects. This workshop consisted of invited speakers and contributed talks and/or panel discussions from experts in biocuration, machine reading, and biological modeling.
Year(s) Of Engagement Activity	2017
URL	https://f1000research.com/slides/6-482


Description	Artificial Intelligence Research Centre
Form Of Engagement Activity	A formal working group, expert panel or dialogue
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Industry/Business
Results and Impact	The EMPATHY text mining platform Argo was presented to the audience as a model for curating pathways and databases.
Year(s) Of Engagement Activity	2016


Description	CBMNet-funded industry problem solving meeting (July 2016)
Form Of Engagement Activity	A formal working group, expert panel or dialogue
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Industry/Business
Results and Impact	A UK industrial biotechnology company invited ~16 academics from across Europe to focus upon real manufacturing problems they believed to be connected to membrane transport. In the course of the meeting the EMPATHY contributor gave an informal introduction to metabolic modelling and engineering, then sketched out the EMPATHY project to gather commercial feedback. This led to a new model for commercialising the web app software.
Year(s) Of Engagement Activity	2016


Description	Conference talk: Facilitating and promoting web annotation with Argo
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	The purpose of the communication was to disseminate the development of our interoperable text mining platform Argo applied to metabolic pathways. In particular, we focused on annotation protocols for an Open Annotation Store. The iAnnotate conference focuses on aspects of sharing and re-using annotations for different text mining applications and domains.
Year(s) Of Engagement Activity	2016
URL	http://iannotate.org/2016/


Description	Invited departmental seminar, Faculty of Engineering, University of Chester (December 2016)
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	Regional
Primary Audience	Postgraduate students
Results and Impact	The EMPATHY web app was described within the context of a broader research talk about biological modelling.
Year(s) Of Engagement Activity	2016


Description	Invited speaker
Form Of Engagement Activity	A formal working group, expert panel or dialogue
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	I was an invited speaker at the PSB 2018 session on precision medicine and text mining.
Year(s) Of Engagement Activity	2018
URL	https://psb.stanford.edu/callfor/papers/cfp-textmining/


Description	Invited talk at Aberystwyth University
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	Regional
Primary Audience	Postgraduate students
Results and Impact	50 PGR students attended the talk, which sparked questions about the role of AI in pathways.
Year(s) Of Engagement Activity	2019
URL	https://www.eventbrite.co.uk/e/enriching-pathway-models-using-text-mining-mid-wales-branch-registrat...


Description	Keynote at the International Symposium on Information Management and Big Data (SIMBig 2019), Lima, Peru, 21st - 23rd August 2019. Prof. Ananiadou's talk was entitled Text Mining for Biomedical Applications.
Form Of Engagement Activity	A formal working group, expert panel or dialogue
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	Dissemination of text mining research supporting systematic reviews to a wider audience
Year(s) Of Engagement Activity	2019
URL	https://simbig.org/SIMBig2019/index.html


Description	Keynote Open Science Paris (December 2018)
Form Of Engagement Activity	A formal working group, expert panel or dialogue
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Public/other audiences
Results and Impact	I was invited to give a keynote talk at the Journées pour la science ouverte (Days for open science) in Paris, France. These days, organised by the Comité pour la science ouverte, were arranged following the announcement of the French national plan for open science on July 4, 2018 by the Minister of Higher Education, Research and Innovation, and were an opportunity to mobilise the scientific community around open science and applications.
Year(s) Of Engagement Activity	2018
URL	https://bib.cnrs.fr/journees-pour-la-science-ouverte-open-science-days-paris-december-4th-to-6th-201...


Description	Keynote Speaker University of York
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	Regional
Primary Audience	Postgraduate students
Results and Impact	This event celebrated the launch of the University's new high performance compute (HPC) cluster, 'Viking', which promises to empower researchers at York in achieving new heights of research excellence. My talk discussed how text mining needs HPC clusters.
Year(s) Of Engagement Activity	2019
URL	https://www.york.ac.uk/it-services/research-computing/vikingclusterlaunchevent/


Description	Keynote speaker
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Northern Lights Deep Learning Conference on 11th January 2022
Year(s) Of Engagement Activity	2022
URL	https://www.nldl.org/


Description	Keynote speaker
Form Of Engagement Activity	A formal working group, expert panel or dialogue
Part Of Official Scheme?	No
Geographic Reach	Regional
Primary Audience	Industry/Business
Results and Impact	Evotec's mission (https://www.evotec.com/en) is to discover and develop highly effective therapeutics and make them globally available to the patients who need them. My talk was related to these aims.
Year(s) Of Engagement Activity	2022


Description	Keynote speaker
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	6th International Conference on Computer and Information Science and Technology (CIST'21), July 29 - 31, 2021. The goal of the conference is to gather scholars from all over the world to present advances in the relevant fields and to foster an environment conducive to exchanging ideas and information. The conference will also provide an ideal environment to develop new collaborations and meet experts on the fundamentals, applications, and products of the mentioned fields.
Year(s) Of Engagement Activity	2021
URL	https://cistseries.com/


Description	Keynote speaker
Form Of Engagement Activity	A formal working group, expert panel or dialogue
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Professional Practitioners
Results and Impact	My talk, entitled Information Extraction for Pathway Reconstruction, was given to the Science Foundation Ireland Centre for Research Training in Machine Learning (ML-Labs).
Year(s) Of Engagement Activity	2022
URL	https://www.ml-labs.ie/


Description	Keynote speaker CLEF eHealth
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	The keynote concerned how text mining can link cancer pathway models with textual evidence to automate science for drug discovery in cancer research. Text mining techniques are being employed to construct, update and verify information in relevant models, to ensure that the information used for hypothesis generation is as accurate as possible. Complex information from the literature (semantic events) is automatically extracted and mapped/compared to reactions in existing pathway models. These comparisons allow the existing models to be verified or updated in several ways. Information from the literature can act as corroborative evidence of the validity of these reactions in a model or help to extend it. In addition, by taking into account textual context (uncertainty, negation), we can provide a confidence measure for linking and ranking evidence from the literature for model curation and experimental design.
Year(s) Of Engagement Activity	2018
URL	https://sites.google.com/view/clef-ehealth-2018/home


Description	Keynote talk at ISMB/ECCB session on Text Mining for Biology and Healthcare
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	ISMB/ECCB 2019 is the largest and most high profile annual meeting of scientists working in computational biology and provides an intense multidisciplinary forum for disseminating the latest developments in computational tools for data driven biological research.
Year(s) Of Engagement Activity	2019
URL	https://www.iscb.org/ismbeccb2019-program/special-sessions#sst01


Description	Machine reading for cancer biology at the Global Pharma R&D Informatics Congress
Form Of Engagement Activity	A formal working group, expert panel or dialogue
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	This congress looks at new methods and new technologies that get the best out of the information available and strategies to integrate internal and external systems so that all teams get the information they need to accelerate the drug development pipeline. Attracting experts working in all areas of pharmaceutical R&D IT and discovery informatics, the event focused on innovations and strategies in four key topic areas: complex data analytics; system integration; AI and machine learning; and data storage and management.
Year(s) Of Engagement Activity	2017
URL	http://www.global-engage.com/wp-content/uploads/2017/07/Global-Pharma-RD-Informatics-Congress-Europe...


Description	Research Software Engineering conference flash presentation
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Professional Practitioners
Results and Impact	A flash presentation was given in a group session at the UK's first ever Research Software Engineering conference. This focused upon software aspects of the EMPATHY project. Input from other group members led to useful design changes and ideas for software maintenance beyond the funding period.
Year(s) Of Engagement Activity	2016


Description	Speaker Elsevier forum
Form Of Engagement Activity	A formal working group, expert panel or dialogue
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Industry/Business
Results and Impact	Engagement with Elsevier about text mining for enriching their content and discussions on collaboration
Year(s) Of Engagement Activity	2019


Description	Speaker Google London
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Industry/Business
Results and Impact	Collaboration with Google on biomedical text mining. Discussions are ongoing.
Year(s) Of Engagement Activity	2019


Description	Systems Biology Institute Japan
Form Of Engagement Activity	A formal working group, expert panel or dialogue
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	The text mining platform Argo was presented as a means for metabolic pathway curation to the SBI team in Tokyo, with the participation of policy makers from AIST and AIRC (director and deputy director). Discussions were held on combining the SBI platform with EMPATHY's Argo.
Year(s) Of Engagement Activity	2016


Description	TREC Precision Medicine / Clinical Decision track
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	We presented our work to support precision medicine based on the use of a biomedical semantic search engine called Thalia (Text mining for Highlighting, Aggregating and Linking Information in Articles), which has been developed at NaCTeM. The main purpose of Thalia is to enable semantic search in the context of biomedical literature by leveraging previous named entity (NE) annotation efforts, and to apply it to different use cases.
Year(s) Of Engagement Activity	2017
URL	http://www.trec-cds.org/2017.html


Description	Text Mining Workshop for Precision Medicine
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	Regional
Primary Audience	Professional Practitioners
Results and Impact	The text mining workshop was aimed at clinical and industry colleagues who would like to attend a showcase of a number of innovative text mining approaches for biomarker discovery. There was also potential to discuss collaborative grant applications where this enabling technology will be an asset.
Year(s) Of Engagement Activity	2019
URL	http://www.nactem.ac.uk/newsitem.php?item=388


Description	Workshop on Biomedical Information Management, Hamburg Germany
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	The workshop gathered interested professionals from across Europe working in the information or medical domain, such as medical researchers, medical doctors and entrepreneurs building their business around biomedical ICT. The goal of the workshop was to obtain answers to the following questions: What information and knowledge management solutions are actively used in the community? What are the limitations of the current solutions? What important problems in biomedical information management are not addressed or automated at all by any solution?
Year(s) Of Engagement Activity	2018
URL	http://www.bimdanube.eu/ws1/