Japan Partnering Award. Text mining and bioinformatics platforms for metabolic pathway modelling.
Lead Research Organisation:
University of Manchester
Department Name: Computer Science
Abstract
Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.
Publications
Zerva C (2017) Using uncertainty to link and rank evidence from biomedical literature for model curation. In Bioinformatics (Oxford, England)
Description | We developed new AI methods to extract information from text. These methods automatically extract concepts, relations, and complex relations (events). Our methods are neural-based and can be reused in several other domains without having to redo the work. We obtained further funding from Japan (Japan Cancer Research and NEDO) to continue our work |
Exploitation Route | The integration of text mining platforms and workflows is crucial for the curation of databases and is used by other teams (e.g. RIKEN, AIST). The semantic search system built over the whole of PubMed supports discovery, and the annotation environment supports model development |
Sectors | Digital/Communication/Information Technologies (including Software),Healthcare,Pharmaceuticals and Medical Biotechnology |
URL | http://www.nactem.ac.uk/airc/ |
Description | Our findings have been used to develop new information extraction models for the population and curation of databases. These include deep event extraction, named entity recognition and relation extraction using neural networks, together with an improved annotation environment to support the development of training data |
First Year Of Impact | 2020 |
Sector | Digital/Communication/Information Technologies (including Software),Healthcare,Pharmaceuticals and Medical Biotechnology |
Impact Types | Economic |
Title | APLenty: annotation tool for creating high-quality datasets using active and proactive learning |
Description | APLenty is an annotation tool for creating high-quality sequence labeling datasets using active and proactive learning. A major innovation of this tool is the integration of automatic annotation with active learning and proactive learning. This makes the task of creating labeled datasets easier and less time-consuming, and reduces the human effort required. APLenty is highly flexible and can be adapted to various tasks such as database curation and information extraction |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2018 |
Provided To Others? | Yes |
Impact | Several groups are using it to create labelled data for training |
URL | http://www.nactem.ac.uk/aplenty/ |
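The active learning idea described for APLenty can be illustrated with a minimal sketch. It assumes least-confidence sampling, a common selection strategy, and is not taken from APLenty itself; the function name, sentence ids and probabilities are all hypothetical.

```python
from typing import Dict, List, Sequence


def least_confident(
    token_probs: Dict[str, Sequence[float]], batch_size: int
) -> List[str]:
    """Return the ids of the sentences the tagger is least sure about.

    token_probs maps a (hypothetical) sentence id to the probability the
    current tagger assigns to its predicted label at each token.
    """

    def confidence(probs: Sequence[float]) -> float:
        # Score a sentence by its weakest token; an empty sentence counts as certain.
        return min(probs) if probs else 1.0

    # Lowest confidence first, so the most informative sentences are annotated first.
    ranked = sorted(token_probs, key=lambda sid: confidence(token_probs[sid]))
    return ranked[:batch_size]


# Toy pool of unlabelled sentences with made-up tagger confidences.
pool = {
    "s1": [0.99, 0.98, 0.97],
    "s2": [0.95, 0.51, 0.90],
    "s3": [0.80, 0.85, 0.99],
}
to_annotate = least_confident(pool, batch_size=2)
```

In a full loop, the selected sentences would be sent to annotators, the tagger retrained on the enlarged labelled set, and the pool rescored; proactive learning would additionally choose which annotator receives each sentence.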
Title | Adverse drug events and medication relation extraction in electronic health records with ensemble deep learning methods. |
Description | We proposed an ensemble approach for relation extraction and classification between drugs and medication-related entities. We incorporated state-of-the-art named-entity recognition (NER) models based on bidirectional long short-term memory (BiLSTM) networks and conditional random fields (CRF) for end-to-end extraction. We additionally developed separate models for intra- and inter-sentence relation extraction and combined them using an ensemble method. The intra-sentence models rely on bidirectional long short-term memory networks and attention mechanisms and are able to capture dependencies between multiple related pairs in the same sentence. For the inter-sentence relations, we adopted a neural architecture that utilizes the Transformer network to improve performance in longer sequences. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2020 |
Provided To Others? | Yes |
Impact | We proposed a relation extraction system to identify relations between drugs and medication-related entities. The proposed approach is independent of external syntactic tools. Analysis showed that by using latent Drug-Drug interactions we were able to significantly improve the performance of non-Drug-Drug pairs in EHRs. Research Output: Christopoulou, F., Tran, T.T., Sahu, S., Miwa, M. and S. Ananiadou (2020) Adverse Drug Events and Medication Relation Extraction in EHRs with Ensemble Deep Learning Methods, Journal of the American Medical Informatics Association, 27(1), 39-46. |
URL | http://europepmc.org/article/MED/31390003 |
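The record above does not say how the intra- and inter-sentence models were combined. As an illustration only, the sketch below assumes simple majority voting over per-pair predictions; the function name and relation labels are hypothetical and are not taken from the published system.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-pair labels from several models by majority vote.

    predictions[m][i] is the label model m assigns to candidate pair i.
    Ties are resolved in favour of the label seen first, i.e. the earliest model.
    """
    if not predictions:
        return []
    n_pairs = len(predictions[0])
    if any(len(p) != n_pairs for p in predictions):
        raise ValueError("all models must label the same candidate pairs")
    voted = []
    for labels in zip(*predictions):
        # most_common(1) returns the single most frequent label for this pair.
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


# Hypothetical outputs of three models over three candidate pairs.
model_a = ["ADE-Drug", "NoRelation", "Dosage-Drug"]
model_b = ["ADE-Drug", "Reason-Drug", "Dosage-Drug"]
model_c = ["NoRelation", "Reason-Drug", "Dosage-Drug"]
combined = majority_vote([model_a, model_b, model_c])
```

Voting is only one option; weighting models by validation performance or averaging their class probabilities are common alternatives.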
Title | An ensemble of neural models for adverse drug events and medication extraction |
Description | We designed a neural model to tackle both nested entities (entities embedded in other entities) and polysemous entities (entities annotated with multiple semantic types) based on MIMIC III discharge summaries. To better represent rare and unknown words in entities, we further tokenized the MIMIC III data set by splitting the words into finer-grained subwords. We finally combined all the models to boost the performance. Additionally, we implemented a feature-based conditional random field model and created an ensemble to combine its predictions with those of the neural model. Our method achieved a lenient micro F1-score of 92.78%, with 95.99% lenient precision and 89.79% lenient recall. Experimental results showed that combining the predictions of either multiple models or a single model with different settings can improve performance. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2020 |
Provided To Others? | Yes |
Impact | Further funding; research output: Ju, M., Nguyen, N.T.H, Miwa, M., and S. Ananiadou (2020) An Ensemble of Neural Models for Nested Adverse Drug Events and Medication Extraction with Subwords, Journal of the American Medical Informatics Association, 27(1), 22-30 |
URL | https://europepmc.org/article/med/31197355#free-full-text |
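The figures reported in this record can be sanity-checked with the standard definition of F1 as the harmonic mean of precision and recall. The sketch below is illustrative; the function and variable names are hypothetical, and only the precision and recall values come from the record.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Lenient precision and recall as reported in the record above.
reported_precision = 0.9599
reported_recall = 0.8979

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.4f}")
```

Recomputing from the rounded precision and recall agrees with the reported 92.78% to within about 0.01 percentage points, which is the size of gap expected when the inputs are themselves rounded.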
Title | Event extraction for biological interactions |
Description | We propose an end-to-end neural nested event extraction model named DeepEventMine that extracts multiple overlapping directed acyclic graph structures from a raw sentence. Built on top of the Bidirectional Encoder Representations from Transformers (BERT) model, our model detects nested entities and triggers, roles, nested events and their modifications in an end-to-end manner without any syntactic tools. Our DeepEventMine model achieves new state-of-the-art performance on seven biomedical nested event extraction tasks. Even when gold entities are unavailable, our model can detect events from raw text with promising performance. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2020 |
Provided To Others? | Yes |
Impact | Further funding based on this research; research publication Trieu, H-L., Tran, T.T., Duong, K.N.A., Miwa, M. and Ananiadou, S. (2020) DeepEventMine: End-to-end Neural Nested Event Extraction from Biomedical Texts, Bioinformatics |
URL | https://github.com/aistairc/DeepEventMine |
Title | Corpus annotations for pharmacovigilance |
Description | The PHAEDRA corpus is a semantically annotated corpus for pharmacovigilance (PV), consisting of 597 MEDLINE abstracts. Its fine-grained, multiple levels of annotation, added by domain experts, make it a unique resource within the field, and aim to encourage the development/adaptation of novel machine learning tools for extracting PV-related information from text. It is intended that such tools will lead to novel means of supporting curators to efficiently increase the coverage, consistency and completeness of the information in PV resources. |
Type Of Material | Database/Collection of data |
Year Produced | 2018 |
Provided To Others? | Yes |
Impact | A unique resource within the field of pharmacovigilance which encourages the development/adaption of novel machine learning tools for extracting PV-related information from text. It is currently used to extract information for the development of the Enzyme database in Japan (AIRC) |
URL | http://www.nactem.ac.uk/PHAEDRA/ |
Description | Artificial Intelligence Research Centre Japan, Drug Discovery and Enzyme Pathways |
Organisation | National Institute of Advanced Industrial Science and Technology |
Department | Artificial Intelligence Research Centre |
Country | Japan |
Sector | Public |
PI Contribution | This is a collaborative award between the Artificial Intelligence Research Centre in Japan and the National Centre for Text Mining. We use our text mining platform Argo as a federated machine reading system to annotate full papers and build a knowledge base. We represent the meanings of mentions based on: (1) the contexts in which they appear in Europe PMC articles, and (2) domain knowledge from external domain-specific resources, e.g., ChEBI, KEGG, MetaCyc, GO, UniProt, GenBank and OMIM. Based on these two types of information, joint learning generates a deep neural network model to represent word meaning, which will in turn enable us to connect semantically similar mentions and thus construct the BKG. |
Collaborator Contribution | Computational infrastructure to annotate full papers using Deep Learning. |
Impact | Research Outputs 1. Tran, T.T., Miwa, M., and S. Ananiadou (2020) Syntactically-informed Word Representations from Graph Neural Network, Neurocomputing, doi.org/10.1016/j.neucom.2020.06.070 2. Trieu, H-L., Tran, T.T., Duong, K.N.A., Miwa, M. and Ananiadou, S. (2020) DeepEventMine: End-to-end Neural Nested Event Extraction from Biomedical Texts, Bioinformatics 3. Christopoulou, F., Tran, T.T., Sahu, S., Miwa, M. and S. Ananiadou (2020) Adverse Drug Events and Medication Relation Extraction in EHRs with Ensemble Deep Learning Methods, Journal of the American Medical Informatics Association, 27(1), 39-46. 4. Ju, M., Nguyen, N.T.H, Miwa, M., and S. Ananiadou (2020) An Ensemble of Neural Models for Nested Adverse Drug Events and Medication Extraction with Subwords, Journal of the American Medical Informatics Association, 27(1), 22-30 5. Li, M., Takamura, H. and S. Ananiadou (2020) A Neural Model for Aggregating Coreference Annotation in Crowdsourcing, 28th International Conference on Computational Linguistics, Coling 2020. *outstanding paper* |
Start Year | 2017 |
Description | Collaboration with AIRC |
Organisation | National Institute of Advanced Industrial Science and Technology |
Department | Artificial Intelligence Research Centre |
Country | Japan |
Sector | Public |
PI Contribution | Research outcomes contributing to innovation in AI and NLP |
Collaborator Contribution | Joint research papers; software; resources; internships |
Impact | Hai-Long Trieu, Makoto Miwa, Sophia Ananiadou, BioVAE: a pre-trained latent variable language model for biomedical text mining, Bioinformatics, Volume 38, Issue 3, 1 February 2022, Pages 872-874, https://doi.org/10.1093/bioinformatics/btab702 Fenia Christopoulou, Makoto Miwa, and Sophia Ananiadou. 2021. Distantly Supervised Relation Extraction with Sentence Reconstruction and Knowledge Base Priors. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 11-26, Online. Association for Computational Linguistics. |
Start Year | 2017 |
Title | APLenty is an annotation tool developed at NaCTeM for creating high-quality sequence labeling datasets using active and proactive learning. |
Description | A major innovation of our tool is the integration of automatic annotation with active learning and proactive learning. This makes the task of creating labeled datasets easier and less time-consuming, and reduces the human effort required. APLenty is highly flexible and can be adapted to various other tasks. |
Type Of Technology | Webtool/Application |
Year Produced | 2018 |
Open Source License? | Yes |
Impact | Cutting annotation costs for developing gold standards; supporting crowdsourcing efforts by selecting the most reliable annotator |
URL | http://www.nactem.ac.uk/aplenty/about |
Title | Thalia: a faceted semantic search system |
Description | The main purpose of Thalia is to enable semantic search in the context of biomedical literature by leveraging previous named entity (NE) annotation efforts. The key strategy for achieving semantic behaviour is to normalise NEs, i.e., to link entities to concepts in an openly available ontology, which effectively maps a concept to its multiple word forms. Thalia covers the entire PubMed, which at the time of this challenge contained about 27 million references. Thalia includes annotations of several types (Chemicals, Diseases, Drugs, Genes, Metabolites, Proteins, Species and Anatomic entities). |
Type Of Technology | Webtool/Application |
Year Produced | 2017 |
Open Source License? | Yes |
Impact | The semantic search system has been used to support a precision medicine challenge, retrieving documents containing potential treatments and clinical trials for specific patient characteristics. |
URL | http://nactem-copious.man.ac.uk/Thalia/thalia.html |
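The normalisation strategy described for Thalia, linking different word forms to a single concept so that a query for one form retrieves documents mentioning any of them, can be sketched as follows. This is a toy illustration and not Thalia's implementation; the lexicon, concept identifiers, document ids and function names are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical lexicon: lower-cased surface form -> made-up concept identifier.
LEXICON: Dict[str, str] = {
    "aspirin": "CONCEPT:0001",
    "acetylsalicylic acid": "CONCEPT:0001",
    "asa": "CONCEPT:0001",
    "paracetamol": "CONCEPT:0002",
    "acetaminophen": "CONCEPT:0002",
}


def build_index(annotations: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Index documents by normalised concept rather than by surface string."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, mentions in annotations.items():
        for mention in mentions:
            concept = LEXICON.get(mention.lower())
            if concept is not None:  # mentions outside the lexicon are skipped
                index[concept].add(doc_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Normalise the query to a concept, then return every document mentioning it."""
    concept = LEXICON.get(query.lower())
    if concept is None:
        return []
    return sorted(index.get(concept, set()))


# Toy named entity annotations per document.
docs = {
    "doc1": ["Aspirin"],
    "doc2": ["acetylsalicylic acid", "Paracetamol"],
    "doc3": ["acetaminophen"],
}
index = build_index(docs)
hits = search(index, "ASA")
```

Here a query for the abbreviation retrieves both documents that mention the same drug under other names, even though neither contains the query string itself.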
Description | 10th International Biocuration Conference, Stanford University |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Co-organiser of the workshop focusing on recent advances in the development of integrated systems to capture mechanisms for biological systems, including machine reading of journal articles, (semi-)automated assembly of signaling pathway models, and machine-aided analysis of these models for tasks such as drug repurposing and explaining drugs' effects. This workshop consisted of invited speakers and contributed talks and/or panel discussions from experts in biocuration, machine reading, and biological modeling. |
Year(s) Of Engagement Activity | 2017 |
URL | https://f1000research.com/slides/6-482 |
Description | Invited Talk Artificial Intelligence Research Centre, AIST Japan |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | Presentation of the text mining infrastructure to support the automatic reconstruction of pathways and their curation. |
Year(s) Of Engagement Activity | 2017 |
Description | Invited talk at Aberystwyth University |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Postgraduate students |
Results and Impact | 50 PGR students attended the talk, which sparked questions about the role of AI in pathways |
Year(s) Of Engagement Activity | 2019 |
URL | https://www.eventbrite.co.uk/e/enriching-pathway-models-using-text-mining-mid-wales-branch-registrat... |
Description | Keynote at the International Symposium on Information Management and Big Data (SIMBig 2019), Lima, Peru, 21st - 23rd August 2019. Prof. Ananiadou's talk was entitled Text Mining for Biomedical Applications. |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Dissemination of text mining research supporting systematic reviews to a wider audience |
Year(s) Of Engagement Activity | 2019 |
URL | https://simbig.org/SIMBig2019/index.html |
Description | Keynote Speaker University of York |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Postgraduate students |
Results and Impact | This event celebrated the launch of the University's new high performance compute (HPC) cluster, 'Viking', which promises to empower researchers at York in achieving new heights of research excellence. My talk discussed how text mining needs HPC clusters. |
Year(s) Of Engagement Activity | 2019 |
URL | https://www.york.ac.uk/it-services/research-computing/vikingclusterlaunchevent/ |
Description | Keynote speaker |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | My talk, entitled Information Extraction for Pathway Reconstruction, was given to the Science Foundation Ireland Centre for Research Training in Machine Learning (ML-Labs) |
Year(s) Of Engagement Activity | 2022 |
URL | https://www.ml-labs.ie/ |
Description | Keynote speaker |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | 6th International Conference on Computer and Information Science and Technology (CIST'21) July 29 - 31, 2021 The goal of this Computer and information science conference 2021 is to gather scholars from all over the world to present advances in the relevant fields and to foster an environment conducive to exchanging ideas and information. The conference will also provide an ideal environment to develop new collaborations and meet experts on the fundamentals, applications, and products of the mentioned fields. |
Year(s) Of Engagement Activity | 2021 |
URL | https://cistseries.com/ |
Description | Keynote speaker |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Industry/Business |
Results and Impact | Evotec's mission (https://www.evotec.com/en) is to discover and develop highly effective therapeutics and make them globally available to the patients who need them. My talk was related to these aims. |
Year(s) Of Engagement Activity | 2022 |
Description | Keynote speaker |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Keynote speech about the use of AI in biomedicine; increased interest in the field. |
Year(s) Of Engagement Activity | 2021 |
URL | http://www.binfo.ncku.edu.tw/APBC2021/keynote.html |
Description | Keynote speaker |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Northern Lights Deep Learning Conference on 11th January 2022 |
Year(s) Of Engagement Activity | 2022 |
URL | https://www.nldl.org/ |
Description | Keynote speaker CLEF eHealth |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | The keynote addressed how text mining can link cancer pathway models with textual evidence to automate science for drug discovery in cancer research. Text mining techniques are being employed to construct, update and verify information in relevant models, to ensure that the information used for hypothesis generation is as accurate as possible. Complex information from the literature (semantic events) is automatically extracted and mapped/compared to reactions in existing pathway models. These comparisons allow the existing models to be verified or updated in several ways. Information from the literature can act as corroborative evidence of the validity of these reactions in a model or help to extend it. In addition, by taking into account textual context (uncertainty, negation), we can provide a confidence measure for linking and ranking evidence from the literature for model curation and experimental design. |
Year(s) Of Engagement Activity | 2018 |
URL | https://sites.google.com/view/clef-ehealth-2018/home |
Description | Keynote talk at ISMB/ECCB session on Text Mining for Biology and Healthcare |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | ISMB/ECCB 2019 is the largest and most high profile annual meeting of scientists working in computational biology and provides an intense multidisciplinary forum for disseminating the latest developments in computational tools for data driven biological research. |
Year(s) Of Engagement Activity | 2019 |
URL | https://www.iscb.org/ismbeccb2019-program/special-sessions#sst01 |
Description | Machine reading for cancer biology at the Global Pharma R&D Informatics Congress |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | This congress looks at new methods and new technologies that get the best out of the information available and strategies to integrate internal and external systems so that all teams get the information they need to accelerate the drug development pipeline. Attracting experts working in all areas of pharmaceutical R&D IT and discovery informatics, the event focused on innovations and strategies in these 4 key topic areas: Complex Data Analytics; System Integration; AI and Machine Learning; Data Storage and Management |
Year(s) Of Engagement Activity | 2017 |
URL | http://www.global-engage.com/wp-content/uploads/2017/07/Global-Pharma-RD-Informatics-Congress-Europe... |
Description | Speaker Elsevier forum |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Industry/Business |
Results and Impact | Engagement with Elsevier about text mining for enriching their content and discussions on collaboration |
Year(s) Of Engagement Activity | 2019 |
Description | Speaker Google London |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Industry/Business |
Results and Impact | Collaboration with Google on biomedical text mining; discussions are ongoing |
Year(s) Of Engagement Activity | 2019 |
Description | Workshop on Biomedical Information Management, Hamburg Germany |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | The workshop gathered interested professionals from across Europe working in the information or medical domain, such as medical researchers, medical doctors and entrepreneurs building their business around biomedical ICT. The goal of the workshop was to obtain answers to the following questions: What information and knowledge management solutions are actively used in the community? What are the limitations of the current solutions? What important problems in biomedical information management are not addressed or automated at all by any solution? |
Year(s) Of Engagement Activity | 2018 |
URL | http://www.bimdanube.eu/ws1/ |