Next generation Text Mining in Drug Discovery

Lead Research Organisation: Queen Mary University of London

Department Name: Digital Environment Research Institute

Abstract

Extracting interesting, non-trivial patterns from text documents is the next wave of knowledge discovery in the biochemical sciences. Free text in the biomedical literature contains a wealth of information about small molecules and their targets that is not currently stored in biochemical knowledge bases. This information can be exploited to identify and build specific signatures for drug-gene associations, chemical and biological toxicity, and even adverse drug effects.

Recent advances in embedding methods have shown promising results for several biomedical and clinical tasks. Text classification on biomedical records poses specific challenges, including dataset imbalance, misspellings, abbreviations, and semantic ambiguity. Current state-of-the-art approaches apply deep learning to the task, mainly convolutional neural networks (CNNs), recurrent neural networks (RNNs), bi-directional long short-term memory (Bi-LSTM) networks, and BERT (Devlin et al., 2019; Wolf et al., 20).

This project will contribute to Exscientia's existing text mining platform by optimising named entity recognition (NER) procedures and applying novel machine learning strategies to generate its own semantic lexicon. The student will have access to expertise across the Discovery and AI technology teams for advice and support during the project.

Student:

Yuan Liang

Period of Study:

Dec 22 - Nov 26

Funder:

BBSRC

Project Status:

Active

Project Category:

Studentship

Project Reference:

2760490

Research Topic:

Unclassified

People	ORCID iD
Massimo Poesio (Primary Supervisor)
Yuan Liang (Student)

Studentship Projects

Project Reference	Relationship	Related To	Start	End	Student Name
BB/X511833/1			01/12/2022	30/11/2026
2760490	Studentship	BB/X511833/1	01/12/2022	30/11/2026	Yuan Liang
