Next generation Text Mining in Drug Discovery

Lead Research Organisation: Queen Mary University of London
Department Name: Digital Environment Research Institute


Extracting interesting and non-trivial patterns from text documents is the next wave of knowledge discovery in the biochemical sciences. Free text in the biomedical literature contains a wealth of information about small molecules and their targets that is not currently stored in biochemical knowledgebases. This information can be exploited to identify and build specific signatures for drug-gene associations, chemical and biological toxicity, and even adverse drug effects.

Recent advances in embedding methods have shown promising results for several biomedical and clinical tasks. Text classification performed on biomedical records poses specific challenges, including dataset imbalance, misspellings, abbreviations and semantic ambiguity. Current state-of-the-art approaches apply deep learning to the task, mainly convolutional neural networks (CNNs), recurrent neural networks (RNNs), bi-directional long short-term memory (Bi-LSTM) networks, and BERT (Devlin et al., 2019; Wolf et al.,20).

This project will contribute to Exscientia's existing text mining platform by optimising named entity recognition (NER) procedures and applying novel machine learning strategies to generate its own semantic lexicon. The project will have access to expertise across the Discovery and AI technology teams for advice and support throughout.



Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
BB/X511833/1                                   01/12/2022  30/11/2026
2760490            Studentship   BB/X511833/1  01/12/2022  30/11/2026  Yuan Liang