Benchmarking learning and reasoning with biomedical knowledge graphs

Lead Research Organisation: University of Oxford
Department Name: Sustain Approach to Biomedical Sci CDT

Abstract

Context: Recently, a handful of reinforcement learning models have been developed to infer missing biological relations on biomedical knowledge graphs (such as Hetionet) and to explain them. In the context of modelling drug treatment or drug repositioning, the model usually performs walks along the biomedical knowledge graph and is rewarded for paths that begin at a "Compound" node and end at a "Disease" node. Most recently, PoLo (policy-guided walks based on reinforcement learning with logical rules), which injects weak supervision in the form of high-level metapaths, has been shown to achieve state-of-the-art results in drug repurposing and drug-target identification tasks. However, these high-level metapaths are still biologically ambiguous or even flawed, and they cannot produce meaningful explanations for the hypotheses generated.

Aim: Our main goal is therefore to explore alternative model constructions and graph representations that yield more accurate and explainable predictions, or to find a way to prioritize the predictions that have been generated.

Novel methodologies: When predicting drugs or targets for a particular disease, therapy area experts usually look at closely related bioprocesses and work around them to find new therapies. One way to improve existing models would therefore be to incorporate this valuable domain knowledge about the importance of bioprocesses into the reinforcement learning models. This could be done in several ways, including modelling lower-level metapaths in these biological knowledge graphs, developing an additional performance metric that measures the likelihood of bioprocesses appearing in a given path, or directly adding known bioprocess-disease edges to the graph. Another approach would be to examine how the existing reinforcement learning models generate their proposed paths and to optimize this step of the model.
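The walk-and-reward setup and the proposed bioprocess metric can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the reward used by PoLo or any other published model: the four-node graph, the relation names (`binds`, `participates_in`, `involved_in`), the `rule_bonus` value and all function names are hypothetical. A walk earns a terminal reward only if it starts at a Compound and ends at the target Disease, earns a bonus if its metapath matches a supplied rule (loosely in the spirit of metapath-based weak supervision), and a separate toy metric reports the fraction of walks that pass through a BiologicalProcess node, as one possible reading of the bioprocess metric described above.

```python
from typing import Dict, Sequence, Set, Tuple

# Hypothetical toy knowledge graph (not taken from Hetionet): node -> node type.
NODE_TYPE: Dict[str, str] = {
    "aspirin": "Compound",
    "PTGS2": "Gene",
    "inflammatory response": "BiologicalProcess",
    "arthritis": "Disease",
}

# A walk is a start node followed by (relation, next_node) steps.
Step = Tuple[str, str]
Metapath = Tuple[str, ...]


def metapath(start: str, steps: Sequence[Step]) -> Metapath:
    """Return the alternating node-type / relation sequence of a walk."""
    types = [NODE_TYPE[start]]
    for relation, node in steps:
        types.extend([relation, NODE_TYPE[node]])
    return tuple(types)


def walk_reward(start: str, steps: Sequence[Step], target: str,
                rules: Set[Metapath], rule_bonus: float = 0.5) -> float:
    """Illustrative terminal reward for a Compound -> Disease walk.

    1.0 if the walk starts at a Compound and ends at the target Disease,
    plus a hypothetical `rule_bonus` if its metapath matches a rule.
    """
    if NODE_TYPE[start] != "Compound" or not steps:
        return 0.0
    end = steps[-1][1]
    if end != target or NODE_TYPE[end] != "Disease":
        return 0.0
    reward = 1.0
    if metapath(start, steps) in rules:
        reward += rule_bonus
    return reward


def bioprocess_coverage(walks: Sequence[Tuple[str, Sequence[Step]]]) -> float:
    """Toy metric: fraction of walks visiting at least one BiologicalProcess."""
    if not walks:
        return 0.0
    hits = sum(
        any(NODE_TYPE[node] == "BiologicalProcess" for _, node in steps)
        for _, steps in walks
    )
    return hits / len(walks)


if __name__ == "__main__":
    rule = ("Compound", "binds", "Gene", "participates_in",
            "BiologicalProcess", "involved_in", "Disease")
    walk = [("binds", "PTGS2"),
            ("participates_in", "inflammatory response"),
            ("involved_in", "arthritis")]
    print(walk_reward("aspirin", walk, "arthritis", {rule}))  # 1.5
    print(bioprocess_coverage([("aspirin", walk), ("aspirin", walk[:1])]))  # 0.5
```

Keeping the bioprocess measure separate from the reward makes it usable for ranking already generated paths, which matches the stated aim of prioritizing predictions, without retraining the agent.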
Currently, Long Short-Term Memory (LSTM) blocks are used to keep track of the path history, and this information then informs which node the "agent" should travel to next. This process of choosing the next step could be improved by using a graph neural network (GNN) instead. A GNN would give the model access to more information from a node's neighborhood, allowing it to make a better-informed decision about which node to travel to next and thereby produce better paths. Other neural network architectures could also be tested and compared to see which performs best.

Alignment to EPSRC's strategies and research areas: This project falls within the EPSRC's artificial intelligence technologies, biological informatics, and software engineering research areas.

Industrial collaborator(s): AstraZeneca
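The idea of using neighborhood information to choose the agent's next node, as proposed in the abstract, can be sketched as follows. This is a simplified illustration, assuming a parameter-free mean aggregation in place of a learned GNN layer (layers such as GCN or GraphSAGE add learned weight matrices and nonlinearities); the embeddings, the `query` vector standing in for the agent's path-history state, the mixing weight `alpha` and node names such as `GENE_X` are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def mean_aggregate(embeddings: Dict[str, Vector],
                   adjacency: Dict[str, Sequence[str]],
                   alpha: float = 0.5) -> Dict[str, Vector]:
    """One synchronous round of parameter-free mean-neighbor message passing.

    h_v' = alpha * h_v + (1 - alpha) * mean_{u in N(v)} h_u
    Nodes without neighbors keep their own embedding.
    """
    updated: Dict[str, Vector] = {}
    for node, h in embeddings.items():
        neighbors = adjacency.get(node, [])
        if not neighbors:
            updated[node] = list(h)
            continue
        # Read from the original embeddings so the update order does not matter.
        mean = [sum(embeddings[u][i] for u in neighbors) / len(neighbors)
                for i in range(len(h))]
        updated[node] = [alpha * a + (1 - alpha) * b for a, b in zip(h, mean)]
    return updated


def next_step_probs(query: Vector, candidates: Sequence[str],
                    embeddings: Dict[str, Vector]) -> Dict[str, float]:
    """Softmax over dot-product scores between agent state and candidates."""
    scores = [sum(q * x for q, x in zip(query, embeddings[c])) for c in candidates]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {c: e / total for c, e in zip(candidates, exps)}


if __name__ == "__main__":
    emb = {
        "PTGS2": [0.0, 1.0],
        "GENE_X": [0.0, 1.0],
        "inflammatory response": [1.0, 0.0],
        "unrelated process": [0.0, 0.0],
    }
    adj = {"PTGS2": ["inflammatory response"], "GENE_X": ["unrelated process"]}
    query = [1.0, 0.0]  # hypothetical agent state, e.g. an LSTM output
    candidates = ["PTGS2", "GENE_X"]
    before = next_step_probs(query, candidates, emb)
    after = next_step_probs(query, candidates, mean_aggregate(emb, adj))
    print(before)  # {'PTGS2': 0.5, 'GENE_X': 0.5}
    print({c: round(p, 3) for c, p in after.items()})  # {'PTGS2': 0.622, 'GENE_X': 0.378}
```

The two candidate genes have identical embeddings, so scoring them on their own embeddings alone gives no preference. Once each candidate's embedding mixes in its neighbors, `PTGS2`, which neighbors a process aligned with the agent's state, becomes the more probable next step.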

Planned Impact

The main impact of the SABS CDT will be the difference made by the scientists trained within it, both during their DPhils and throughout their future careers.

The impact of the students during their DPhil should be measured by the culture change that the Centre engenders in graduate training, in working at the interface between the mathematical/physical sciences and the biomedical sciences, and in cross-sector industry/academia working practices.

Current SABS projects are already changing the mechanisms of industry-academic collaboration, as described, for example, by one of our Industrial Partners:

"UCB and Roche are currently supervising a joint DPhil project and have put in two more joint proposals, which would have not been possible without the connections and the operational freedom offered by SABS-IDC and its open innovation culture, a one-of-the-kind in UK's CDTs."

New collaborations are also being generated: over 25% of current research projects are entirely new partnerships brokered by the Centre. The renewal of SABS will allow it to continue to strengthen and broaden this effect, building new bridges, starting new collaborations, and changing the culture of academic-industrial partnerships. It will also continue to ensure that all of its research is made publicly available through its Open Innovation structure, and will help to create other centres with similar aims.

For all of our partners, however, the students themselves are considered the ultimate output. As one of our partners describes it:

"I believe the current SABS-IDC has met our original goals of developing young research scientists in a multidisciplinary environment with direct industrial experience and application. As a result, the graduating students have training and research experience that is directly applicable to the needs of modern lifescience R&D, in areas such as pharmaceuticals and biotechnology."

However, it is not only within the industrial realm that students have impact: in the later years of their DPhils, over 40% of SABS students, facilitated by the Centre, have undertaken various forms of public engagement. This includes visiting schools, working alongside Zooniverse to develop citizen science projects, and producing educational resources in the area of crystal images. In the new Centre, all students will be required to undertake outreach activities to increase engagement with the public.

The impact of the students after they have finished should be measured by how they carry this novel approach to research forward, whether within the sector or outside it. As our industrial letters of support make clear, although no SABS students have yet completed their DPhils, there is a clear expectation that they will play a significant role in shaping the UK economy in the future. For example, one of our partners comments about our students:

"UCB has been in constant search for such talents, who would thrive in pharmaceutical research, but they are rare to find in conventional postgraduate programmes. Personally I am interested in recruiting SABS-IDC students to my group once they are ready for the job market."

To demonstrate the type of impact that SABS alumni will have, we consider the impact being made by the alumni of the i-DTC programmes from which this proposal has grown. Examples include two start-up companies, both of which have already attracted investment in the millions. Several alumni now hold senior positions in industry and in research facilities and institutes. They have also been named on 30 granted or pending patents, 15 of which arose directly from their DPhil work.

The examples of past success given above indicate the types of impact we expect the graduates from SABS to achieve, and offer clear evidence that SABS students will become future research leaders, driving innovation and changing research culture.

Publications


Studentship Projects

Project Reference | Relationship | Related To | Start | End | Student Name
EP/S024093/1 | | | 01/10/2019 | 31/03/2028 |
2597539 | Studentship | EP/S024093/1 | 01/10/2021 | 30/09/2025 | Emily Jin