CoED: Deep reinforcement learning for improving research productivity in the life science sector.

Lead Research Organisation: University of York
Department Name: Biology

Abstract

There have recently been significant leaps in deep reinforcement learning algorithms, with notable successes in games such as Atari arcade games and Go; however, there is still a need to adapt these techniques to be more widely applicable in other domains, such as the life science sector. Identifying regulatory relationships between genes is one of the primary research activities carried out by molecular biologists and geneticists, since learning the structure of gene regulatory networks is critical for many applications, for example understanding the origins of many diseases and how crops respond to their environments. Biologists sequentially conduct experiments that provide information about the gene network structure, but they must operate under strict cost and time limits. This project aims to formulate this experiment design procedure in a reinforcement-learning framework, to ascertain how biologists should prioritise experiments to maximise information about the gene networks, under constraints. The primary deliverable will be a Computer-aided Experimental Design (CoED) software tool to aid researchers in utilising their resources most effectively. This reinforcement-learning framework could also be used to identify the bottlenecks for biomedical research, such as the pricing model or the time-intensity of certain experiments, thereby identifying the most impactful areas for further development in experimental methodology. We will deliver impact by providing consultation services to laboratory supply and service providers, and through our collaboration with our industrial partner Google Brain Genomics. This project primarily aligns with the new approaches to data science and high productivity services through specialised artificial intelligence priority areas of this call.
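As a rough illustration of the sequential experiment-design problem described above, the sketch below treats each choice of experiment as an action, the remaining budget and time as the state, and the information gained as the reward. It is a toy under stated assumptions, not the project's method: the experiment names, costs, durations and information values are all hypothetical, and the plan is found by exhaustive dynamic programming rather than deep reinforcement learning. Because the rewards here are simply additive, the problem reduces to a small two-constraint knapsack; a realistic version would make the information gain depend on which experiments have already been run.

```python
from functools import lru_cache
from typing import NamedTuple


class Experiment(NamedTuple):
    name: str
    cost: int     # hypothetical cost units
    days: int     # hypothetical duration in days
    info: float   # hypothetical expected information gain


# Illustrative catalogue only; none of these numbers come from the project.
EXPERIMENTS = (
    Experiment("knockout", cost=4, days=3, info=5.0),
    Experiment("rna_seq", cost=6, days=2, info=6.5),
    Experiment("chip_seq", cost=5, days=4, info=6.0),
    Experiment("qpcr", cost=1, days=1, info=1.5),
)


@lru_cache(maxsize=None)
def best_plan(done: frozenset, budget: int, days: int):
    """Return (total information, ordered experiment names) for the best
    continuation from the state (experiments done, budget left, days left)."""
    best = (0.0, ())
    for i, exp in enumerate(EXPERIMENTS):
        # Skip experiments already run or that break a constraint.
        if i in done or exp.cost > budget or exp.days > days:
            continue
        value, plan = best_plan(done | {i}, budget - exp.cost, days - exp.days)
        candidate = (exp.info + value, (exp.name,) + plan)
        if candidate[0] > best[0]:
            best = candidate
    return best


total_info, schedule = best_plan(frozenset(), 10, 6)
print(total_info, schedule)
```

Changing the budget or deadline passed to `best_plan` shows how the preferred schedule shifts as constraints tighten, which is the kind of bottleneck analysis the abstract alludes to.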

Planned Impact

There are two main target beneficiaries of this research proposal: (i) biology researchers in academia and industry and (ii) laboratory supply and service providers.

In academia and industry, there are a great many biologists who are inferring the structures of gene regulatory networks, and the CoED software will help them design these types of experiments more effectively. Current approaches to network inference assume that biologists either conduct all their experiments at once or that they conduct their experiments sequentially, while in fact biologists usually run their experiments with partial concurrency. My team will develop software that will better reflect the realities of the biology laboratory, and so will be of greater utility to biology researchers. Biologists will use the software to design experiments that will be most likely to learn the structure of a gene network, without going over deadlines or budget constraints. Gene networks are extremely important for reaching the EPSRC Prosperity outcomes of creating a 'Healthy Nation' and 'Resilient Nation'. For instance, gene networks help us understand the onset and disease progression of cancer (Healthy Nation) and how crops will respond to climate change (Resilient Nation).

Secondly, this project will help laboratory supply and service providers. According to 'The Scientist', there are approximately 1000 UK-based companies that provide key support services to academic and industrial biologists. However, because these companies are typically small-to-medium-sized, they often lack the resources for an internal research team to evaluate what their consumers (academic and industrial biologists) need to boost their research productivity. My team will analyse the schedules generated by CoED to identify which experimental protocols are the primary bottlenecks in experimental biology. From this analysis, my team will develop a business strategy to help these companies deliver the products and services that would have the greatest impact on research productivity.

Finally, there will be a significant outreach component of the project that will specifically target undergraduate women who major in computer science or related fields. In order to encourage this group of women to apply for graduate school in STEM subjects, I will apply to present my research and participate in a career roundtable at the Grace Hopper Celebration of Women in Computing. This EPSRC project will give me experience in working on a multidisciplinary project and developing industrial collaborations, knowledge that I could impart to other women as part of this outreach activity.

Publications

Paige B (2021) Reconstructing Genotypes in Private Genomic Databases from Genetic Risk Scores. in Journal of computational biology : a journal of computational molecular cell biology

Tomašev N (2020) AI for social good: unlocking the opportunity for positive impact. in Nature communications

Related Projects

Project Reference | Relationship | Related To   | Start      | End        | Award Value
EP/S001360/1      |              |              | 29/06/2018 | 29/09/2019 | £329,899
EP/S001360/2      | Transfer     | EP/S001360/1 | 30/09/2019 | 29/06/2021 | £229,690