Polygenic Risk Prediction with Machine Learning

Lead Research Organisation: University of St Andrews

Department Name: Computer Science

Abstract

In recent years, Deep Neural Networks (DNNs) have provided a set of tools with which to tackle classification problems that appeared impenetrable to other types of machine learning. While the theoretical basis for DNNs has existed for decades, their recent success is the result of advances in hardware coupled with the refinement of network design, resulting in new families of network architectures. These 'families' consist of networks with similar structures and design features that are well suited to a specific task: for example, Recurrent Neural Networks capture patterns in data with a temporal element, such as human speech, whereas Convolutional Neural Networks perform strongly on computer vision tasks. The design of networks within these families is still under rapid development, with the introduction of new elements, such as max-pooling layers, steadily improving their performance at increasingly specialised tasks. However, there are areas of scientific research in which deep neural networks could offer opportunities for new analysis, but for which no architectural framework yet exists.
In the field of human genetics, the establishment of ever-larger datasets has called into question the use of traditional analytical methods, the limitations of which could previously be attributed to insufficient data. The prediction of disease from genetic information is a question of great importance with many valuable applications, but current methods fall short in their predictive ability, likely because they are unable to model the complex and poorly understood interactions between genetic components. DNNs have the power to model such interactions without explicit representation of them, but their application in this area would require the development of a novel network architecture designed to handle the unique characteristics of genetic data. While some promising steps have been made in this area (see for example 'Basset', Kelley, Snoek & Rinn, 2016), a family of DNN architectures designed to effectively classify disease characteristics from genetic information does not yet exist. Using data from the UK Biobank resource, I plan to develop several DNN architectures and evaluate their ability to predict disease, with the aim of furthering research into this novel area of DNN application.

Student:

Chloe Hequet

Period of Study:

Jan 19 - May 22

Funder:

COVID

Project Status:

Closed

Project Category:

Studentship

Project Reference:

2268475

Research Topic:

Unclassified

Organisations

University of St Andrews (Lead Research Organisation)

People	ORCID iD
Chloe Hequet (Student)

Studentship Projects

Project Reference	Relationship	Related To	Start	End	Student Name
EP/R513337/1			30/09/2018	29/09/2023
2268475	Studentship	EP/R513337/1	01/01/2019	30/05/2022	Chloe Hequet
NE/W502935/1			31/03/2021	30/03/2022
2268475	Studentship	NE/W502935/1	01/01/2019	30/05/2022	Chloe Hequet
