Polygenic Risk Prediction with Machine Learning

Lead Research Organisation: University of St Andrews
Department Name: Computer Science

Abstract

In recent years, Deep Neural Networks (DNNs) have provided a set of tools with which to tackle classification problems that appeared impenetrable to other types of machine learning. While the theoretical basis for DNNs has existed for decades, their recent success is the result of advances in hardware coupled with the refinement of network design, resulting in new families of network architectures. These 'families' consist of networks with similar structures and design features that are ideally suited for a specific task: for example, Recurrent Neural Networks provide a unique pipeline for capturing patterns in data with a temporal element, such as human speech, whereas Convolutional Neural Networks are highly performant for computer vision tasks. The design of networks within these families is still under rapid development, with the introduction of new elements, such as max-pooling layers, constantly improving their performance at more and more specialised tasks. However, there are areas of scientific research in which deep neural networks could offer opportunities for new analysis, but for which no architectural framework yet exists.
In the field of human genetics, the establishment of ever larger datasets has called into question the use of traditional analytical methods, the limitations of which could previously be attributed to insufficient data. The prediction of disease from genetic information is a question of great importance with many valuable applications, but current methods fall short in their predictive ability, likely because they are unable to model the complex and poorly understood interactions between genetic components. DNNs have the power to model such interactions without explicit representation of them, but their application in this area would require the development of a novel network architecture designed to handle the unique characteristics of genetic data. While some promising steps have been made in this area (see for example 'Basset', Kelley, Snoek & Rinn, 2016), a family of DNN architectures designed to effectively classify disease characteristics from genetic information does not yet exist. Using data from the UK Biobank resource, I plan to develop several DNN architectures and evaluate their ability to predict disease, with the aim of furthering research into this novel area of DNN application.
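
The point about interactions can be illustrated with a toy sketch. It is not the project's method: it assumes two hypothetical binary variants and a phenotype that depends only on their interaction (an XOR pattern), and the network weights are set by hand rather than learned. Under those assumptions, no additive score on a coarse grid classifies every genotype correctly, whereas a network with a single small hidden layer can represent the pattern without an explicit interaction term among its inputs.

```python
from itertools import product

# Toy data (an assumption for illustration): two binary variants and a purely
# interactive phenotype, affected only when exactly one risk allele is present.
genotypes = [(0, 0), (0, 1), (1, 0), (1, 1)]
phenotype = [a ^ b for a, b in genotypes]


def additive_accuracy(w1, w2, threshold):
    """Accuracy of an additive score w1*a + w2*b compared against a threshold."""
    preds = [int(w1 * a + w2 * b > threshold) for a, b in genotypes]
    return sum(p == y for p, y in zip(preds, phenotype)) / len(genotypes)


# Exhaustive search over a coarse grid of additive models (-2.0 to 2.0, step 0.5).
grid = [x / 2 for x in range(-4, 5)]
best_additive = max(
    additive_accuracy(w1, w2, t) for w1, w2, t in product(grid, repeat=3)
)


def relu(x):
    return max(0.0, x)


def tiny_network(a, b):
    """One hidden layer with two ReLU units; weights are hand-set, not trained."""
    h1 = relu(a + b)
    h2 = relu(a + b - 1)
    return h1 - 2 * h2


network_preds = [int(tiny_network(a, b) > 0.5) for a, b in genotypes]
network_accuracy = sum(
    p == y for p, y in zip(network_preds, phenotype)
) / len(genotypes)

print("best additive accuracy:", best_additive)
print("tiny network accuracy:", network_accuracy)
```

Real genotype data are far larger and noisier than this, and a trained network is not guaranteed to find such a solution. The sketch only shows why a purely additive score can be structurally unable to capture an interaction.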
