Limits of Inference from Biological Sequence Analysis

Lead Research Organisation: University of Southampton
Department Name: School of Electronics and Computer Science

Abstract

Two major developments are of interest. Biology now generates vast amounts of data at various levels of measurement, and a central concern is how useful information is hidden in such data: information needed both to understand biological function and to translate that understanding into the treatment of complex diseases. Machine learning, a rich combination of mathematical and computational sciences, provides tools with which we can extract useful information from large and complex datasets. Much information about biology, inherited across generations, is held in macromolecular sequences: short motifs specifying where regulatory molecules may bind and interact; highly variable receptor sequences in immune cells that can distinguish between signals of the self and those of invading pathogens; and loci in population-level sequences that can give us clues about variants responsible for inherited diseases. In this project we will study inference algorithms based on deep learning for extracting useful information from biological sequence data. A particular problem we will study is representation learning: the art of mapping sequence data onto more convenient mathematical spaces, continuous and distributed, in which pattern recognition methods can manipulate them more easily. We will focus on the interpretability of such models to extract specific information about protein interactions, immune response and alternative splicing.

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/R513325/1                                   01/10/2018  30/09/2023
2480946            Studentship   EP/R513325/1  01/10/2020  31/03/2024  Ioan Ieremie
EP/T517859/1                                   01/10/2020  30/09/2025
2480946            Studentship   EP/T517859/1  01/10/2020  31/03/2024  Ioan Ieremie