Inference and Applications of Genetic Relatedness in Human Populations

Lead Research Organisation: University of Oxford
Department Name: Statistics

Abstract

Any group of human individuals is tightly connected through an unobservable network of genealogical relationships that extends deep into the past. Occasionally, pairs of individuals share a common ancestor who lived only tens or hundreds of generations ago and co-inherit large portions of their genomes that are "identical-by-descent" (IBD). Accurate detection of IBD regions is of great interest in a number of genomic analyses: IBD segments can identify genetic relatives, indicate the presence of disease-causing mutations, and reveal fine-scale demographic properties of the analyzed groups.

Detection of IBD segments in large data sets, however, presents a number of computational challenges. Algorithms that scale to data sets comprising hundreds of thousands of samples (Gusev et al., Genome Research, 2009; Naseri et al., bioRxiv, 2017) resort to model-free string-matching approaches that favor computational speed over IBD detection accuracy. Accurate model-based algorithms (Browning and Browning, Genetics, 2013), on the other hand, cannot scale to large data sets. We recently developed a new algorithm for IBD detection that is both accurate and efficient. It combines recent developments in efficient pattern-matching algorithms (e.g. Durbin, Bioinformatics, 2014) with a new probabilistic approach that accurately verifies the presence of IBD sharing in a region (Palamara et al., Nature Genetics, 2018). IBD sharing is known to provide an effective alternative route to association testing for rare disease-causing mutations (Gusev et al., AJHG, 2011), which cannot be observed in currently available data sets of common SNPs (e.g. UK Biobank).
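As a rough illustration of what "model-free string matching" means here, the sketch below scans two phased haplotypes and reports long runs of identical alleles as candidate shared segments. It is a deliberately naive pairwise scan, not the algorithm developed in this project nor any of the cited methods; the function name `shared_segments`, the 0/1 allele encoding, and the length threshold in markers (rather than genetic distance) are assumptions made for the example. Runs found this way are only identical-by-state, which is why a model-based verification step matters for accuracy.

```python
from typing import List, Sequence, Tuple


def shared_segments(
    hap_a: Sequence[int], hap_b: Sequence[int], min_markers: int
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) marker intervals where the two
    haplotypes carry identical alleles for at least `min_markers` sites.

    Hypothetical helper for illustration only: it finds identical-by-state
    runs and applies no probabilistic model of IBD.
    """
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must cover the same markers")

    segments: List[Tuple[int, int]] = []
    run_start = None  # index where the current matching run began

    for i, (a, b) in enumerate(zip(hap_a, hap_b)):
        if a == b:
            if run_start is None:
                run_start = i
        else:
            # A mismatch closes the current run; keep it if long enough.
            if run_start is not None and i - run_start >= min_markers:
                segments.append((run_start, i))
            run_start = None

    # Close a run that extends to the last marker.
    if run_start is not None and len(hap_a) - run_start >= min_markers:
        segments.append((run_start, len(hap_a)))

    return segments


# Toy haplotypes that differ only at marker index 4.
hap_1 = [0, 1, 1, 0, 1, 0, 0, 1]
hap_2 = [0, 1, 1, 0, 0, 0, 0, 1]
candidates = shared_segments(hap_1, hap_2, min_markers=4)
```

Comparing every pair of samples in this way grows quadratically with sample size, which is one reason scalable methods rely on more efficient pattern-matching data structures.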

We will thus use our novel algorithm to detect new associations between disease phenotypes and rare or low-frequency causal variation in the human genome. We will also explore and develop new deep-learning-based techniques, which have rarely been employed in population genetics. The methods we propose to develop will help elucidate the role of low-frequency genomic variation in the genetic architecture of common disease and provide new tools for detecting rare mutations with large phenotypic effects.

The project falls within the MRC "Living a long & healthy life - Molecular datasets & disease" research area. This project aims to use genetics to understand predisposition to disease, by creating new tools to mine evidence of disease-causing variation in large genomic data sets, including data sets resulting from MRC-funded projects (e.g. the UK Biobank data set). The development of new computational tools for analysis of large data sets in this domain is an explicit goal of the MRC: https://www.mrc.ac.uk/research/strategy/aim-1/theme-2/objective-5/

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
MR/S502509/1                                   01/10/2018  30/06/2022
2119007            Studentship   MR/S502509/1  01/10/2018  30/04/2022  Juba Nait Saada