Inference of genealogical relationships among individuals from genetic markers

Lead Research Organisation: Zoological Society of London
Department Name: Institute of Zoology

Abstract

Individuals in a population may share various genealogical relationships (such as sib and parent-offspring relationships). Knowledge of these relationships is essential in many areas of research in behavioural, ecological and evolutionary genetics and in conservation biology. Although pedigree or ecological data can be used to determine relationships, such data are rarely available for natural populations. In such cases, we can genotype individuals at a number of marker loci and infer their genealogical relationships from the pattern of similarity (allele sharing) among the multi-locus genotypes of the individuals. Powerful likelihood methods have been developed to partition a sample of individuals into distinctive genetic groups (defined by structured relationships) of variable sizes by maximising the likelihood of marker data. However, these methods are limited in application to either small problems in which the number of sampled individuals is small, or simple problems in which only sibships are inferred in a one-generation sample of individuals. Building on my previous work, this project aims to develop statistical methods for inferring parentage and sibships jointly in a large two-generation sample of individuals from marker data, and for assessing the uncertainties of the inferences. Typing errors and mutations in marker data will be accounted for in inferring relationships, and will be identified simultaneously by the methods. These extensions will make the group likelihood methods more flexible, robust and powerful in inferring relationships among individuals from marker data in practice. We will first develop the methodology, and then use extensive simulations to investigate the statistical properties of the method and its robustness when some assumptions are violated. Some empirical data sets with known relationships will be analysed by the proposed methods to further check their performance in realistic situations and to demonstrate their usefulness. The final goal is to develop a software package implementing the proposed group likelihood methods and to make it freely available on the World Wide Web to the scientific community.
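
As a minimal illustration of how allele sharing can be turned into a likelihood, the sketch below compares two hypotheses for a pair of individuals: candidate parent and offspring versus unrelated. It is a simple pairwise calculation, not the group likelihood method developed in this project. It assumes codominant diploid markers, known population allele frequencies, random mating, and no typing errors or mutations. All function and variable names are hypothetical and chosen only for this example.

```python
import math


def transmission_prob(parent, child, freq):
    """P(child genotype | one parent's genotype) at a single diploid locus.

    The parent passes on one of its two alleles with probability 1/2 each.
    The child's other allele is assumed to be drawn at random from the
    population, using the allele frequencies in `freq`.
    """
    c, d = child
    total = 0.0
    for t in parent:  # t is the allele transmitted by this parent
        if c == d:
            if t == c:
                total += 0.5 * freq[c]
        elif t == c:
            total += 0.5 * freq[d]
        elif t == d:
            total += 0.5 * freq[c]
    return total


def random_genotype_prob(child, freq):
    """P(child genotype) for an unrelated individual (Hardy-Weinberg)."""
    c, d = child
    return freq[c] ** 2 if c == d else 2.0 * freq[c] * freq[d]


def parent_offspring_log_lr(parent_loci, child_loci, freqs):
    """Log-likelihood ratio (parent-offspring vs unrelated), summed over loci.

    Loci are assumed independent. With no error model, a single locus at
    which the pair shares no allele makes the ratio zero (log = -inf).
    """
    log_lr = 0.0
    for parent, child, freq in zip(parent_loci, child_loci, freqs):
        related = transmission_prob(parent, child, freq)
        if related == 0.0:
            return float("-inf")
        log_lr += math.log(related / random_genotype_prob(child, freq))
    return log_lr


# Made-up example: two loci, one candidate parent, one juvenile.
freqs = [{"A": 0.2, "B": 0.8}, {"C": 0.5, "D": 0.5}]
adult = [("A", "A"), ("C", "D")]
juvenile = [("A", "B"), ("C", "C")]
print(parent_offspring_log_lr(adult, juvenile, freqs))
```

A positive value favours the parent-offspring hypothesis and a negative value favours unrelatedness. Because one incompatible locus drives the result to minus infinity in this sketch, it also shows why typing errors and mutations need to be modelled explicitly when real marker data are analysed.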

Technical Summary

Using genetic markers to infer the genealogical relationships (e.g. sibship) among individuals in a population is becoming an important tool in ecology, evolutionary biology and conservation biology. Powerful group likelihood methods have been developed to partition a sample of individuals into distinctive genetic groups (defined by one or more types of relationships organised in a specific structure) of variable sizes by maximising the likelihood of marker data. However, these methods are limited in application to either small problems in which the number of sampled individuals is small, or simple problems in which only sibships are inferred in a one-generation sample of individuals. Building on my previous work, this project aims to develop statistical methods for inferring parentage and sibships jointly in a large two-generation sample of individuals from marker data, and for assessing the uncertainties of the inferences. Typing errors and mutations in marker data will be accounted for in inferring relationships, and will be identified simultaneously by the methods. These extensions will make the group likelihood methods more flexible, robust and powerful in inferring relationships among individuals from marker data in practice. We will first develop the methodology, and then use extensive simulations to investigate the statistical properties of the method and its robustness when some assumptions are violated. Some empirical data sets with known relationships will be analysed by the proposed methods to further check their performance in realistic situations and to demonstrate their usefulness. The final goal is to develop a software package implementing the proposed group likelihood methods and to make it freely available on the World Wide Web to the scientific community.
 
Description Knowledge of the familial relationships among individuals in a population is essential in many areas of research in behavioural, ecological and evolutionary genetics and in conservation biology. It is valuable, for example, in studies of social behaviour and organisation, mating systems, dispersal, isolation by distance and spatial genetic structure in natural populations. Unfortunately, familial relationships among individuals in most natural (wild) populations are unknown and have to be inferred from genetic marker data. Previous marker-based methods make strong assumptions (e.g. that males and females cannot both be polygamous) and thus have limited application in practice. The present project has developed a rigorous statistical population genetic method with fewer limiting assumptions. The power and statistical behaviour of the method have been checked by numerous simulations covering a range of parameter combinations, and by analysis of some empirical datasets. The method has also been implemented in computer programs for all three major platforms (Windows, Linux, Mac), which are freely downloadable from our website http://www.zsl.org/science/software/colony. Two postdocs working on this project received training in population genetics, computer programming, data analysis and other areas.
Exploitation Route The findings were summarised in a number of scientific papers published in mainstream genetics and ecology journals. These papers have now been cited more than 1000 times in total. The software was published on our institute's website for free download.
Sectors Agriculture, Food and Drink; Environment

URL http://www.zsl.org/science/software/colony
 
Description Papers summarising our findings have been read and cited by the scientific research community. The software implementing the method developed in the project has been used by numerous researchers to analyse their data.
First Year Of Impact 2009
Sector Agriculture, Food and Drink; Environment
Impact Types Economic; Policy & public services

 
Description Estimating selfing rate from sibship analysis
Organisation University of British Columbia
Country Canada 
Sector Academic/University 
PI Contribution Conceived the original idea, wrote simulation programs and conducted simulations
Collaborator Contribution Provided several empirical datasets
Impact Wang, J., El-Kassaby, Y. A., & Ritland, K. (2012). Estimating selfing rates from reconstructed pedigrees using multilocus genotype data. Molecular Ecology, 21(1), 100-116.
Start Year 2010
 
Description Parentage and sibship analysis in polyploid species
Organisation Michigan State University
Country United States 
Sector Academic/University 
PI Contribution Extended our analysis method to the case of polyploids, conducted simulations to check the accuracy of the extended method, and analysed an empirical dataset provided by the collaborators
Collaborator Contribution Asked us to extend our method and genotyped a sample of sturgeons with known relationships
Impact Wang, J., & Scribner, K. T. (2014). Parentage and sibship inference from markers in polyploids. Molecular Ecology Resources, 14, 541-553.
Start Year 2012
 
Title COLONY 
Description The software implements the statistical method developed in the project to infer parentage and sibship among individuals using their multilocus genotype data. 
Type Of Technology Software 
Year Produced 2009 
Impact The software has been widely applied by ecologists and other biologists to infer parentage and sibship from marker data
URL http://www.zsl.org/science/software/colony