Visual Interactive Pedigree ExploreR (VIPER)

Lead Research Organisation: University of Edinburgh
Department Name: The Roslin Institute

Abstract

This project aims to produce a tool to remove errors in animal pedigree information caused by administrative and data handling faults. Large amounts of animal pedigree and characteristic data are logged and stored during the course of animal breeding studies. However, to be of any use for further programmes or analysis the data needs to be as free of error as possible. Errors in data storage, such as recording the wrong father for an animal or an unnoticed change in associated gene data, are easy to introduce when hundreds or thousands of individual animals are being dealt with. Unfortunately, while it is relatively easy to process this data to detect that errors exist, finding and correcting the cause of the errors is more difficult. For example, it is not straightforward to know whether an error lies in the pedigree (i.e. the child-parent relationships) or in the characteristics associated with the animals. An animal may be recorded as having a certain characteristic that, on examination, could not possibly have been inherited from its two recorded parents. So is the recording of one or both of the parents wrong, is the recording of the characteristic in the child animal incorrect, or is the characteristic in one of the parent animals wrong? To answer this question, further examination of the problem animals' relations in the pedigree is necessary. In a text or spreadsheet-based document this quickly becomes tedious and confusing, even when the operations to detect and show errors in the data are available. However, if we were to switch to a more graphical, user-friendly style of displaying the data then it would be easier to follow relationships in the pedigree. If we added on top the capability to show interactively where errors occur and where they might originate, we would have a way of examining the pedigree data and asking questions that would clear up or narrow down errors.
Such a way of displaying and interacting with data is called Information Visualisation (IV). Unlike human family trees, most recorded animal pedigrees have a large degree of in-breeding, as scientists and breeders try to encourage certain characteristics through selective breeding. This makes the drawing of animal pedigrees more complex, as two individuals may end up being related through two or more routes. By extending current IV techniques for this type of data, this project will make the interface less complex by interactively showing only selected individuals and their relationships. On top of this, the scientists will also wish to view some display of the characteristics associated with the animals, and again the complexity can be reduced by viewing only a handful of characteristics at a time. Even so, one male animal can easily sire dozens of children, who are in turn related to dozens of female parents and may in turn have children of their own - and a scientist may be interested in exploring several characteristics at a time for these animals. Methods for seamlessly moving from showing one part of a pedigree to another will be developed to help scientists explore massive pedigrees. Once an initial interface is built, a means for exploring errors by asking 'what-if' questions will be developed. Possibilities include the ability to 'mask out' problem individuals or problem characteristics to see what effect that has on the pedigree and errors, or to actually edit information and recalculate the effect on the pedigree again. The ability to redo and undo past actions will be needed, and in the end the scientist will produce a set of actions that lead to a clean data set, or as close as can be achieved. Throughout the course of the project the work will be tested with scientists who use pedigree data.
In the end we will produce a tool that will benefit scientists who work with pedigrees by allowing them to readily clean their data and so share it usefully with other scientists.
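
As an illustration of the 'what-if' workflow described above, the following minimal Python sketch records masking actions with undo and redo, and reports the surviving actions as a replayable list. The class and method names (WhatIfSession, mask, undo, redo, script) are hypothetical and are not taken from the project's software; recalculating errors after each action is left out.

```python
class WhatIfSession:
    """Hypothetical sketch: track masked individuals with undo/redo history."""

    def __init__(self):
        self.masked = set()   # individuals currently hidden from error checking
        self._undo = []       # actions that can be undone, oldest first
        self._redo = []       # actions that were undone and can be reapplied

    def mask(self, animal_id):
        """Mask an individual; a new action invalidates the redo history."""
        if animal_id in self.masked:
            return
        self.masked.add(animal_id)
        self._undo.append(animal_id)
        self._redo.clear()

    def undo(self):
        """Reverse the most recent mask action, if any."""
        if self._undo:
            animal_id = self._undo.pop()
            self.masked.discard(animal_id)
            self._redo.append(animal_id)

    def redo(self):
        """Reapply the most recently undone mask action, if any."""
        if self._redo:
            animal_id = self._redo.pop()
            self.masked.add(animal_id)
            self._undo.append(animal_id)

    def script(self):
        """Return the surviving actions, in order, as a replayable list."""
        return [("mask", animal_id) for animal_id in self._undo]


session = WhatIfSession()
session.mask("ram_17")
session.mask("ewe_42")
session.undo()            # ewe_42 is visible again
print(session.script())
```

Keeping the history as an ordered list of actions means the final state can be described as a set of steps applied to the original data, which is the kind of outcome the abstract describes.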

Technical Summary

Pedigree genotype data produced from animal breeding experiments are the basis for the genetic mapping of markers and phenotypes that underpin selective breeding programmes. To be useful for such work the pedigree genotypes need to be free from error, but the size of the datasets means that some pedigree errors, mis-typings and sample mis-identifications are inevitable. Current tools for identifying such errors may show where these problems manifest, but tracing the cause of the errors is more complex and, in current text- and table-based tools, becomes intractable, especially when multiple errors may be at work. To this end, we propose a new Information Visualisation (IV) tool to aid geneticists. IV is the use of graphical and interactive techniques to display and query abstract data sets such as pedigree genotypes. A first phase will develop a tool that shows the pedigree and associated marker data in an intuitive graphical representation and integrate it with existing back-end data cleaning algorithms to show where errors occur in a pedigree. A second phase will incorporate interactive techniques for dynamic feedback that allow geneticists to hypothesise as to the source of data errors within a pedigree and view the effect on the state of the erroneous data. For example, this may include reassigning parent animals, changing specific marker values or masking entire sets of markers. Ultimately a geneticist will be able to arrive at a set of actions that produce an error-free data set. The outcome of this research project will therefore be a tool that allows a geneticist to interactively clean up pedigree genotype datasets. Beneficiaries will include the geneticists who produce the initial data and other specialists who will now be able to use the cleaned data set for their own analyses and research.
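
To make concrete the kind of error such back-end checks look for, the following Python sketch tests whether a child's genotype at a single diploid marker could have been inherited from its two recorded parents. It is an illustrative assumption, not the project's actual algorithm: the function name, the representation of a genotype as a pair of allele labels, and the treatment of untyped animals as uninformative are all choices made for this example.

```python
from itertools import product


def is_mendelian_consistent(child, sire, dam):
    """Check one diploid marker in a child/sire/dam trio.

    Each genotype is a pair of allele labels, e.g. ("A", "B"), or None if
    the animal was not typed. An untyped animal cannot contradict anything,
    so the trio is then treated as consistent.
    """
    if child is None or sire is None or dam is None:
        return True
    # Every genotype obtainable by taking one allele from each parent.
    possible = {tuple(sorted(pair)) for pair in product(sire, dam)}
    return tuple(sorted(child)) in possible


# A child typed B/B cannot descend from a sire typed A/A.
print(is_mendelian_consistent(("B", "B"), ("A", "A"), ("A", "B")))  # False
print(is_mendelian_consistent(("A", "B"), ("A", "A"), ("B", "B")))  # True
```

A failed check of this kind shows that the trio contains an error, but not whether the fault lies in the child's genotype, a parent's genotype or the recorded parentage.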

Planned Impact

The primary beneficiaries of this project will be the target user group for the pedigree visualisation tool (VIPER), i.e. all animal breeders and geneticists currently engaged in generating pedigree genotype data for genetic analyses for whatever purpose (e.g. the calculation of linkage associations and generation of multi-locus genetic linkage maps). As well as allowing visual exploration of pedigree genotype datasets, VIPER will provide the means for cleansing genetically inconsistent genotype datapoints from the data. This data cleaning is essential prior to downstream processing in genetic analyses, and will thus enable the sharing of valid datasets between researchers. For example, species resource databases such as ResSpecies require that any uploaded datasets are genetically consistent prior to submission. Animal breeding studies underpin a diverse range of research areas including research into animal and human health (e.g. inherited disorders, disease susceptibility, infection immunity and host-pathogen interactions) as well as the inheritance of other economically important traits in agriculture, such as yield and quality. As such, VIPER will be of potential benefit for any animal breeding programme in agricultural, scientific and medical research, and potential users will span the academic and commercial sectors. Several research groups working at and collaborating with The Roslin Institute will benefit immediately and directly, and the tool will be freely available to any researcher worldwide who wishes to use it for similar data cleaning tasks. To support this, the tool and documentation will be made freely available on a Project Website and through the ResSpecies website, and the research will be disseminated through presentations at appropriate international conferences and publication in scientific journals of Genetics, Genomics and Bioinformatics research.
The pedigree visualisation tool, coupled with the ResSpecies genetic inference algorithm, will be applicable to genetic studies of any species exhibiting diploid inheritance, and implementation of a modular design will allow the tool to be modified to operate with other genetic systems, increasing the potential user base to all genetic research communities. The successfully implemented pedigree visualisation model itself, designed as a reusable software module, should be useful in other contexts where display of pedigree information is required. For example, this could be used for visualisation of extended human pedigrees or crop breeding pedigrees. The research involved in implementing a successful interactive pedigree visualisation will contribute to the wider field of Information Visualisation (IV), outwith the immediate biological domain. It will therefore be appropriate to present these results to the IV research community through conference presentations and publication in IV journals.

Publications

 
Description An improved visualisation method for pedigrees, coupled with an interactive visualisation of inheritance errors, leads to improved efficiency in detecting and fixing the causes of the errors and hence to cleaner data sets for analysis.
Exploitation Route Cleaner data sets reduce statistical "noise" in genome-wide association and QTL studies thus improving the effectiveness of breeding schemes and biological investigations of genetically-controlled traits.
Sectors Agriculture, Food and Drink

URL http://www.viper-project.org.uk
 
Description Used in cleaning data sets for a number of QTL-based studies
First Year Of Impact 2011
Sector Agriculture, Food and Drink
 
Title GenotypeChecker
Description GenotypeChecker is a desktop tool for identifying likely data errors in pedigree/genotype datasets and assisting data cleansing. Datapoint errors in pedigree genotype datasets are difficult to identify and adversely affect downstream genetic analyses. Errors that are inconsistent with the rules of Mendelian inheritance typically invalidate linkage analysis algorithms, and cause such analyses to fail. Genotype errors may arise from a variety of systematic or sporadic errors in either the genotyping assay, or in recording the pedigree or genotype information. By applying an inheritance-checking algorithm for markers across the pedigree and visualising the inheritance data in an exploratory user interface, GenotypeChecker allows the sources of data inconsistency to be resolved.
Type Of Technology Webtool/Application 
Year Produced 2009 
Impact Allowed us to clean data from complex pedigree experiments and thus permit the analysis of data that would otherwise have remained unusable. Provided the basis of the subsequent VIPER project which further developed the code and the ideas within it. 
URL http://bioinformatics.roslin.ed.ac.uk/genotypechecker/
 
Title VIPER 
Description VIPER combines an improved ResSpecies algorithm for genotype inheritance checking and inference with a novel space-efficient visualisation of pedigree structure in a desktop tool for exploring and then cleaning data errors in pedigree/genotype datasets. Datapoint errors in pedigree genotype datasets are difficult to identify and adversely affect downstream genetic analyses. Errors that are inconsistent with the rules of Mendelian inheritance typically invalidate linkage analysis algorithms, and cause such analyses to fail. Genotype errors may arise from a variety of systematic or sporadic errors in either the genotyping assay, or in recording the pedigree or genotype information. By applying an inheritance-checking algorithm for markers across the pedigree and visualising the inheritance data in an exploratory user interface, VIPER allows the sources of data inconsistency to be resolved. VIPER displays the structure of the study population in a novel pedigree visualisation of generation sandwiches. Error rates reported by the inheritance algorithm are overlaid on the pedigree structure, allowing the inheritance pattern of reported errors to be explored, and the likely underlying bad datapoint resolved.
Type Of Technology Webtool/Application 
Year Produced 2012 
Impact Permits more rapid and efficient cleaning and repair of genotype data from pedigree populations. Greatly speeds the analysis and improves the power of genetic linkage studies. 
URL http://bioinformatics.roslin.ed.ac.uk/viper/
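
The idea of exploring the inheritance pattern of reported errors to find the likely bad datapoint, mentioned in the VIPER description above, can be illustrated with a small Python sketch. It counts, for one marker, how many inconsistent child/sire/dam trios each individual takes part in; an individual involved in many failing trios is a natural first suspect. This is a simplified assumption for illustration and not VIPER's own algorithm or data model: the function names, the dictionary layout and the animal identifiers are all hypothetical.

```python
from collections import Counter
from itertools import product


def trio_ok(child, sire, dam):
    """True if the child's allele pair can be formed from one allele of each parent."""
    if child is None or sire is None or dam is None:
        return True  # untyped or unknown animals cannot contradict anything
    possible = {tuple(sorted(pair)) for pair in product(sire, dam)}
    return tuple(sorted(child)) in possible


def involvement_in_errors(pedigree, genotypes):
    """Count, per individual, the inconsistent trios it appears in.

    pedigree maps child id -> (sire id, dam id); genotypes maps id -> allele pair.
    """
    tally = Counter()
    for child, (sire, dam) in pedigree.items():
        if not trio_ok(genotypes.get(child), genotypes.get(sire), genotypes.get(dam)):
            tally.update([child, sire, dam])
    return tally


# Hypothetical data: sire S1 is recorded as A/A, but both of its offspring are B/B.
pedigree = {"C1": ("S1", "D1"), "C2": ("S1", "D2")}
genotypes = {
    "S1": ("A", "A"),
    "D1": ("B", "B"),
    "D2": ("B", "B"),
    "C1": ("B", "B"),
    "C2": ("B", "B"),
}
print(involvement_in_errors(pedigree, genotypes).most_common(1))
```

In this example the shared sire appears in both failing trios while every other animal appears in one, which suggests checking the sire's record first.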