Development of GPGPU tools for modelling complex phenotypes

Lead Research Organisation: University of Edinburgh

Department Name: The Roslin Institute

Abstract

We are investigating how genes make some people or animals more susceptible to certain diseases (e.g. cancer) or better at production traits (e.g. milk yield) than others. In the long term this research could be used to predict which diseases people and animals are prone to and at what age they are likely to develop them. With this information better drugs and preventative treatments could be developed. This will also help to improve food production and safety for an increasing human population. To investigate this, we take samples from a large number of diseased and healthy people or animals. The genomes of these two groups are then studied and particular parts of the genome (called genes) pinpointed as contributing to the differences between the groups. These comparisons require complex mathematical and statistical models.
We have developed statistical methods that model the traits of animals and people as a function of their genetic make-up and help us to identify which genes contribute to the differences between groups. However, these methods require a large number of calculations that take a long time to complete on standard computer processors (or clusters of them). This research proposal will develop software tools that speed up these calculations substantially and hence will help us achieve our scientific aims more quickly. The software tools will run on Graphics Processing Units (GPUs), the fast processors used in graphics cards that allow people to play fast and fun computer games. We will use the same programming 'tricks' and technology used by the computer games industry to understand how genes work, and how they interact with each other to make people or animals more or less prone to disease.

Technical Summary

In recent years, genome-wide association studies (GWAS) have allowed an unprecedented search for genetic variants contributing to complex traits. GWAS have genotyped thousands of human and animal genomes with very dense single nucleotide polymorphism arrays and correlated genetic variation with phenotypic variation. Despite the arguable success of GWAS, for most complex traits most of the standing genetic variation remains unidentified. Although the 'missing heritability' problem is currently most obvious in human studies, it is very likely that the same problem will arise in wild, farm and companion animals as data becomes available.
One strategy to identify the 'missing heritability' is to fit non-additive genetic models. Fitting these models is computationally intensive and we lack fast tools to perform global and unbiased searches of the genome in a reasonable amount of time. We will exploit the power of Graphics Processing Units (GPUs) to address one of the most important unanswered questions in complex trait genetics: where is the missing genetic variation hidden?
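To give a rough sense of why global searches for non-additive (epistatic) effects are so demanding, the sketch below counts the number of SNP pairs an exhaustive pairwise scan would have to test. The array sizes are illustrative assumptions, not figures from this project, and `pairwise_tests` is a hypothetical helper name.

```python
import math


def pairwise_tests(n_snps: int) -> int:
    """Number of distinct SNP pairs in an exhaustive two-locus scan."""
    return math.comb(n_snps, 2)


if __name__ == "__main__":
    # Illustrative array sizes only (assumed, not taken from the proposal).
    for n_snps in (10_000, 100_000, 500_000):
        print(f"{n_snps:>7} SNPs -> {pairwise_tests(n_snps):,} pairwise tests")
```

Because the count grows with the square of the number of SNPs, each pair needs only a small model fit for the total cost to become prohibitive on a single processor. Every pair can be tested independently, which is the kind of workload that suits massively parallel hardware.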
We have developed an analytical approach to identify quantitative trait and disease susceptibility loci, i.e. to capture genetic variation at functional genomic regions. Our approach estimates genomic relationships among individuals at a particular position of the genome from the observed genotypes and fits the individuals' additive genetic value at that position as a random effect in a mixed linear model framework. However, current tools are slow, which makes global epistatic searches and the estimation of empirical significance thresholds impossible with our analytical approach. We estimate that the proposed project will deliver a five-fold increase in performance over our current CPU software implementation.
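The first step described above, estimating genomic relationships among individuals from the genotypes in one region, can be sketched as follows. This is a minimal illustration, not the project's implementation: it assumes the commonly used standardised estimator, in which each SNP's contribution is (x_a - 2p)(x_b - 2p) / (2p(1 - p)) averaged over the SNPs in the region, and the summary does not state which estimator the project used. The function name `regional_grm` and the half-open window `[start, end)` are hypothetical.

```python
from typing import List, Sequence


def regional_grm(
    genotypes: Sequence[Sequence[int]], start: int, end: int
) -> List[List[float]]:
    """Genomic relationship matrix from the SNPs in columns [start, end).

    genotypes[i][j] is the number of copies (0, 1 or 2) of the reference
    allele carried by individual i at SNP j. The estimator is an assumed,
    commonly used one; it is not confirmed by the source text.
    """
    n = len(genotypes)
    grm = [[0.0] * n for _ in range(n)]
    used = 0  # SNPs that actually contribute to the average
    for j in range(start, end):
        column = [row[j] for row in genotypes]
        p = sum(column) / (2 * n)  # sample allele frequency
        var = 2 * p * (1 - p)      # expected genotype variance
        if var == 0:
            continue               # skip SNPs fixed for one allele
        z = [g - 2 * p for g in column]  # centred genotypes
        for a in range(n):
            for b in range(a, n):
                grm[a][b] += z[a] * z[b] / var
        used += 1
    if used == 0:
        raise ValueError("no usable SNPs in the requested region")
    for a in range(n):
        for b in range(a, n):
            grm[a][b] /= used
            grm[b][a] = grm[a][b]  # fill the lower triangle by symmetry
    return grm


if __name__ == "__main__":
    # Toy data: 3 individuals x 4 SNPs (made up for illustration).
    toy = [[0, 1, 2, 1], [1, 1, 0, 2], [2, 1, 1, 0]]
    for row in regional_grm(toy, 0, 2):
        print(row)
```

In a mixed linear model, a matrix of this kind would serve as the covariance structure of the regional random effect. The nested loops over pairs of individuals, repeated for every region of the genome, are the part of the computation that lends itself to parallel execution.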

Planned Impact

Impact on the academic community
The proposed research will benefit complex trait geneticists working in model organisms, humans, livestock, and wild and companion animals. It will help them to identify the genes and loci that code for and control complex traits and diseases. This in turn will help to understand how genes interact with each other and with the environment.
Identifying the genes that contribute to particular traits (e.g. diseases) makes it feasible to study the molecular mechanisms that lead to them. Molecular biologists will be primary beneficiaries of the successful application of our tools to complex traits in humans and animals.
Impact on the industry
Our research will help the breeding industry to maintain a competitive advantage through improved breeding schemes. Identifying the loci contributing to production traits will help to build better prediction models and hence achieve higher genetic gains. It will also help to maintain sustainable food (protein) production and reduce the environmental burden of the livestock industry.
Our tools will allow the discovery of genes associated with disease onset and progression. Mechanistic insights generated by the discovery of those genes will help the pharmaceutical industry to inform the selection of candidate chemical compounds, thereby increasing the success rate of potentially useful compounds and speeding up drug discovery and development.

Impact on human and animal health
Predicting phenotypes is important in human disease: better prediction models will lead to better screening strategies, allocation of resources and intervention strategies, hence informing public health policy.
Our methods will help to understand the genetic architecture of complex diseases in livestock and companion animals. This will help to develop better screening programmes, improve public health policies and facilitate the development of better therapeutics.

Impact on users
The impact on users will be tremendous; our GPU code is likely to be a hundred times faster than available software. This means that global searches for epistasis would be feasible and that empirical significance thresholds could be obtained. Neither of these analyses is currently feasible.

Timescale
Uptake of the software is likely to be quick because the results from genome-wide association studies have, to a degree, not fulfilled their original expectations and there is a need to try new approaches to identify the missing genetic variation.

Funded Value:

£108,731

Funded Period:

Jul 12 - Aug 13

Funder:

BBSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

BB/K000195/1

Principal Investigator:

Albert Tenesa

Research Subject:

Genetics & development (42%)

Info. & commun. Technol. (14%)

Omic sciences & technologies (28%)

Tools, technologies & methods (14%)

Research Topic:

Gene action & regulation (42%)

Genomics (28%)

High Performance Computing (14%)

Parallel Computing (14%)

Organisations

People	ORCID iD
Albert Tenesa (Principal Investigator)
Alan Gray (Co-Investigator)

Publications

Cebamanos L (2014) Regional heritability advanced complex trait analysis for GPU and traditional parallel architectures. in Bioinformatics (Oxford, England)

Howrigan DP (2016) Genome-wide autozygosity is associated with lower general cognitive ability. in Molecular psychiatry

Huffman JE (2015) Modulation of genetic associations with serum urate levels by body-mass-index in humans. in PloS one

Rowe SJ (2013) Complex variation in measures of general intelligence and cognitive change. in PloS one

Saura M (2015) Evaluation of the linkage-disequilibrium method for the estimation of effective population size when generations overlap: an empirical case. in BMC genomics

Tenesa A (2013) The heritability of human disease: estimation, uses and abuses. in Nature reviews. Genetics

Key Findings
Impact Summary
Collaboration
Software and Technical Products
Spin Outs


Description	We developed computer software to perform complex statistical analyses of genomic data. The software is very fast because it uses both CPUs (found in standard computers) and GPUs (usually used for gaming). This has allowed us to make better use of the genomic data available and has helped us to identify parts of the genome that are important in traits such as height, colorectal cancer or milk yield.
Exploitation Route	Researchers have used this software to study the genetics underlying hip dysplasia in dogs and non-pathological cognitive decline in humans, and to run theoretical simulations.
Sectors	Agriculture, Food and Drink; Healthcare
URL	http://www.roslin.ed.ac.uk/albert-tenesa/software/


Description	The development of the software led to training in high performance computing and statistics. Training in these two hard-to-find skills is the main impact of this grant. The person trained moved to New Zealand for a senior industry-research post.
First Year Of Impact	2013
Sector	Agriculture, Food and Drink
Impact Types	Economic


Description	UK Biobank Research Analysis Platform
Organisation	UK Biobank
Country	United Kingdom
Sector	Charity/Non Profit
PI Contribution	We were invited by Mark Effingham (Depute CEO of UK Biobank) to be one of the avant-garde teams to access the UK Biobank research analysis platform to adapt and deploy some of the tools we have developed for the analysis of genomic data.
Collaborator Contribution	We are working with UK Biobank and DNAnexus to set up the compute configuration to allow fast genome-wide association studies with array genotypes, imputed genotypes, whole exome and whole genome data.
Impact	No outputs yet.
Start Year	2020


Title	REACTA
Description	The software performs mixed linear models using genomic information.
Type Of Technology	Software
Year Produced	2013
Open Source License?	Yes
Impact	The software has been used widely within the University of Edinburgh, and our algorithms were incorporated into GCTA, the original software that we started from. GCTA is widely used because it was the original implementation.
URL	http://www.roslin.ed.ac.uk/albert-tenesa/software/


Company Name	Omecu
Description	Omecu develops a cloud-based platform for the analysis of large-scale genetic and epidemiologic datasets, with the aim of democratising genome data.
Year Established	2021
Impact	Received support from the Wellcome iTPA programme, participated in the SETSquared ICURe programme, and received Medical Research Council grants. They also received funding from the University's Data-Driven Entrepreneurship Seed Fund and Fast Track Mentor initiatives, supported by the Scottish Funding Council.
Website	http://omecu.com