Homomorphic Encryption of Genotypes and Phenotypes for Quantitative Genetics

Lead Research Organisation: University College London
Department Name: UCL Genetics Institute

Abstract

In order to identify genes that are associated with important traits like disease in humans, or improved yield in crops, it is necessary to analyse very large samples of individuals. Often this involves sharing genetic and other data collected in different studies, and there are risks to individuals' genetic privacy if these data are shared as plaintext. Homomorphic encryption refers to a type of data encryption that obscures the original plaintext data by replacing it with a ciphertext. The ciphertext nonetheless retains enough structure to perform the same data analyses as with the plaintext, so data from different studies can be pooled, increasing the power to make discoveries whilst maintaining genetic privacy.

We have previously developed a method for homomorphic encryption of genotype and phenotype data, based on random high-dimensional rotations of data. In this proposal we will develop our method into a practical tool that can be used by geneticists and other scientists. This will involve writing a software implementation that can operate on very large datasets, and working closely with stakeholders to ensure the code is as useful as possible.

Technical Summary

Quantitative genetic analysis (such as calculating heritability, testing genetic association, and using mixed linear models to control for unequal relatedness between individuals) is a cornerstone of several important areas of genetics, including human complex disease mapping and animal and crop improvement. To make progress it is often necessary to share data between studies, but privacy concerns sometimes prevent or delay data sharing. We previously developed a method that uses random orthogonal matrix keys to encrypt genotype and phenotype plaintext into ciphertext that closely resembles samples of Gaussian deviates. Orthogonal transformations leave key parts of the quantitative genetic machinery unchanged, including the likelihood, parameter estimates, heritability and the effects of a mixed model transformation. However, they scramble the identities of individuals by replacing individual genotypes with random linear superpositions.
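
A minimal NumPy sketch of the idea, under stated assumptions: the key is drawn from the Haar (uniform) distribution on orthogonal matrices via a sign-corrected QR decomposition, genotypes are simulated 0/1/2 dosages, and the association test is an ordinary least-squares t-test with an intercept. The function names are hypothetical and this is not the project's software. Because one key Q rotates the covariates X, the genotypes and the phenotype y alike, XᵀX, Xᵀy and the residual sum of squares are unchanged, so the per-SNP t-statistics computed from ciphertext match those computed from plaintext.

```python
import numpy as np

def random_orthogonal_key(n, rng):
    """Draw an n x n orthogonal key from the Haar distribution using the QR
    decomposition of a Gaussian matrix with a sign correction (hypothetical helper)."""
    a = rng.standard_normal((n, n))
    q, r = np.linalg.qr(a)
    return q * np.sign(np.diag(r))  # flip columns so the diagonal of r is positive

def snp_tstats(covariates, genotypes, phenotype):
    """OLS t-statistic of each SNP in the model phenotype ~ covariates + SNP."""
    n = phenotype.shape[0]
    stats = []
    for j in range(genotypes.shape[1]):
        design = np.column_stack([covariates, genotypes[:, j]])
        xtx_inv = np.linalg.inv(design.T @ design)      # (QD)^T (QD) = D^T D
        beta = xtx_inv @ (design.T @ phenotype)          # (QD)^T (Qy) = D^T y
        resid = phenotype - design @ beta
        sigma2 = resid @ resid / (n - design.shape[1])   # ||Q r|| = ||r||
        stats.append(beta[-1] / np.sqrt(sigma2 * xtx_inv[-1, -1]))
    return np.array(stats)

rng = np.random.default_rng(1)
n, m = 200, 5
genotypes = rng.binomial(2, 0.3, size=(n, m)).astype(float)  # simulated 0/1/2 dosages
phenotype = 0.5 * genotypes[:, 0] + rng.standard_normal(n)
covariates = np.ones((n, 1))                                  # intercept column

key = random_orthogonal_key(n, rng)
# Encrypt by rotating the individual (row) axis of every input with the same key:
# each encrypted row is a random linear superposition of all real individuals.
enc_cov, enc_geno, enc_pheno = key @ covariates, key @ genotypes, key @ phenotype

t_plain = snp_tstats(covariates, genotypes, phenotype)
t_enc = snp_tstats(enc_cov, enc_geno, enc_pheno)
print(np.allclose(t_plain, t_enc))  # True
```

Note that the intercept column is encrypted too. After rotation it is no longer a column of ones, so an analysis tool that silently adds its own intercept to encrypted data would be fitting a different model.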

We propose to develop the use of random orthogonal matrix keys into a fully fledged methodology and software package that genetics researchers can use routinely to share and analyse genetic data. We will also extend the methodology to other data types, such as transcriptomic data, provided the analysis fits within a mixed model framework with Normal errors. We will aim to identify and correct any weaknesses that might permit decryption, and to work with potential users of the system in human, plant and animal genetics to propagate its use, thereby accelerating the sharing of genetic data and the adoption of the FAIR (Findable, Accessible, Interoperable and Reusable) principles.
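
The restriction to mixed models with Normal errors can be illustrated numerically. With K the genomic relationship matrix, the model y ~ N(Xβ, σ_g²K + σ_e²I) becomes, after rotation by an orthogonal key Q, Qy ~ N(QXβ, Q(σ_g²K + σ_e²I)Qᵀ). The Gaussian density depends only on log det V and the quadratic form (y − Xβ)ᵀV⁻¹(y − Xβ), both of which Q leaves unchanged, so the likelihood, its maximiser and the implied heritability h² = σ_g²/(σ_g² + σ_e²) are the same for plaintext and ciphertext. The NumPy sketch below checks this on simulated data. The helper names, the grid over h² with total variance fixed at 1, and the fixed β are illustrative assumptions, not the project's implementation. Genotypes are standardised before encryption, because column-wise centring does not commute with the rotation.

```python
import numpy as np

def haar_orthogonal(n, rng):
    """Random n x n orthogonal key (Haar-distributed via sign-corrected QR)."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))

def standardise(genotypes):
    """Centre and scale each SNP column; done in plaintext before encryption."""
    return (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)

def grm(z):
    """Genomic relationship matrix K = Z Z^T / m from standardised genotypes."""
    return z @ z.T / z.shape[1]

def mixed_model_loglik(y, X, K, beta, sigma_g2, sigma_e2):
    """Log-likelihood of y ~ N(X beta, sigma_g2 * K + sigma_e2 * I)."""
    n = y.shape[0]
    V = sigma_g2 * K + sigma_e2 * np.eye(n)
    resid = y - X @ beta
    _, logdet = np.linalg.slogdet(V)
    quad = resid @ np.linalg.solve(V, resid)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)

rng = np.random.default_rng(7)
n, m = 150, 400
z = standardise(rng.binomial(2, 0.4, size=(n, m)).astype(float))
X = np.ones((n, 1))                                   # intercept-only fixed effects
y = z[:, :20] @ rng.normal(0.0, 0.2, 20) + rng.standard_normal(n)

Q = haar_orthogonal(n, rng)
z_enc, X_enc, y_enc = Q @ z, Q @ X, Q @ y             # one key for all inputs
K_plain, K_enc = grm(z), grm(z_enc)                    # K_enc equals Q K_plain Q^T

beta = np.array([0.0])                                 # fixed for brevity
grid = np.linspace(0.05, 0.95, 19)                     # candidate h^2, total variance 1
ll_plain = np.array([mixed_model_loglik(y, X, K_plain, beta, h2, 1 - h2) for h2 in grid])
ll_enc = np.array([mixed_model_loglik(y_enc, X_enc, K_enc, beta, h2, 1 - h2) for h2 in grid])

print(np.allclose(ll_plain, ll_enc))  # True
print(grid[ll_plain.argmax()], grid[ll_enc.argmax()])
```

Because the two likelihood surfaces coincide at every parameter value, any fitting procedure (a finer grid, REML, or profiling β out by generalised least squares) reaches the same estimates on encrypted data. A non-Gaussian outcome, such as a 0/1 disease status, does not keep its form after rotation, which is why the extension is limited to Normal-error models.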

 
Description Collaboration to test homomorphic encryption for animal breeding 
Organisation Iowa State University
Country United States 
Sector Academic/University 
PI Contribution We are collaborating to test whether the encryption methods we are developing can be used for animal breeding, using a commercial pig data set as a test case.
Collaborator Contribution A seed grant application to AG2P was successful. The grant paid for a postdoc at UC Davis, who successfully evaluated the methodology in the context of Bayesian QTL mapping. A paper is in preparation. A grant application was then submitted to the USDA to continue the work (decision expected mid-2023).
Impact None yet
Start Year 2022
 
Description Collaboration with GeneNetwork
Organisation University of Tennessee
Country United States 
Sector Academic/University 
PI Contribution We have a collaboration with researchers at the University of Tennessee Health Science Center, USA, to implement the HEGP (Homomorphic Encryption of Genotypes and Phenotypes) genotype privacy methodology in their GeneNetwork system.
Collaborator Contribution Our partners will use the HEGP system to encrypt genotypes, enhancing genetic privacy in the GeneNetwork database system.
Impact None to date
Start Year 2023
 
Description Sharing encrypted genotypes and phenotypes to accelerate crop improvement 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Richard Mott presented the methodology underlying the encryption method in an online talk at #UKPlantSciPresents on 28/09/2021, with the aim of interesting plant breeders in the method. The recording has been posted on YouTube.
Year(s) Of Engagement Activity 2021
URL https://www.youtube.com/watch?v=2QlVvOlEQFU