Vast-scale linear mixed modelling genetic discovery approaches for genome- and exome-wide association analyses to enable therapeutic target validation

Lead Research Organisation: University of Edinburgh
Department Name: MRC Human Genetics Unit

Abstract

Large-scale publicly available datasets, such as UK Biobank (n=500,000 participants), which combine genome-wide genotyping and exome sequencing data with linkage to detailed phenotype measurements and electronic healthcare records, have the opportunity to transform human genetic discovery analyses. Such datasets are transformative both in their scale and in the depth and diversity of the quantitative and disease phenotypes available, and have raised strong interest in both academia and industry. In this regard, we have identified partners in Target Sciences (TSci) at GlaxoSmithKline (GSK), a leading team in the application of genetics to drug target discovery and validation. They have previously shown that drugs developed against targets with genetic support for the proposed disease are more likely to reach approval (PMID: 26121088), have used existing GWAS results to search for drug repurposing opportunities (PMID: 22491277) and to develop databases of gene-disease pairs to inform target discovery and validation decisions (PMID: 27899665, 28472345), and have used other biobank samples to influence the selection of cardiovascular endpoints (PMID: 26791069) and to search for drug repurposing opportunities (PMID: 27301456). GSK have previously performed large-scale targeted sequencing studies (PMID: 22604722) and recently funded exome sequencing of 50,000 participants in UK Biobank, with the aim of further supporting drug target discovery and validation. A major aim at GSK is to use UK Biobank data to conduct phenome-wide association studies (PheWAS) for variants known or predicted to affect gene function for drug targets of interest. The approach currently used is to test each single variant against thousands of disease traits, in the subset of unrelated individuals. However, this approach needs to be improved to distinguish associations where the drug target variants are likely causal from associations where the drug target variants are merely correlated (in linkage disequilibrium).

Testing all variants (potentially thousands) in order to fine map in the genomic context of each association of interest is inefficient. A preferable approach is to conduct PheWAS and fine mapping in genomic context by querying a database of genome-wide association results for all diseases and phenotypes of interest. To maximize discovery power and fine mapping resolution, it is preferable to populate this database with results calculated using the largest possible sample size. However, an almost inevitable consequence of increasing sample sizes from human populations is that a larger fraction of participants are related to other participants in the sample. Traditional approaches, such as removing one participant from each related pair, may lead to the removal of a significant proportion of participants from the analysis, with a consequent loss of statistical power. An alternative is to use mixed linear models to correct for population structure. These approaches, however, require the development of new software tools to deal with large sample sizes and large numbers of variants and phenotypes. GSK TSci scientists lack the technical expertise required to implement efficient mixed model association testing at the scale required, so this joint project aims to develop, in collaboration with them, the methods required to populate the database. Our work has the opportunity to have an impact on drug discovery and development.
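For orientation, the mixed linear model mentioned above is commonly written as follows. This is the standard textbook formulation, given here as an assumption about the general family of models rather than the exact model fitted in this project; the symbols (y, X, K and the variance components) are introduced only for illustration.

```latex
% y: phenotype vector for n participants
% X: covariates plus the genotype of the variant being tested
% g: random polygenic effect with covariance proportional to K
% K: n x n genetic relationship matrix estimated from genome-wide genotypes
\begin{aligned}
  y &= X\beta + g + \varepsilon, \\
  g &\sim \mathcal{N}\left(0,\ \sigma_g^2 K\right), \qquad
  \varepsilon \sim \mathcal{N}\left(0,\ \sigma_e^2 I\right), \\
  V &= \sigma_g^2 K + \sigma_e^2 I, \\
  \hat{\beta} &= \left(X^\top V^{-1} X\right)^{-1} X^\top V^{-1} y .
\end{aligned}
```

Because K captures pairwise relatedness, related participants can stay in the analysis instead of being removed. The cost is that V is an n x n matrix, which is what makes the approach computationally demanding at n = 500,000.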

Technical Summary

To address the objectives of the fellowship, we will further develop DISSECT (PMID: 26657010). This is a software tool developed within the group, which was designed to overcome the compute and memory limitations of single compute nodes by taking advantage of the aggregate power of the thousands of processor cores and the large distributed memory available on supercomputers or large compute clusters. For this purpose, DISSECT distributes the available data over multiple nodes. At any given time, each node has access to only a small portion of the data, on which it performs local computations. When the algorithm requires access to blocks of data currently held on other nodes, the nodes communicate to coordinate data redistribution. This approach gives a single analysis access to much larger computational resources (i.e. it increases scalability) than standard tools, which can only use the resources of a single compute node for each analysis, even when running on similar compute cluster environments. In addition, building on our current development, we will further develop, evaluate and implement previous approaches (PMID: 25642633, 21465547) that use approximations to reduce the computational cost of fitting these models on large datasets, and find a balance between speed, accuracy, and computational requirements. The proposed analyses will be run on Tier-1 and Tier-2 High Performance Computing Centres such as ARCHER (https://www.archer.ac.uk) and CIRRUS (http://www.cirrus.ac.uk).
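The data-distribution idea described above can be illustrated with a toy sketch. It is not DISSECT's code or algorithm: the function names are hypothetical, the "nodes" are simulated by a loop in a single process, and the cross-product of a small genotype matrix stands in for the much larger linear algebra a real analysis needs. Each simulated node sees only its own block of variant columns, computes a local partial result, and the partial results are then combined.

```python
from typing import List

Matrix = List[List[float]]


def split_columns(genotypes: Matrix, n_nodes: int) -> List[Matrix]:
    """Give each hypothetical node a contiguous block of variant columns.

    May return fewer blocks than n_nodes when there are few variants.
    """
    n_variants = len(genotypes[0])
    block = (n_variants + n_nodes - 1) // n_nodes  # ceiling division
    return [
        [row[start:start + block] for row in genotypes]
        for start in range(0, n_variants, block)
    ]


def local_cross_product(block: Matrix) -> Matrix:
    """Local step: compute Z_b Z_b^T using only the columns held locally."""
    n = len(block)
    return [
        [sum(a * b for a, b in zip(block[i], block[j])) for j in range(n)]
        for i in range(n)
    ]


def reduce_sum(partials: List[Matrix]) -> Matrix:
    """Communication step: add up the partial matrices from all nodes."""
    n = len(partials[0])
    return [
        [sum(p[i][j] for p in partials) for j in range(n)]
        for i in range(n)
    ]


def distributed_cross_product(genotypes: Matrix, n_nodes: int) -> Matrix:
    """Simulate the distributed computation of Z Z^T in one process."""
    blocks = split_columns(genotypes, n_nodes)
    partials = [local_cross_product(b) for b in blocks]
    return reduce_sum(partials)


if __name__ == "__main__":
    # 3 individuals x 4 variants, genotypes coded as allele counts.
    z = [[0, 1, 2, 1], [2, 0, 1, 1], [1, 1, 0, 2]]
    print(distributed_cross_product(z, n_nodes=2))
```

On a real cluster the loop over blocks would be replaced by separate processes on separate nodes, and the final sum by a message-passing reduction. The point of the sketch is only that no single node ever needs the whole matrix in memory.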
 
Description Analysis of non-additive genetic effects affecting complex traits in large datasets
Amount £30,000 (GBP)
Funding ID IS3-R86 
Organisation University of Edinburgh 
Sector Academic/University
Country United Kingdom
Start 03/2019 
End 04/2020
 
Description GOLEM: High Performance Computing platform for a paradigm shift in genetic analysis
Amount £94,329 (GBP)
Funding ID MRC/CIC8/76 
Organisation Medical Research Council (MRC) 
Sector Public
Country United Kingdom
Start 03/2021 
End 02/2022
 
Description Golem: A disruptive platform to access and interactively analyse genetic data
Amount £288,694 (GBP)
Funding ID 10001080 
Organisation Innovate UK 
Sector Public
Country United Kingdom
Start 08/2021 
End 09/2022
 
Description What genomic analyses and iTunes have in common?
Amount £54,022 (GBP)
Funding ID 29-34 / 520268126 
Organisation Innovate UK 
Sector Public
Country United Kingdom
Start 07/2020 
End 03/2021
 
Title Genetic analyses on demand 
Description We developed a computational system that will greatly improve the capacity to perform common genetic analyses on large datasets (GWAS, GxE, GxG, etc.). Its key strengths are the capacity to compute orders of magnitude faster on large datasets, privacy, and UX/UI: · The back-end enables users to perform an analysis in seconds: currently it can run ~80,000-160,000 GWAS per day on datasets with >500k individuals and >10M genetic variants on a very small set of servers. · The analyst does not need direct access to the data, so the owner can keep it safe if any restrictions are in place. · A web tool enables researchers without programming skills to rapidly and efficiently explore and prepare the data, combined with a front-end that allows the researcher to interactively explore the results and integrate them with information from different public databases. The system will greatly optimize the use of researchers' time, and the cost of performing analyses, by enabling them to explore the data and perform analyses interactively in real time. It would be equivalent to querying a database of pre-computed results. However, because analyses are performed on demand, the approach allows researchers to modify and adapt the models efficiently and re-run the analysis interactively. 
Type Of Material Improvements to research infrastructure 
Year Produced 2020 
Provided To Others? No  
Impact The tool is not yet public. We expect to release it this year. 
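As a rough check on the throughput quoted in the description above, the daily figures can be converted into an average time per analysis. This is a sketch under a stated assumption: the rate is treated as an average over the whole set of servers, so it says nothing about the latency of one request on one machine. The function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def mean_seconds_per_gwas(gwas_per_day: float) -> float:
    """Average wall-clock seconds per GWAS implied by a daily throughput."""
    return SECONDS_PER_DAY / gwas_per_day


if __name__ == "__main__":
    # Lower and upper ends of the quoted range (~80,000-160,000 GWAS per day).
    slow = mean_seconds_per_gwas(80_000)
    fast = mean_seconds_per_gwas(160_000)
    print(f"{fast:.2f} to {slow:.2f} seconds per GWAS on average")
    # prints "0.54 to 1.08 seconds per GWAS on average"
```

Under that assumption, the quoted range corresponds to roughly half a second to one second per GWAS, in line with the statement that an analysis completes in seconds.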
 
Title A comprehensive catalogue of regulatory variants in the cattle transcriptome 
Description Understanding the functional consequences of genetic variants on the transcriptome of livestock is essential for interpreting the molecular mechanisms underlying traits of economic value, and for improving the rate of genetic gain through artificial selection. Here, we build a cattle Genotype-Tissue Expression atlas (cGTEx) for the research community based on 11,642 publicly available RNA-seq datasets (as of July 2019), representing over 100 tissues/cell types among over 40 breeds. We describe the transcriptome landscape across tissues and report thousands of cis- and trans- genetic variants (QTLs) associated with gene expression and alternative splicing for 24 major tissues in cattle. Additionally, we detect 496 gene-tissue pairs significantly associated with 43 economically important traits in cattle via a large transcriptome-wide association study (TWAS). All the genome annotation files are based on ARS-UCD1.2 (Ensembl 96 version). The cGTEx Portal allows researchers to query gene expression, alternative splicing and QTLs across tissues in an easy and uniform way, and can serve as a primary source of reference for cattle genomics, cattle breeding, adaptive evolution, comparative genomics, and veterinary medicine. 
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
Impact It has been accepted in Nature Genetics (not yet published) and led to an international collaboration to create a much more comprehensive database (farmGTEx) that aims to combine data from different species. 
URL https://cgtex.roslin.ed.ac.uk/
 
Title Additional file 1 of Comparative transcriptome in large-scale human and cattle populations 
Description Additional file 1: Table S1. Summary of RNA-seq samples in humans and cattle. 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
URL https://springernature.figshare.com/articles/dataset/Additional_file_1_of_Comparative_transcriptome_...
 
Title Additional file 10 of Comparative transcriptome in large-scale human and cattle populations 
Description Additional file 10: Table S9. Partitioning heritability with expression-conserved and divergent genes in milk production traits using GREML-LDMS. 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
URL https://springernature.figshare.com/articles/dataset/Additional_file_10_of_Comparative_transcriptome...
 
Title Additional file 11 of Comparative transcriptome in large-scale human and cattle populations 
Description Additional file 11: Table S10. Summary of novel variants detected by PolyFun + SuSiE in human height. 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
URL https://springernature.figshare.com/articles/dataset/Additional_file_11_of_Comparative_transcriptome...
 
Title Additional file 3 of Comparative transcriptome in large-scale human and cattle populations 
Description Additional file 3: Table S2. Significantly enriched Gene Ontology terms for three groups of tissue-specific genes. 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
URL https://springernature.figshare.com/articles/dataset/Additional_file_3_of_Comparative_transcriptome_...
 
Title Additional file 4 of Comparative transcriptome in large-scale human and cattle populations 
Description Additional file 4: Table S3. Significantly enriched Gene Ontology terms for up-regulated genes in cattle and humans. 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
URL https://springernature.figshare.com/articles/dataset/Additional_file_4_of_Comparative_transcriptome_...
 
Title Additional file 5 of Comparative transcriptome in large-scale human and cattle populations 
Description Additional file 5: Table S4. Significantly enriched Gene Ontology terms for genes with more conserved expression between human and cattle than between human and mouse. 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
URL https://springernature.figshare.com/articles/dataset/Additional_file_5_of_Comparative_transcriptome_...
 
Title Additional file 6 of Comparative transcriptome in large-scale human and cattle populations 
Description Additional file 6: Table S5. Significantly enriched Gene Ontology terms for genes with variable and consistent expression across tissues in humans and cattle. 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
URL https://springernature.figshare.com/articles/dataset/Additional_file_6_of_Comparative_transcriptome_...
 
Title Additional file 7 of Comparative transcriptome in large-scale human and cattle populations 
Description Additional file 7: Table S6. Summary of 46 GWAS in humans. 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
URL https://springernature.figshare.com/articles/dataset/Additional_file_7_of_Comparative_transcriptome_...
 
Title Additional file 8 of Comparative transcriptome in large-scale human and cattle populations 
Description Additional file 8: Table S7. Summary of LDSC results of base model (without partitioning heritability) for 46 human complex traits. 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
URL https://springernature.figshare.com/articles/dataset/Additional_file_8_of_Comparative_transcriptome_...
 
Title Additional file 9 of Comparative transcriptome in large-scale human and cattle populations 
Description Additional file 9: Table S8. Heritability enrichment analysis of expression-conserved and divergent genes in human complex traits using LDSC. 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
URL https://springernature.figshare.com/articles/dataset/Additional_file_9_of_Comparative_transcriptome_...
 
Title Comprehensive analyses of 723 transcriptomes enhance genetic and biological interpretations for complex traits in cattle 
Description We here uniformly analyzed 723 (156 newly generated and 567 existing) RNA-seq datasets to build a gene atlas in cattle, which included 91 tissues and cell types from 447 individuals. We summarized the sample information, their NCBI accession numbers, and expression (FPKM) of 24,616 Ensembl genes (based on UMD3.1) here. Through integrative analyses of this gene atlas with large-scale genome-wide association studies, we detected relevant tissues/cell types and candidate genes for 45 economically important traits in cattle (under review in Genome Research). This cattle gene atlas will serve as a primary source for biological interpretation and functional validation of GWAS findings, studies of adaptive evolution and population genetics, as well as genomic improvement in cattle. 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact This created the basis of a collaboration that led to the creation of a bigger dataset (cGTEx) and ultimately the farmGTEx international collaboration. 
URL http://cattlegeneatlas.roslin.ed.ac.uk/
 
Title Gene ATLAS GWAS database 
Description Database containing genome-wide association analysis results for 778 human traits and ~30 million genetic variants. In the analysis we used ~450,000 individuals from UK Biobank. We also developed a web tool to browse this database (see also "Software & Technical Products" section). 
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
Impact The website where the database is publicly available has received more than 140,000 visits from researchers around the world since it was created. We also published an article in a high impact journal (https://www.nature.com/articles/s41588-018-0248-z). 
URL http://geneatlas.roslin.ed.ac.uk/
 
Title PigGTEx_v0 - Significant molQTL 
Description Summary statistics of significant molQTL from the pilot phase of PigGTEx (http://piggtex.farmgtex.org). 
Type Of Material Database/Collection of data 
Year Produced 2024 
Provided To Others? Yes  
URL https://www.scidb.cn/en/detail?dataSetId=8c6e14efcc8d4e37b5e5294c86439367
 
Title cGTEx_dataset:A multi-tissue atlas of regulatory variants in cattle 
Description The files are raw data of the cGTEx dataset used in the publication https://doi.org/10.1038/s41588-022-01153-5. For details, please read the Methods section.
1. cGTEx_meta_data_8646sample.xlsx: Metadata consisting of sample names with their sample accession, including information such as data size, cleaned reads, mapping rate, and age. The data is extracted from SRA (https://www.ncbi.nlm.nih.gov/sra/) and BIGD (https://bigd.big.ac.cn/bioproject/) (samples starting with CRS).
2. cGTEx_count_8646sample_27607gene.txt.gz: Raw RNA-seq read counts of 27607 genes (column names are Ensembl gene IDs) for 8646 samples (row names).
3. cGTEx_TPM_8646sample_27607gene.txt.gz: TPM values of 27607 genes (column names are Ensembl gene IDs) for 8646 samples (row names).
4. cGTEx_imputed_vcf.tar.gz: Imputed genotypes (SNPs) of 7297 RNA-seq samples on 29 autosomes.
5. cGTEx_exon_junction_8646sample.tar.gz: Exon junction files for the 8646 samples.
Note: Small discrepancies in some sample names, or the absence of headers in some datasets, compared to https://cgtex.roslin.ed.ac.uk/ are sorted out in this upload. 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
URL https://zenodo.org/record/7560234
 
Description Vast-scale linear mixed modelling genetic discovery approaches for genetic by environment association analyses 
Organisation GlaxoSmithKline (GSK)
Country Global 
Sector Private 
PI Contribution Analyzing large datasets, such as those of the size of UK Biobank, is computationally expensive. The challenge is bigger when thousands of phenotypes have to be analyzed. Although different software solutions are emerging, they are in general limited in the types of models they can fit. To address this problem, we are expanding our tool (http://www.dissect.ed.ac.uk/) to test genetic by environment interactions on thousands of phenotypes in datasets of the size of UK Biobank.
Collaborator Contribution UK Biobank provided access to thousands of measurements on very large numbers of individuals. However, several of those require expertise in a particular field to properly prepare the data, or combine different data fields to generate or curate a new one. GSK has the expertise and resources to do this.
Impact Work in progress. There is no output from this collaboration yet.
Start Year 2018
 
Title Interactive analysis of large datasets whilst keeping the data protected. 
Description Human genomic data is doubling in size every seven months and will soon exceed other Big Data generators such as astronomy, YouTube and Twitter. Extracting value from this data is a key step in areas such as drug targeting and personalized medicine. According to Global Market Insights, the digital genome market is projected to hit $50.4 billion by 2025, and is key for two UK Grand Challenges: AI & data and ageing society. Accordingly, the UK is positioning itself as a big player through strategic investments to create world-leading resources such as UK Biobank and Genomics England. Reaching the full potential of this substantial investment relies on developing associated industries around it to unlock value from the data. However, several barriers still exist: a) Legal, political, or economic restrictions hamper access to multi-institutional and multi-national fragmented data. b) Preparing and analysing the data may require days or even weeks of a highly skilled individual's work performing repetitive low-value tasks. c) Analysing genetic data requires multidisciplinary skills. d) Large computational resources are required. Not all organizations perceive these problems the same way. Whilst public organisations and small biotechs struggle to find adequate skills and allocate the required computational resources, this does not seem to be a major concern for big pharma companies. On the other hand, difficulties accessing data scattered across multiple organizations affect all organizations. Other problems in the field include scalability, evidence, equity, democratization, information, health, and carbon footprint. Several companies have been created to address these problems, some of which we have met (LifeBit and DNA Nexus). We do not believe their solutions satisfactorily address the major challenges in the field that we have identified. In particular, the analyst still needs to "see" the sensitive data through their platforms, which also carry high computational costs and time requirements. The development proposes to overcome these challenges through: 1) An extremely efficient computation engine we developed. Using inexpensive hardware, it reduces large dataset analysis times from days to seconds. 2) An easy-to-use web system that enables the engine to be interactively queried without requiring direct data access - even by the person analysing it. These technologies together can be disruptive to how data is currently accessed and analysed. Our solution can move the field from the current situation, where an analyst struggles to reach data and then spends weeks in an iterative cycle of data preparation and analysis, to a situation where multi-organization fragmented data is accessed easily, and queried interactively and on demand. 
IP Reference  
Protection Trade Mark
Year Protection Granted
Licensed Yes
Impact It is used in an early stage spin-out
 
Title Real time diagnosis of rare diseases 
Description One in two hundred babies will be born with a developmental disorder. The success of human genetics means there is an ever-increasing number of human diseases for which genome-wide sequencing of DNA (DNA-GWS) can provide confident diagnosis. A unifying and robust diagnosis is key to improving prognosis and identifying effective therapeutic strategies. This is particularly critical in rapid DNA-GWS in acutely unwell infants, one of the most rapidly growing sectors of diagnostic genomics. As sequencing technologies improve, comprehensive analysis of sequence data has become the bottleneck. Currently, analysis of different classes of genomic variants is done serially using distinct and computationally intensive quality control (QC) measures prior to diagnostic analysis to determine the clinical (or research) usefulness of each test. The scale of DNA-GWS data means that many hours of computational time, with review by specially trained scientists, are required for each step. Our disruptive innovation answers an important unmet need: to provide flexible and robust methods to identify all major classes of disease-associated genomic pathologies in real time. Our analytical protocols can QC and optimise the data on a per-individual, per-family or per-cohort basis as required, enabling novel analytical approaches to be developed, tested and implemented. Our current prototype, comprising the computation engine and a rudimentary user interface, has been tested on parent-child trio DNA-GWS data, where it performs variant tests at a rate of 119 million variants/second, allowing filter manipulations sufficient to compare 100 parameters on 50 million variants within 0.42 seconds on consumer-grade hardware. 
IP Reference  
Protection Trade Mark
Year Protection Granted
Licensed No
Impact Still under development.
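The two performance figures in the description above appear to be consistent with each other, which a short calculation shows. The reading of "100 parameters on 50 million variants" as 100 x 50 million individual comparisons is an assumption, and the function name is hypothetical.

```python
def rate_per_second(count: float, elapsed_seconds: float) -> float:
    """Items processed per second for a given count and elapsed time."""
    return count / elapsed_seconds


if __name__ == "__main__":
    n_variants = 50_000_000
    n_parameters = 100
    elapsed = 0.42  # seconds, as quoted in the description

    variant_rate = rate_per_second(n_variants, elapsed)
    # Assumed reading: every parameter is compared on every variant.
    comparison_rate = rate_per_second(n_variants * n_parameters, elapsed)

    print(f"{variant_rate / 1e6:.0f} million variants/second")
    # prints "119 million variants/second"
    print(f"{comparison_rate / 1e9:.1f} billion comparisons/second")
    # prints "11.9 billion comparisons/second"
```

Dividing 50 million variants by 0.42 seconds gives about 119 million variants per second, matching the quoted rate.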
 
Title Expansion of DISSECT, a tool to use High Performance Distributed Computing environments to perform analysis on large datasets. 
Description DISSECT is designed to distribute computationally expensive analyses across large numbers of computing nodes connected through a network. This allows very large scalability by enabling the use of thousands of processors and large amounts of memory to perform a single analysis on very large datasets. The tool is not designed to perform only one particular analysis: it is easy to extend with new analyses that use the same distributed computing scheme. During this period, we extended the capabilities of DISSECT by adding the possibility of performing genome-wide association studies on large numbers of phenotypes in a single analysis. We are also extending it to add more complex tests, such as genetic-by-environment or genetic-by-genetic interactions. 
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact Since we started developing DISSECT, we have used it to produce several works published in high impact journals. In the last year, the expansion of this tool allowed us to pre-compute genome-wide association studies for 778 human traits using ~450,000 related and unrelated UK Biobank individuals. The results of these analyses allowed us to develop and publish the Gene ATLAS database and web tool. This has been published in a high impact journal (https://www.nature.com/articles/s41588-018-0248-z) and the website has received more than 140,000 visits from researchers around the world since it was made public. 
URL http://www.dissect.ed.ac.uk/
 
Title Gene ATLAS 
Description Web tool for exploring the Gene ATLAS database (see also "Research Databases & Models"), which contains results from pre-computed genome-wide association studies on 778 human traits analyzed using ~450,000 related and unrelated individuals from UK Biobank. 
Type Of Technology Webtool/Application 
Year Produced 2018 
Impact The tool led to a publication in a high impact journal (https://www.nature.com/articles/s41588-018-0248-z), and has accumulated more than 140,000 visits from researchers around the world since it was published. 
URL http://geneatlas.roslin.ed.ac.uk/
 
Company Name Omecu 
Description Omecu develops a cloud-based platform for the analysis of large-scale genetic and epidemiologic datasets, with the aim of democratising genome data. 
Year Established 2021 
Impact It has been created recently.
Website http://omecu.com
 
Description Talk to medical students about entrepreneurship 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Undergraduate students
Results and Impact The Edinburgh Innovation Ambassador for the College of Medicine at the University of Edinburgh invited me to give a presentation to students about my experiences of spinning out from the university. I made a presentation to them in the Student Enterprise Hub.
Year(s) Of Engagement Activity 2022
URL https://events.irm.ed.ac.uk/Events/Event/7015J000000HU5Y