ePhenotype - Visual Analytics for Integrated Large Scale Gene-Expression and Phenotype Data

Lead Research Organisation: University of Edinburgh
Department Name: MRC Human Genetics Unit

Abstract

Modern biomedical research generates large volumes of data, much of which is image based. Biological insight is often achieved by visualising, analysing and cross-comparing data from multiple sources. Several resources are capturing image data on a genomic scale, i.e. for every known gene. One example is the Edinburgh Mouse Atlas Project (EMAP), which hosts over 30,000 gene-expression patterns mapped onto standard models of mouse development. Another is the International Mouse Phenotyping Consortium (IMPC), which is capturing images of mice with one gene "knocked-out" (that is, inactive) to help understand the gene function. This massive international effort will deliver such data for every known gene and will detect regions of the developing embryo that are growing abnormally and show statistically significant departure from normal. Here we will develop novel tools to integrate, query and co-visualise these data to enable scientists to browse and analyse the image data online without needing to download vast quantities of data. These tools will deliver a capability of "visual analytics", allowing interactive exploration and hypothesis generation from the "big data" and enabling data comparisons and analysis not otherwise possible.

This is now possible because of a number of key developments. First, there are novel techniques for delivering image data and image analysis via Internet (RESTful) services associated with each data resource; here we propose to extend these capabilities to include complex morphological data associated with deformation of biological tissues due to the knocked-out gene function. Secondly, there are techniques for cross-mapping between resources that are each associated with an atlas. We will transform data from one resource atlas to the other and enable interoperable query, comparison and co-visualisation. Finally, advances in computing allow very efficient access to very large quantities of data and make interactive image analysis possible even on multi-terabyte data volumes.

In this project we will develop novel tools to allow the visualisation and data-analysis of complex 3D data associated with mouse embryo development within an interactive environment. This will require the development of new tools for online visualisation based on the WebGL standard and using the de facto standard web-browser programming language JavaScript. This visualisation includes complex deformation data expressed as vector displacements or ultimately as the strain tensor, which captures all aspects of the deformation transformation.
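As an illustration of how a strain tensor follows from vector displacements, the sketch below computes the Green-Lagrange strain E = 0.5 (F^T F - I) from a displacement gradient, where F = I + grad(u). The project text does not say which strain measure is intended, so that choice, the use of Python and all function names are assumptions made purely for illustration.

```python
from typing import List

Matrix = List[List[float]]


def identity() -> Matrix:
    """Return the 3x3 identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]


def green_lagrange_strain(grad_u: Matrix) -> Matrix:
    """Return E = 0.5 * (F^T F - I) with F = I + grad_u.

    grad_u[i][j] is the derivative of displacement component i with
    respect to coordinate j (an assumed convention for this sketch).
    """
    eye = identity()
    # Deformation gradient F = I + grad(u)
    f = [[eye[i][j] + grad_u[i][j] for j in range(3)] for i in range(3)]
    # Right Cauchy-Green tensor C = F^T F
    c = [[sum(f[k][i] * f[k][j] for k in range(3)) for j in range(3)]
         for i in range(3)]
    return [[0.5 * (c[i][j] - eye[i][j]) for j in range(3)] for i in range(3)]


# Hypothetical example: a uniform 10% stretch along x, u = (0.1 * x, 0, 0)
stretch = [[0.1, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
strain = green_lagrange_strain(stretch)

# Hypothetical example: a rigid 90 degree rotation about z (grad_u = R - I)
rotation = [[-1.0, -1.0, 0.0], [1.0, -1.0, 0.0], [0.0, 0.0, 0.0]]
rotation_strain = green_lagrange_strain(rotation)
```

For the stretch, the only non-zero component is E_xx = 0.5 × (1.1² − 1) = 0.105. The rigid rotation produces zero strain, which is why a strain measure of this kind separates true tissue deformation from a mere change of pose.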

With these tools we will deliver new ways for scientists to browse and visualise the data, to interactively explore possible data associations, and to refine and test hypotheses about gene interactions and their effect on embryo development. This will make the image data accessible in a way not previously possible and allow scientists to ask new questions that would be extremely difficult to address without these tools.

Technical Summary

We will develop an online interface to enable cross query and pattern matching between the EMAP embryo atlas and gene-expression database and the IMPC and DMDD knock-out mouse phenotype resources. The data will be spatially mapped by defining the complex transform between the two atlas frameworks, and users will be able to use spatial patterns to query both resources. The results, especially the morphological variation associated with the phenotype screen, will be displayed in a web-browser-based 3D visualisation tool that combines the capabilities of WebGL with the tiled views through very large 3D image volumes provided by the Internet Imaging Protocol (IIP), extended by the Mouse Atlas Project to handle 3D data and arbitrary re-sectioning.
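The idea behind tiled views can be sketched as follows: a client requests only the tiles that overlap the part of a section currently on screen, rather than the whole image. The function name, the 256-pixel tile size and the row-major tile numbering below are assumptions for illustration and are not taken from the IIP or IIP3D specifications; the sketch also covers only a single 2D section, with any re-sectioning of the 3D volume assumed to happen on the server.

```python
from math import ceil
from typing import List


def tiles_for_viewport(x: int, y: int, width: int, height: int,
                       image_width: int, image_height: int,
                       tile_size: int = 256) -> List[int]:
    """Return row-major indices of the tiles overlapping a viewport.

    All coordinates are in pixels of the section image. The tile size
    and the numbering scheme are hypothetical choices for this sketch.
    """
    tiles_across = ceil(image_width / tile_size)
    # Clamp the viewport to the image bounds
    x0, y0 = max(0, x), max(0, y)
    x1 = min(image_width, x + width)
    y1 = min(image_height, y + height)
    if x0 >= x1 or y0 >= y1:
        return []  # viewport lies entirely outside the image
    first_col, last_col = x0 // tile_size, (x1 - 1) // tile_size
    first_row, last_row = y0 // tile_size, (y1 - 1) // tile_size
    return [row * tiles_across + col
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]


# Hypothetical 1024 x 1024 section viewed through a 400 x 300 window
visible = tiles_for_viewport(300, 100, 400, 300, 1024, 1024)
```

In this example only four of the sixteen tiles are needed, which is what makes browsing multi-gigabyte volumes practical without download.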

The visualisation will implement novel extensions to the JavaScript WebGL library three.js in order to render effectively not only the heat-map type scalar data but also the vector and tensor data associated with the morphological mappings. In addition we will extend the IIP3D protocol and servers to allow image manipulation and measurement of properties. For this we will implement an "image calculator" to allow segmentation, domain binary arithmetic (union, intersect, difference), morphological operations, filtering and feature measurements such as volume, area, distance and any of the Region Connection Calculus (RCC) operations. The calculator will also allow spatial domains, found for example as the intersection of two gene patterns and perhaps a phenotype heat-map, to be used as a query on the databases. By these means the scientist can browse and interactively explore the data for gene and phenotype associations.
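The domain arithmetic described above can be sketched with spatial domains represented as sets of voxel coordinates. This is a minimal illustration only: the set representation, the helper names and the example patterns are assumptions, and it does not reflect how the woolz software or the IIP3D servers store or combine domains.

```python
from typing import Set, Tuple

Voxel = Tuple[int, int, int]
Domain = Set[Voxel]


def box(x0: int, y0: int, z0: int, x1: int, y1: int, z1: int) -> Domain:
    """Return the voxels of an axis-aligned box (upper bounds exclusive)."""
    return {(x, y, z)
            for x in range(x0, x1)
            for y in range(y0, y1)
            for z in range(z0, z1)}


def volume(domain: Domain,
           voxel_size: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Return the domain volume as voxel count times voxel volume."""
    sx, sy, sz = voxel_size
    return len(domain) * sx * sy * sz


# Two hypothetical gene-expression patterns, each a 4 x 4 x 4 block
gene_a = box(0, 0, 0, 4, 4, 4)
gene_b = box(2, 2, 2, 6, 6, 6)

overlap = gene_a & gene_b   # intersection: expressed in both patterns
either = gene_a | gene_b    # union: expressed in at least one pattern
only_a = gene_a - gene_b    # difference: expressed in A but not in B

# Volume of the overlap for hypothetical 0.5-unit isotropic voxels
overlap_volume = volume(overlap, (0.5, 0.5, 0.5))
```

A domain such as `overlap` is the kind of derived region that could then be submitted as a spatial query against the databases.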

The output will be an open-source toolkit applicable to any 3D bioimaging application where the requirement is to visualise and explore very large image data-volumes. In addition we will deliver a functioning interoperable analysis tool for the large scale mouse embryo resources comprising the Mouse Atlas, IMPC and DMDD.

Planned Impact

Novel Toolkit for Large-Scale Image Visualisation and Image Analysis:

ePhenotype will deliver a package of tools that make resources of large-scale images (10GB-1TB) accessible for query and browsing via a standard web-browser. The package includes the server-side system, an extended standard protocol and RESTful API, a client-side JavaScript library to deliver WebGL-based rendering of complex data, and an image analysis tool to provide interactive visual analytics. This will have a wide impact on many biomedical research, translational and educational areas.

Accessibility of Image Based Phenotype Data:

The primary image data from the IMPC embryo phenotype screen is made freely available and there is a need for an easy-to-use web interface to allow researchers to archive, find, and query across large volumes of 3D image data. Making it possible to access and query mutant embryo image data easily, and circumventing the need for data download, will have a major impact on familiarizing the research community with phenotype image data.

Who will benefit?

The biomedical research community will benefit from the ability to access large volumes of image data easily, without the need for data download, and with easy access to anatomical atlas resources to assist in the identification of phenotypically abnormal structures. In a wider context, the ePhenotype toolkit can additionally be used as a community resource for archiving annotations from secondary phenotype screens involving re-use of IMPC embryo phenotype data.

Data Integration:

ePhenotype provides the mechanism by which images archived in phenotype and gene expression databases can be integrated. This is of critical importance because images hold information that can be mined using automated image analysis methods. By integrating these different image-based resources, we enable co-query across multiple database resources. This makes it possible to discover novel associations between gene expression and phenotype data.

What will be done to ensure they benefit?

We will raise awareness of the ePhenotype toolkit through Exhibitor Stand demonstrations at national and international meetings. The eMouseAtlas Project has an Exhibitor Stand presence at the SDB and BSDB annual meetings and these present an opportunity for one-on-one tutorials on how to use the ePhenotype toolkit to maximal effect. In addition, we will raise awareness among the IMPC community through Prof Baldock and Dr Armit's presence at the annual IMPC Meeting.
A paper describing the ePhenotype resource will be published in a high-profile journal and will be further publicised through postings on The Node.
The software tools will all be available as open source from the ma-tech GitHub repository and will be presented at bioinformatics and bio-visualisation conferences and meetings as well as through publications.

Description This research has delivered two key outputs. The first is a series of software tools for visualising complex 3D data in a web-browser. These build on the JavaScript library three.js to provide simple tools for surface-based (for anatomy) and cloud-based (for gene-expression and phenotype) visualisation. They are freely available on the open-source repository GitHub. The second is the use of the Mouse Atlas tools and expertise to build a new and combined 3D atlas for the E14.5 and E15.5 mouse embryos. These embryo stages are now used within the IMPC phenotype programme, and the atlas allows linkage and co-visualisation of phenotype data and gene-expression data. The new atlas models provide the spatial framework for further analysis and interoperability.
Exploitation Route Both outputs will find further use: the software as a tool for web developers, and the atlas models as a core spatial reference frame for spatial data.
Sectors Agriculture, Food and Drink, Digital/Communication/Information Technologies (including Software), Culture, Heritage, Museums and Collections, Pharmaceuticals and Medical Biotechnology

 
Description DMDD 
Organisation EMBL European Bioinformatics Institute (EMBL - EBI)
Country United Kingdom 
Sector Academic/University 
PI Contribution Deciphering Mechanisms of Developmental Disorders. Two consortia: one funded by the Wellcome Trust with a supplement for equipment at the HGU, the other still under review by the MRC with funding for a post.
Collaborator Contribution Large consortium to phenotype embryonic lethal knock-out strains of mice as part of the IMPC
Impact Funding acquired from Wellcome Trust. Further funding sought from MRC.
Start Year 2012
 
Description DMDD 
Organisation Medical Research Council (MRC)
Department MRC National Institute for Medical Research (NIMR)
Country United Kingdom 
Sector Academic/University 
PI Contribution Deciphering Mechanisms of Developmental Disorders. Two consortia: one funded by the Wellcome Trust with a supplement for equipment at the HGU, the other still under review by the MRC with funding for a post.
Collaborator Contribution Large consortium to phenotype embryonic lethal knock-out strains of mice as part of the IMPC
Impact Funding acquired from Wellcome Trust. Further funding sought from MRC.
Start Year 2012
 
Description DMDD 
Organisation Medical Research Council (MRC)
Department The Mary Lyon Centre
Country United Kingdom 
Sector Academic/University 
PI Contribution Deciphering Mechanisms of Developmental Disorders. Two consortia: one funded by the Wellcome Trust with a supplement for equipment at the HGU, the other still under review by the MRC with funding for a post.
Collaborator Contribution Large consortium to phenotype embryonic lethal knock-out strains of mice as part of the IMPC
Impact Funding acquired from Wellcome Trust. Further funding sought from MRC.
Start Year 2012
 
Description DMDD 
Organisation University of Oxford
Country United Kingdom 
Sector Academic/University 
PI Contribution Deciphering Mechanisms of Developmental Disorders. Two consortia: one funded by the Wellcome Trust with a supplement for equipment at the HGU, the other still under review by the MRC with funding for a post.
Collaborator Contribution Large consortium to phenotype embryonic lethal knock-out strains of mice as part of the IMPC
Impact Funding acquired from Wellcome Trust. Further funding sought from MRC.
Start Year 2012
 
Description Deciphering Mechanisms of Developmental Disorders 
Organisation Francis Crick Institute
Country United Kingdom 
Sector Academic/University 
PI Contribution We developed an interface to annotated histological section images in conjunction with phenotype identification.
Collaborator Contribution DMDD provided the image data and the scientific curation of the placenta data.
Impact Publications and the placenta histology web-site.
Start Year 2016
 
Description International Mouse Phenotype Consortium (IMPC) 
Organisation MRC Harwell
Country United Kingdom 
Sector Academic/University 
PI Contribution I am on the Scientific Advisory Panel for this project but also collaborate directly with the MRC Harwell group.
Collaborator Contribution MRC Harwell have provided access to specific atlas models in use within IMPC and example data sets for developing the prototype ePhenotype interface.
Impact No outputs yet
Start Year 2016
 
Title Mouse Atlas GitHub repository 
Description All the woolz image-processing software, applications and the IIP3D 3D tile-image servers are now available from the ma-tech GitHub repository.
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact The woolz technology is gradually being adopted to handle very large 3D image volumes by a number of resources such as IMPC and DMDD.
URL https://github.com/ma-tech
 
Description BiVi Annual Meeting TGAC 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Annual meeting of the BBSRC-funded Biological Visualisation network, held at The Genome Analysis Centre (TGAC). Gave an invited talk.
Year(s) Of Engagement Activity 2015