Interactive Visualization of Temporal Multi-Omic Data

Lead Research Organisation: Newcastle University
Department Name: Sch of Computing

Abstract

The rapid development of Next Generation Sequencing (NGS) technologies over the last decade has dramatically increased the ability to sequence genes and genomes, opening the potential for a richer, deeper understanding of microbial communities across a range of bioscience areas. Today, the most difficult and rate-limiting step is not the generation of 'omics data but its interpretation, and the investigation and analysis of these large and diverse datasets has become a major bottleneck. Furthermore, the ability to link relationships and patterns across multiple 'omics datasets (such as metagenomics and metatranscriptomics) can support a deeper understanding of biological functions. Analysis methods that support such insight can greatly facilitate decision making as well as the generation of novel hypotheses.

This project will develop novel methods for interactive visualization and exploratory analysis of temporal 'omics data of multiple types, working with scientists in the School of Natural and Environmental Sciences who research the emerging technology of plant growth-promoting bacteria (PGPB) as biofertilisers, biopesticides or biostimulants. The data used in the project will have been generated through analysis of crops and substrates under controlled conditions. This will include metagenomics and metatranscriptomics data from soil samples collected at multiple time points, so that the temporal dynamics of rhizosphere microbial communities can be monitored.

Such data pose multiple challenges from a data analysis perspective. Firstly, analysis must be carried out across two types of 'omics data to support understanding of the biological system. These data are high dimensional, and the need to analyse across datasets adds to the challenge while also opening the potential for deeper insight into the biological system. Secondly, the data are temporal, and insight into time-based patterns will be key to identifying biomarkers for the presence and effect of PGPBs. While visualization and exploration of multivariate temporal data has been a major focus in the data visualization community, no satisfactory solution has yet been developed that supports temporal analysis across multiple datasets with dimensionality as high as that of 'omics data. Thirdly, sampled 'omics data are commonly associated with important metadata, and statistical analysis is normally required to confirm the reliability of findings and data patterns. These need to be incorporated into the analysis to make full use of the data's potential and to support reliable analysis.

Visual representations of the overall patterns in the high-dimensional datasets will be designed, as well as interactive methods for examining finer details and selecting data subsets. Visualization methods will be integrated with semi-automated data mining and machine learning approaches to provide interactive analysis and pattern identification, helping users quickly identify patterns of potential interest. Methods and algorithms for identifying patterns across multiple datasets will be developed; these may, for instance, include correlation, cluster and sub-cluster analysis across the datasets.
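As a rough illustration of the cross-dataset correlation analysis mentioned above, the following Python sketch computes a Pearson correlation matrix between every feature of one dataset (for example, taxon abundances from metagenomics) and every feature of another (for example, transcript counts from metatranscriptomics) sampled at the same time points. It assumes both datasets are already normalised, aligned on time points and have one sample per time point; the function names `cross_correlation` and `top_pairs` are hypothetical and are not part of the project's tools.

```python
import numpy as np


def cross_correlation(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pearson correlation between every row of `a` and every row of `b`.

    Assumption (hypothetical layout): `a` has shape (n_features_a, n_timepoints)
    and `b` has shape (n_features_b, n_timepoints), with columns referring to
    the same, aligned time points. Returns an (n_features_a, n_features_b) matrix.
    Features that are constant over time have zero variance and yield NaN.
    """
    if a.shape[1] != b.shape[1]:
        raise ValueError("datasets must share the same time points")
    # Convert each feature's time series to z-scores (population std, ddof=0).
    za = (a - a.mean(axis=1, keepdims=True)) / a.std(axis=1, keepdims=True)
    zb = (b - b.mean(axis=1, keepdims=True)) / b.std(axis=1, keepdims=True)
    # The mean of the products of z-scores is the Pearson correlation coefficient.
    return za @ zb.T / a.shape[1]


def top_pairs(r: np.ndarray, k: int) -> list[tuple[int, int, float]]:
    """Return the k feature pairs with the strongest absolute correlation."""
    # Flatten, sort by descending |r|, then map flat indices back to (row, col).
    flat = np.argsort(-np.abs(r), axis=None)[:k]
    rows, cols = np.unravel_index(flat, r.shape)
    return [(int(i), int(j), float(r[i, j])) for i, j in zip(rows, cols)]


if __name__ == "__main__":
    # Toy data: 2 features per dataset observed at 4 time points (not real data).
    metagenomic = np.array([[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]])
    metatranscriptomic = np.array([[2.0, 4.0, 6.0, 8.0], [1.0, 1.0, 2.0, 2.0]])
    r = cross_correlation(metagenomic, metatranscriptomic)
    print(r)
    print(top_pairs(r, 2))
```

With real 'omics data, a rank-based measure such as Spearman correlation and a correction for multiple testing would usually be needed, in line with the statistical validation described earlier; the ranked pairs could then feed a linked visualization or a subsequent cluster analysis.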

The project will focus on visualization methods that highlight and link identified relationships across multiple datasets, as well as overview representations of the main patterns and relationships among them. These will be extended to temporal data, producing representations of change over time that support exploration and tracking of plant-environment interactions and aid the generation of novel hypotheses about PGPBs and plant growth.

The tools developed will be generalised and made openly available to microbiologists and bioinformaticians, to increase the impact of the work in areas such as bioscience and health informatics.

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/N509528/1                                   01/10/2016  31/03/2022
2202945            Studentship   EP/N509528/1  29/04/2019  17/08/2025  Hugh Garner
EP/R51309X/1                                   01/10/2018  30/09/2023
2202945            Studentship   EP/R51309X/1  29/04/2019  17/08/2025  Hugh Garner
EP/T517914/1                                   01/10/2020  30/09/2025
2202945            Studentship   EP/T517914/1  29/04/2019  17/08/2025  Hugh Garner