Big-data analysis tools for bridging the gap between omics and earth system science

Lead Research Organisation: University of East Anglia
Department Name: Environmental Sciences

Abstract

The future of ocean science lies in the collection and analysis of big data from new technologies such as gliders, satellites and high-throughput sequencing. This presents new challenges, both in the mathematical and computational techniques needed to interrogate the huge amount of data being generated and in the models required to bridge the gap between disparate data sets. An important example of this challenge is the integration of data sets collected for marine microbes. These data sets range from sequencing of community metagenomes and transcriptomes, through measurements of cell sizes and distributions, to glider-based measurements of temperature, salinity and nutrients in the environments that the microbes inhabit. The aim of this project is to develop new big-data analysis algorithms and tools to interrogate, merge and integrate these diverse data sets. To do this, we will employ and develop cutting-edge techniques in bioinformatics and data science. Our new tools will allow researchers to extract patterns hidden in their data, so as to answer important questions such as how in situ microbial diversity is related to environmental factors, and which genes and species to target in terms of their function (e.g. trace-gas production) and ecological role (e.g. invasive species).

In a current NEXUSS PhD project, we are adapting Oxford Nanopore sequencing technology for use aboard icebreakers. We have established a protocol for onboard and in situ sequencing of polar microbes and expect to have sequencing data from single isolates and mixed (mock) communities grown under controlled environmental conditions by autumn 2018. Annual Antarctic expeditions using the new technology are then planned from early 2019, including MOSAiC, a year-round expedition onboard an icebreaker in 2020 that will collect data using various autonomous observing systems such as gliders. All of these activities will result in a huge array of interrelated sequencing and environmental data sets. However, the bioinformatics and data integration tools needed to analyse these data are lagging far behind, both in speed and in their ability to integrate the data. To address this challenge, we will first develop new bioinformatics tools and algorithms that allow us to fully harness the Nanopore technology, including new algorithms for quickly assessing diversity and putative gene functions in sequencing data. Building on this, we will employ data science techniques such as data warehousing and machine learning to design software that allows us and other researchers to merge the data and discover salient patterns in complex marine microbe data sets.

The NEXUSS CDT provides state-of-the-art, highly experiential training in the application and development of cutting-edge Smart and Autonomous Observing Systems for the environmental sciences, alongside comprehensive personal and professional development. There will be extensive opportunities for students to expand their multi-disciplinary outlook through interactions with a wide network of academic, research, industrial, government and policy partners. The student will be registered at the University of East Anglia. Specific training will include data science, machine learning, software development, bioinformatics, programming, sequence analysis, polar microbes and molecular biology.

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
NE/R012156/1                                   01/10/2017  30/09/2022
2087766            Studentship   NE/R012156/1  01/10/2018  31/08/2022  Anthony Duncan
NE/W503034/1                                   01/04/2021  31/03/2022
2087766            Studentship   NE/W503034/1  01/10/2018  31/08/2022  Anthony Duncan