Using Big Data to Model the Maintenance of Health in the Human Oral Cavity

Lead Research Organisation: University of Glasgow
Department Name: College of Medical, Veterinary, Life Sci

Abstract

Studentship strategic priority area: Basic Bioscience Underpinning Health
Keywords: Oral health, bioinformatics, biofilm

Abstract:
The Ramage Oral Biofilm Group have worked with GSK over the past 4 years through a BBSRC CASE studentship (Mr Christopher Delaney) to develop bioinformatic tools for analysing microbiomes, transcriptomes and metabolomes, and have developed appropriate pipelines for data integration. With the rapid growth in microbiome studies, numerous well-described data sets have become publicly available. Recent single-centre studies have shown the potential to predict different microbiomes associated with caries and periodontal disease, and within these to identify differences between stable and dysbiotic populations (Zaura, et al., 2017). This approach offers an opportunity to use appropriately designed studies, in which the microbiome data sets are held centrally and bioinformatic tools are used to integrate disease-specific microbiomes. Indeed, this approach has recently been demonstrated in the context of gastrointestinal microbiomes (Duvallet, et al., 2017). Therefore, given appropriate patient metadata, the data-mining opportunities are extensive, and creating a map of oral health and disease microbiomes is a realistic goal. However, this relies on sufficient capacity for data storage and processing, with future-proofing so that prospective data sets can be deposited.

Study objectives: We aim to discern distinguishing features in the microbiome of patients with oral diseases across numerous patient cohorts. We will interrogate the currently published data to provide a more robust understanding of the microbiological changes across different oral health patient demographics. In brief, we propose the following steps.

1. Retrieve the raw data from all previous oral microbiome studies and deposit them in a single location.
2. Create a catalogue of all of the samples from each of the studies (recording sequencing type and demultiplexed status).
3. Create a metadata table of all cohort information across all samples.
4. We will build our pipeline around the published study cited above. In summary, after data collection, samples that have not previously been demultiplexed will be demultiplexed. Sequences will be filtered. All reads will be trimmed to the same length where possible, taking into account that some of the studies are older and used earlier sequencing technologies (Roche). Operational taxonomic units (OTUs) will be identified by clustering, which will be performed using USEARCH. These OTUs will then be assigned taxonomy using the RDP classifier. Samples with a low abundance of reads and OTUs will be discarded from the dataset.
5. Once all of the data have been processed to a similar standard, they will be compiled and undergo statistical analysis.
6. We will group and classify samples according to similar criteria (age, smoking status, disease status, severity, etc.) across the different studies. This will allow us to compare the microbiome across a much larger cohort.
7. Multivariate analysis and microbiome community analysis will be performed to look for features of the microbiome that can predict the patient variables mentioned above (caries, periodontal disease, etc.).
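
As a rough illustration of steps 4 and 6 above, the following Python sketch shows how reads might be trimmed to a common length, how low-depth samples might be discarded, and how study-specific metadata labels might be harmonised before grouping. It is a minimal sketch under stated assumptions, not the project's pipeline: the function names, the thresholds, the metadata field names and the label vocabulary are all hypothetical, and the clustering (USEARCH) and taxonomy assignment (RDP classifier) steps are not reproduced here.

```python
from collections import defaultdict

# Hypothetical thresholds: the project description does not give values.
TRIM_LENGTH = 200
MIN_READS_PER_SAMPLE = 1000


def trim_reads(reads, length=TRIM_LENGTH):
    """Drop reads shorter than `length` and truncate the rest to `length`."""
    return [read[:length] for read in reads if len(read) >= length]


def filter_samples(otu_counts, min_reads=MIN_READS_PER_SAMPLE):
    """Keep samples whose total read count across OTUs reaches `min_reads`."""
    return {
        sample: counts
        for sample, counts in otu_counts.items()
        if sum(counts.values()) >= min_reads
    }


def harmonise_smoking(value):
    """Map study-specific smoking labels onto one shared (assumed) vocabulary."""
    mapping = {
        "y": "smoker", "yes": "smoker", "current": "smoker",
        "n": "non-smoker", "no": "non-smoker", "never": "non-smoker",
    }
    return mapping.get(str(value).strip().lower(), "unknown")


def group_samples(metadata, keys=("disease_status", "smoking")):
    """Group sample IDs by the chosen harmonised metadata fields."""
    groups = defaultdict(list)
    for sample, fields in metadata.items():
        group_key = tuple(fields.get(key, "unknown") for key in keys)
        groups[group_key].append(sample)
    return dict(groups)


if __name__ == "__main__":
    # Toy data standing in for two studies with different label conventions.
    raw_metadata = {
        "studyA_01": {"disease_status": "caries", "smoking": "Yes"},
        "studyB_07": {"disease_status": "caries", "smoking": "current"},
        "studyB_09": {"disease_status": "healthy", "smoking": "N"},
    }
    harmonised = {
        sample: {**fields, "smoking": harmonise_smoking(fields["smoking"])}
        for sample, fields in raw_metadata.items()
    }
    print(group_samples(harmonised))
```

Keeping each step as a small function over plain dictionaries means the same checks can be applied to every study before the data are compiled. In practice the thresholds would be chosen per sequencing technology, and labels that cannot be mapped fall into an explicit "unknown" group so that they are not silently merged.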

Publications


Studentship Projects

Project Reference   Relationship   Related To     Start        End          Student Name
BB/V509541/1        -              -              01/01/2021   31/12/2024   -
2515685             Studentship    BB/V509541/1   05/01/2021   04/01/2025   Mark Butcher