Big Data algorithmics for efficient search and analysis of large collections of genomes

Lead Research Organisation: Aberystwyth University
Department Name: Computer Science

Abstract

This project investigates novel approaches to big data analysis for genomic health challenges. Comparative studies of microbiome data advance our understanding of the underlying causes and consequences of diseases such as Crohn's disease, Parkinson's disease and irritable bowel syndrome (IBS). The computational resources required to analyse microbiome data demand more efficient algorithms for search, matching and compression. The proposed research focuses on advanced algorithmics and versatile data structures in stringology that contribute to this need, including the Burrows-Wheeler Transform, suffix arrays and the Lyndon factorization. The aim of the project is to investigate innovative algorithmic approaches, including various alphabet and string ordering methods, divide-and-conquer techniques, and bio-inspired genetic search operators for optimization. Artificial intelligence approaches, including machine learning, will be applied to search for and optimize results. Sequential and parallel implementations will be explored. Computational efficiency will be evaluated theoretically and experimentally using Supercomputing Wales HPC facilities and publicly available metagenome data sets. This will be a highly interdisciplinary project at the crossroads of computer science, biology, data science and informatics, and will thus afford skills transferable to related domains.
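
For readers unfamiliar with the string structures named in the abstract, the following is a minimal illustrative sketch, not the project's own method or code. It assumes the standard lexicographic order on characters, uses Duval's algorithm for the Lyndon factorization, and builds the suffix array by naive sorting, which is far too slow for genome-scale input. The function names and the "$" sentinel are choices made for this example only.

```python
def lyndon_factorization(s):
    """Duval's algorithm: split s into a non-increasing sequence of Lyndon words."""
    factors = []
    i, n = 0, len(s)
    while i < n:
        j, k = i + 1, i
        # Extend the current run while it stays a (repeated) Lyndon prefix.
        while j < n and s[k] <= s[j]:
            k = i if s[k] < s[j] else k + 1
            j += 1
        # Emit each full repetition of the Lyndon word of length j - k.
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors


def suffix_array(s):
    """Naive suffix array: start positions of suffixes in sorted order."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def bwt(s, sentinel="$"):
    """Burrows-Wheeler Transform of s via the suffix array of s + sentinel.

    Assumes the sentinel sorts before every character of s (true for "$"
    against ASCII letters) and does not occur in s.
    """
    if sentinel in s:
        raise ValueError("sentinel must not occur in the input")
    t = s + sentinel
    # The BWT character for each suffix is the character preceding it
    # (index -1 wraps around to the sentinel for the suffix at position 0).
    return "".join(t[i - 1] for i in suffix_array(t))


if __name__ == "__main__":
    print(lyndon_factorization("banana"))
    print(suffix_array("banana$"))
    print(bwt("banana"))
```

Changing the alphabet order, one of the directions the abstract mentions, would correspond here to replacing the default character comparison with a custom ordering; this sketch does not attempt that.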

Publications

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/S023992/1                                   01/04/2019  30/09/2027
2282975            Studentship   EP/S023992/1  01/10/2019  30/09/2023  James Andrew Major