TGAC ISPG - Developing Strategies for Big Data Bioinformatics

Lead Research Organisation: Earlham Institute
Department Name: UNLISTED


This ISP proposal aims to develop bioinformatics tools to handle and process big genomics datasets. DNA sequencing has become an enabling technology at the heart of a revolution in the life sciences. The objectives of this proposal focus on providing, over the next 5 years, the tools required to realise the value of big data bioinformatics. The mission of TGAC is to establish a centre of scientific excellence specialising in genomics technology, high-throughput data generation and bioinformatics, working alongside the UK scientific community around the BBSRC strategic research priorities for 2010-2015: food security, bioenergy and biology underpinning health. Implementing this ISP will lay the foundations for developing the incipient bioinformatics groups at TGAC into a world-class research centre.

The objectives of this proposal are organised in three themes: the processing of raw data into genomic information, the translation of this information into biological knowledge, and the development of an infrastructure to sustain these activities. The demands in terms of data volume and data types are varied and require different skill sets and emphases.

The first objectives focus on the processing of large datasets, with two concrete examples of application in genomics that are relevant to the BBSRC research community. The development of efficient algorithms for assembling large eukaryotic genomes from short reads is still a topic of intensive research. In fact, the assemblies obtained from high-throughput sequencing technologies, although much cheaper, are of lower quality than those generated by previous technologies. This challenge particularly affects genomics research on plants, whose large and complex genomes have features such as polyploidy and nested repeat structures that complicate the assembly process.