Biology from bioinformatics: data analytical and visualisation tools to gain new understandings from quantitative mass spectrometry

Lead Research Organisation: University of Liverpool
Department Name: Institute of Integrative Biology

Abstract

Life sciences research is increasingly performed using high-tech instrumentation that produces vast quantities of data. In turn, the science as a whole is being transformed from a knowledge-based discipline into a Big Data discipline. Some of these techniques are known as "omics", after genomics (the large-scale study of genes), proteomics (proteins) and metabolomics (metabolites). Proteins are the functional molecules in cells. By studying the levels at which different proteins are present in cells (for example, comparing healthy with diseased cells), we can understand how the system as a whole is behaving (or going wrong) and begin to understand the function of individual proteins. The predominant technique used for proteomics is mass spectrometry (MS), which can measure many thousands of proteins from a single sample.

One of the biggest challenges in omics research is the interplay between the often complex and noisy data produced by the instrument and the need for bespoke data analysis software. Both areas are under active development in academic research groups and in industrial organisations (commercial instrument and software manufacturers), and ensuring that academic research and development has maximum impact on industry is itself a major challenge. In proteomics, various software packages, both commercial and free/open source, can analyse the raw MS data to produce a list of identified proteins, each with an abundance value in or between the samples of interest. One popular package is Progenesis QI, marketed internationally by the instrument manufacturer Waters, a large global corporation headquartered near Manchester, UK. However, there is currently a shortage of good proteomics software for taking the quantified protein list and performing the downstream data analysis that scientists using proteomics techniques need in order to arrive at a real understanding of the biological system. These downstream analyses include visualising the large data set to check data quality and starting to identify which groups of proteins may be changing in the system of interest. Specialised statistical analyses are also needed to ensure that only significant results are taken forward for further study and publication.

In this project, an academic biologist and data analyst (Dr Dean Hammond) will take part in an industry interchange, working directly with Waters to develop new proteomics data analysis software in the form of a package called ProteoAnalytics. The package will accept input from Progenesis QI (and other suitable upstream packages), giving biologists easy access to cutting-edge methods for data interpretation (such as mapping proteins to biological pathways), performing specialised statistical analyses, creating unique and powerful visualisations of large data sets, and helping users to prepare high-quality figures and charts for scientific publications. The software will let Waters learn very rapidly which features work for users, and will help proteome scientists to perform large-scale data analyses. The interchange will allow Dr Hammond to bring his knowledge of academic proteomics data analysis to Waters and to gain experience of the industry perspective on software development. It will also establish a collaborative partnership between the principal investigator (Dr Andy Jones, who leads an academic software development group) and the commercial software development team at Waters.

Technical Summary

N/A

Planned Impact

The proposed project will have a clear, direct impact on Waters, providing a mechanism for rapid prototyping of new data analysis capabilities, which may in due course be built into their own packages or commercialised outright. It may also indirectly foster increased sales of instrumentation tailored to work with, for example, Progenesis QI and ProteoAnalytics.

The project also has considerable potential to provide indirect benefits to UK public health, quality of life and environmental sustainability, since proteomics is a key technique across life sciences research.


The staff involved will gain significant benefits. First, the interchanger (DEH) will benefit from the opportunity to re-train as an industry grade software engineer and data analyst, as well as making contacts at Waters - a major employer in the biosciences in the North West. The PI (ARJ) will also benefit through strengthened links with Waters - potentially leading to further routes for commercialising a range of research outputs currently being developed in the group with BBSRC funding.
 
Description: We have developed a beta version of a software tool, ANALYTICA, for processing omics data (proteomics and metabolomics) and performing a range of analysis tasks (visualisation, QC and pathway enrichment). The software is available to the industrial partner, and we have applied for follow-on funding to develop it into a commercial product. The software is used in some quality control procedures at Waters, as evidenced by published papers.

Our first attempt to raise funding to develop ANALYTICA as a commercial product was not successful, so we are currently exploring alternative options and may release the code as open source.
Exploitation Route: We believe the software co-created in this project has the potential to become a commercial product, but our initial attempts to secure funding have not yet succeeded. There remains a lack of high-quality computational biology software specifically targeted at proteomics and metabolomics.
Sectors: Healthcare; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology

 
Description: We have developed software pipelines that are used internally at the industrial partner, Waters, for quality control monitoring.
First Year Of Impact: 2017
Sector: Digital/Communication/Information Technologies (including Software); Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology
 
Title: ANALYTICA
Description: Software for multi-omics informatics analysis and visualisation. The current version is a solid beta release with a range of useful functionality. We are now exploring whether to license the software, or to publish it and release it as open source.
Type Of Technology: Software
Year Produced: 2017
Impact: The software is available as a beta release and has been used within Waters for quality control procedures.