Bioinformatics for High Throughput Proteomics (Short Course)

Lead Research Organisation: Cranfield University
Department Name: Cranfield Health


The sequencing of the human genome was rightly hailed as a momentous scientific achievement, but it is clear that there are limits to what the genome itself can tell us about how the body works. Perhaps most importantly, the human genome is essentially invariant within a given individual, whereas the protein complement of a cell varies according to the cell type, environmental factors and time. To gain a better insight, it is necessary to study the proteome - the collection of proteins produced by the genome. In recent years, significant investment has been made in laboratory methods capable of identifying proteins in biological samples, such as human tissue and blood. This process of identifying proteins, known as proteomics, used to be time-consuming, with only a small number of proteins being identified in a given experiment. However, continued investment has given rise to high throughput proteomics, which allows samples to be processed much faster and many more proteins to be identified in a given time. As in previous areas of bioanalytical science (e.g. DNA sequencing), the move to high throughput has led to an explosion in the amount of data being produced in proteomics experiments. To make full use of this data, computational biology solutions (bioinformatics) must be used. This fast-moving field is difficult to keep abreast of, so we propose to offer a short course designed to brief proteomics practitioners on the approaches available and to train them in how best to use these approaches to facilitate their research.

Technical Summary

The course will cover the following topics:

The data analysis challenges and application of high throughput proteomics.
- Genome re-annotation and data integration.
- Target identification.
- Comparative and quantitative proteomics.

Strategies for dealing with high throughput proteomic data.
- Peptide identification.
- Protein grouping.
- Identifying variation (SAPs, PTMs).
- Maximising confidence in protein identifications.
- Data analysis for quantitative proteomics.

Public standards in proteomics.
- Data standards as a facilitator for high throughput proteomics.
- PSI standards for proteomic data and protein identification.

Current, and in development, systems for large scale proteomic data analysis.
- Publicly funded projects (e.g. I-SPIDER, The GAPP, Peptide Atlas).
- Commercial offerings (e.g. Mascot Integra).
- Building bespoke pipelines.

