Proteomics Goes Viral: Novel Resources for Identification and Quantification of Virus Proteins

Lead Research Organisation: Queen Mary, University of London
Department Name: Sch of Biological and Chemical Sciences


Viruses play a significant role in the natural world, often with profound effects on the human population. However, our understanding of the molecular mechanisms of virus infection is being held back by limitations in current data analysis methods. In this project we will produce innovative new software that will provide researchers with a much more accurate picture of what is happening in virus-infected cells.

Modern bioanalytical science provides many tools that can be used to help build understanding of biological systems. These tools include high resolution imaging, next generation sequencing, metabolomics and proteomics. Proteomics, which aims to reveal the identity and quantity of proteins in a given sample, is the focus of this proposal. The study of viral infections poses particular challenges because viruses only function within a host organism (e.g. influenza in humans), so the host and virus must be analysed together. (Indeed, it is the interaction between host and virus where most research interest lies.) Virus studies are further complicated because viruses evolve rapidly, frequently producing new strains with different genomes that help them evade the host's immune system. This makes it difficult to study virus proteins because we cannot rely on a given protein being represented by a single consensus sequence as it would be in a higher organism.

There are two main types of proteomics in use today: shotgun proteomics and selected reaction monitoring (SRM). Both of these follow a process in which an enzyme is added to the sample under study to break all the proteins down into more manageable sub-sections (peptides). The sample is then analysed by an instrument called a liquid chromatograph tandem mass spectrometer (LC-MS/MS) which provides large amounts of data that can be used to determine which peptides, and by inference which proteins, were present in the sample. SRM is a targeted technique, where the LC-MS/MS is programmed to look for specific peptides corresponding to proteins of interest, thereby maximising sensitivity. Shotgun proteomics is ostensibly the more open technique in that it considers all proteins that may be present in the sample, but this still requires a finite list of the sequences of all those proteins to be compiled beforehand.
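The enzymatic step described above can be imitated in software. The sketch below is a minimal illustration, assuming the enzyme is trypsin and using the commonly applied simplified rule that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P). The function name, the `min_length` parameter and the example sequence are hypothetical and are not taken from the project's software.

```python
def tryptic_digest(protein, min_length=1):
    """Split a protein sequence into peptides using a simplified trypsin rule.

    Assumption: cleavage occurs after K or R, except when the following
    residue is P. Missed cleavages and modifications are ignored.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    # Keep any C-terminal remainder that does not end in K or R.
    if start < len(protein):
        peptides.append(protein[start:])
    return [p for p in peptides if len(p) >= min_length]


# Toy sequence, not a real protein.
print(tryptic_digest("MKWVTFRPLLKAG"))
```

In practice a minimum peptide length would normally be set, because very short peptides are rarely unique to one protein.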

So for a host-virus study, both shotgun proteomics and SRM require prior knowledge of the protein sequences that are likely to be in the sample, from both virus and host. For shotgun proteomics, we could just augment the list of host protein sequences with a list of proteins from all virus strains that might be present in that host. However, with so many different strains to consider, the search space becomes too large and the probability of false positive identifications becomes unacceptable. Similarly, it would be impossible to monitor so many peptides in a single SRM experiment.
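The growth of the search space can be made concrete with a small sketch. It assumes that the proteins of the host and of each virus strain have already been broken into peptides, and it simply counts how many distinct peptide sequences the search database holds as each strain is added. The function name and the toy peptide strings are hypothetical.

```python
def search_space_growth(host_peptides, strain_peptide_sets):
    """Return the number of distinct peptides after adding each strain in turn.

    The first value is the size of the host-only search space.
    """
    seen = set(host_peptides)
    sizes = [len(seen)]
    for peptides in strain_peptide_sets:
        seen.update(peptides)
        sizes.append(len(seen))
    return sizes


# Toy data: two host peptides and three strains that share one peptide.
host = {"HOSTAK", "HOSTBR"}
strains = [{"VIRAK", "SHAREDK"}, {"SHAREDK", "VIRBK"}, {"VIRCK"}]
print(search_space_growth(host, strains))
```

Each strain adds only the peptides that differ from those already present, but with many strains these differences accumulate, which is the effect the paragraph above describes.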

This project will produce innovative new software designed specifically to support host-virus proteomics studies. One part of the software will, for the first time, use knowledge of relationships between different virus strains to minimise the search space when processing data from shotgun proteomics, resulting in more reliable and more sensitive protein identification. The other part of the software will use a similar approach to support the design of SRM experiments for monitoring virus proteins within their hosts. These developments will significantly increase the applicability of proteomics to host-virus studies, leading to new biological insights both within this project (we will perform two small experiments, including looking for previously unconfirmed gene products) and beyond.

Technical Summary

Proteomic LC-MS/MS software from companies (e.g. Mascot, Progenesis) and academia (e.g. MaxQuant, Skyline) deals well with data from the eukaryotic single species studies that dominate proteomics, but there remains a significant unmet need in virology. This is because identification and quantitation of proteins is achieved either by mapping peptide spectra to a search database containing all proteins that might be present in the sample (for shotgun proteomics), or by monitoring peptides of specific sequence as surrogates for their parent protein (in selected reaction monitoring: SRM). Neither approach works well for viruses because of their rapid evolution rate - there are many strains of each virus, among which there are many different protein sequences.

This project will develop an innovative set of software resources to tackle this problem. At the core of this effort will be the creation of a non-redundant database of all peptides produced by sequenced viruses known to infect particular hosts. New software tools will be developed that use this database. Firstly, for shotgun proteomics, we will develop and implement a new algorithm that uses a stepwise process that gradually refines the search database to focus on peptides that are found to have a high probability of being present in the sample. This will be complemented by a new protein inference algorithm that uniquely uses information about the relationship between virus strains to map peptides to proteins. Finally, we will use the peptide database to produce SRM transitions that will be incorporated into our existing web-based MRMaid experiment design tool. This will allow researchers to design SRM experiments that target specific virus proteins in the presence of host proteins.
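The idea of a non-redundant peptide database combined with stepwise refinement can be sketched as follows. This is an illustration under stated assumptions, not the algorithm developed in the project: it assumes peptides shared with the host are excluded, and that a strain is retained when at least `min_hits` of its peptides were observed in a first-pass search. All function names, parameters and thresholds are hypothetical.

```python
from collections import defaultdict


def build_peptide_index(strain_peptides, host_peptides):
    """Map each distinct virus peptide to the set of strains that produce it.

    Assumption: peptides also present in the host are dropped because they
    cannot distinguish virus from host.
    """
    host = set(host_peptides)
    index = defaultdict(set)
    for strain, peptides in strain_peptides.items():
        for peptide in peptides:
            if peptide not in host:
                index[peptide].add(strain)
    return dict(index)


def refine_search_space(index, observed_peptides, min_hits=2):
    """Restrict the index to strains supported by at least min_hits observed peptides."""
    hits = defaultdict(int)
    for peptide in observed_peptides:
        for strain in index.get(peptide, ()):
            hits[strain] += 1
    kept_strains = {strain for strain, n in hits.items() if n >= min_hits}
    refined = {}
    for peptide, strains in index.items():
        supported = strains & kept_strains
        if supported:
            refined[peptide] = supported
    return kept_strains, refined
```

A second search against the refined index would then consider far fewer candidate peptides. The strain sets attached to each peptide are also the kind of relationship information that a strain-aware protein inference step could draw on.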

We will evaluate our software in the lab in two ways: by performing a SILAC quantitative proteomics analysis of cells infected with two viruses simultaneously, and by designing an SRM experiment that targets known virus peptides and potential novel ORFs.

Planned Impact

Because the proposed software is a fundamental methodology that substantially improves our ability to study virus proteins, the potential beneficiaries of the research it facilitates are broad and numerous. Viruses are well known as the cause of many human diseases (e.g. the common cold, influenza, dengue fever, HIV) and it seems inevitable that a virus will be responsible for a future global pandemic. Viruses also affect plants and animals, posing a risk to food security by reducing yields - Cassava Brown Streak Disease (CBSD) is a recent example affecting one of Africa's staple food sources. On a more positive note, viruses are also used in gene therapy, where they are being coaxed into carrying genetic material into the body to correct defective genes.

Understanding how viruses function is clearly of great importance, so the improved understanding that we will facilitate has great potential to impact on human health, animal welfare, food security, public policy and the economy.

This proposal will also help bolster the UK's position in proteomics research. Despite proteomics being a very competitive area globally, BBSRC funding has helped the UK to establish several internationally competitive research groups, both in laboratory proteomics and proteome informatics. This has led to commercial activities, including the formation of the very successful proteomics software companies Matrix Science and Nonlinear Dynamics (recently purchased by Waters). With continued investment we see no reason why the UK cannot retain its high standing in proteomics. This particular proposal will ensure continued support for a group of UK-based researchers who are establishing themselves as world leaders in host-pathogen proteomics.

In terms of timescale, we genuinely expect some benefits of this project to be realised within the project itself as we have built in two small but significant experimental studies - one looking at cells infected with multiple viruses simultaneously and one seeking to confirm the existence of novel ORFs for which we have tentative evidence from previous experiments. Scientific benefits will extend further as the software produced is made generally available, and any societal benefits that follow from novel scientific insights would become apparent in subsequent years.


Description: We have discovered a new, more sensitive method for detecting virus proteins within host cells. The method has been shown to work on samples known to be infected with a virus.

Tryptic peptide signatures of viruses for three species (human, mosquito and flying fox bat) have been made available on our in-house Galaxy instance. See for details.
Exploitation Route: Our method is still in the process of being prepared for publication and made available as free software, allowing it to be used and adapted by others. This has not been done yet because we have not identified a biological application exciting enough to warrant publication in a high impact journal. Since the grant finished, we have spent a considerable amount of time and effort reanalysing data from the public proteomics data repository, PRIDE, in the hope of finding disease-related virus infections, but we have not found any significant results to date.
Sectors: Aerospace, Defence and Marine; Agriculture, Food and Drink; Environment; Healthcare; Pharmaceuticals and Medical Biotechnology