Using bioinformatics and microarrays to study the regulation of families of alternative splices

Lead Research Organisation: University of Essex
Department Name: Biological Sciences

Abstract

Although there appear to be 30,000 genes within our DNA, it has recently become clear that these genes produce almost 100,000 proteins. As a typical gene has the ability to produce several proteins, it is vital that we identify under what conditions genes make their choice of protein. We believe that the choices are made at the molecular level, and depend on the abundance of certain classes of other proteins. However, the choices that genes make need to be tightly controlled, as otherwise disease will result. Furthermore, the choices that a gene makes need to be made in conjunction with the choices made by other genes, and so we need to identify how this is coordinated across all 30,000 genes. To approach this issue, large amounts of data are required to establish which genes are switched on and off in particular combinations. Fortunately, this is now possible using a new technology called a GeneChip. Many scientists have created catalogues of GeneChips, and these provide an ideal library in which to search for patterns in how genes behave. As with many technologies, there are a number of improvements that can be made to GeneChip data, and this project is focussed on cleaning up this data. This project aims to look through the back catalogue of GeneChip experiments, as well as other catalogues about genes and how they behave, in order to establish which parts of the GeneChip appear to be behaving in an unexpected manner. Finding these parts will allow us to establish the causes of this discrepancy and thus to correct for the problems apparent in the data. We will then be able to look through the cleaned-up catalogue of GeneChip data to discover the coherent patterns of genes being switched on and off together, and the choices of proteins that the genes make.
The complexity of the problem, as well as the sheer size of these libraries, results in the need for large computing power as well as the development of novel mathematical and statistical techniques. This exciting scientific area is called bioinformatics, and it is leading to new insights about the molecular causes behind many diseases. In the shorter term, this research is expected to provide insights into the regulation of our DNA, and in the longer term may lead to novel drug targets.

Technical Summary

Gene expression is tightly regulated at the post-transcriptional level, which controls the fate of the transcribed RNA molecules, including their stability, the efficiency of their translation, and their sub-cellular localization. Moreover, several of the steps involved in post-transcriptional processing, such as splicing, can cause a single transcriptional unit to generate a diverse set of protein isoforms. Indeed, such alternative splicing is the most readily understandable way in which 30,000 genes can result in a proteome several times larger in eukaryotes. Affymetrix GeneChips are widely used to observe RNA on a genomic scale. Because each gene is represented by multiple probes, it is vital that problem probes are identified and removed from the analysis of this type of data. This project will focus on finding these spurious probes, and on establishing how to correct for the effects of this systematic noise. Estimating the relative abundance of each splice variant in a sample of mRNA is important for the study of the underlying biological function. However, the design and analysis of most GeneChip experiments to date have not accounted for splice variants. Given the prevalence of alternative splicing, this is a major shortcoming, and there is now an urgent need to develop splice-variant-specific analysis algorithms in order to perform accurate gene expression profiling. This project aims to develop such methods. By integrating the bioinformatics of transcriptional regulation with observations of the transcriptome using microarrays, we expect to identify how a genome is able to coordinate its choice of transcripts under different biological conditions.
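
To make the idea of a problem probe concrete, the sketch below shows one simple screen that could be applied to a single probe set. It is an illustration only and is not taken from the project: it assumes that probes targeting the same gene should rise and fall together across arrays, and it flags any probe whose profile correlates poorly with the per-array median of the probe set. The function names, the `min_corr` cut-off and the intensity values are all hypothetical.

```python
from math import sqrt
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / denom


def flag_outlier_probes(probe_set, min_corr=0.5):
    """Return the sorted IDs of probes that disagree with the rest of their probe set.

    probe_set: dict mapping probe ID -> list of log2 intensities, one per array.
    min_corr:  hypothetical cut-off; probes correlating below it are flagged.
    """
    profiles = list(probe_set.values())
    # Reference profile: the median intensity across probes, computed per array.
    reference = [median(column) for column in zip(*profiles)]
    return sorted(
        probe_id
        for probe_id, profile in probe_set.items()
        if pearson(profile, reference) < min_corr
    )


# Made-up log2 intensities for one probe set measured on five arrays.
example = {
    "probe_1": [7.1, 8.0, 9.2, 8.4, 7.5],
    "probe_2": [6.9, 7.8, 9.0, 8.1, 7.2],
    "probe_3": [7.3, 8.3, 9.5, 8.6, 7.8],
    "probe_4": [9.0, 7.2, 6.8, 7.0, 9.1],  # moves against the other three
}
print(flag_outlier_probes(example))
```

In this toy input, `probe_4` is anti-correlated with the median profile and is the only probe flagged. A low correlation does not by itself say why a probe misbehaves; it could reflect a technical artefact or a genuine splice variant, which is why the causes need to be established separately.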

Publications

Arteaga-Salas JM (2008) An overview of image-processing methods for Affymetrix GeneChips. in Briefings in bioinformatics

Harrison AP (2008) The use of Affymetrix GeneChips as a tool for studying alternative forms of RNA. in Biochemical Society transactions

Langdon W (2009) Evolving DNA motifs to predict GeneChip probe performance. in Algorithms for molecular biology : AMB

Langdon WB (2010) A survey of spatial defects in Homo Sapiens Affymetrix GeneChips. in IEEE/ACM transactions on computational biology and bioinformatics

Memon FN (2010) Identifying the impact of G-quadruplexes on Affymetrix 3' arrays using cloud computing. in Journal of integrative bioinformatics

 
Description We identified that the widely used GeneChip technology for measuring gene expression has several biases. This potentially impacts upon the results of thousands of scientific papers.
Exploitation Route By correcting for the biases we discovered, biologists will produce cleaner data that will aid their interpretation.
Sectors Agriculture, Food and Drink,Digital/Communication/Information Technologies (including Software),Healthcare,Manufacturing, including Industrial Biotechnology,Pharmaceuticals and Medical Biotechnology

 
Description The findings on biases have been used by researchers to improve the analysis in about 40 papers.
First Year Of Impact 2008
Sector Agriculture, Food and Drink,Digital/Communication/Information Technologies (including Software),Healthcare,Pharmaceuticals and Medical Biotechnology
 
Description Bioinformatics 
Organisation Zhejiang University
Country China 
Sector Academic/University 
PI Contribution I collaborate with Professor Ming Chen. We have visited each other's laboratory, and I have hosted several of his students.
Collaborator Contribution They have produced several papers in which I was invited to be a co-author.
Impact The collaboration has led to several workshops in China, on "Chips, Computers and Crops".
Start Year 2008