Using bioinformatics and microarrays to study the regulation of families of alternative splices
Lead Research Organisation: University of Essex
Department Name: Biological Sciences
Abstract
Although there appear to be 30,000 genes within our DNA, it has recently become clear that these genes produce almost 100,000 proteins. As a typical gene has the ability to produce several proteins, it is vital that we identify under what conditions genes make their choice of protein. We believe that the choices are made at the molecular level, and depend on the abundance of certain classes of other proteins. However, the choices that genes make need to be tightly controlled, as otherwise disease will result. Furthermore, the choices that genes make will need to be made in conjunction with choices made by other genes, and so we need to identify how this is coordinated across all 30,000 genes. To approach this issue, large amounts of data are required to establish which genes are switched on and off in particular combinations. Fortunately this is now possible using a new technology called a GeneChip. Many scientists have created catalogues of GeneChips and they provide an ideal library in which to search for patterns in how genes behave. As with many technologies, there are a number of improvements that can be made to GeneChip data, and this project is focussed on cleaning up this data. This project aims to look through the back catalogue of GeneChip experiments, as well as other catalogues about genes and how they behave, in order to establish which parts of the GeneChip appear to be behaving in an unexpected manner. Finding these parts will allow us to establish the causes of this discrepancy and thus allow us to correct for the problems apparent in the data. We will then be able to look through the cleaned-up catalogue of GeneChip data to discover the coherent patterns of genes being switched on and off together, and the choices of proteins that the genes make.
The complexity of the problem, as well as the sheer size of these libraries, results in the need for large computing power as well as the development of novel mathematical and statistical techniques. This exciting scientific area is called bioinformatics, and it is leading to new insights about the molecular causes behind many diseases. In the shorter term, this research is expected to provide insights into the regulation of our DNA, and in the longer term may lead to novel drug targets.
Technical Summary
Gene expression is tightly regulated at the post-transcriptional level, controlling the fate of the transcribed RNA molecules, including their stability, the efficiency of their translation, and their sub-cellular localization. Moreover, several of the steps involved in post-transcriptional processing, such as splicing, can cause a single transcriptional unit to generate a diverse set of protein isoforms. Indeed, such alternative splicing is the most readily understandable way in which 30,000 genes can result in a proteome several times larger in eukaryotes. Affymetrix GeneChips are widely used to observe RNA on a genomic scale. However, because multiple probes represent each gene, it is vital that problem probes are identified and removed from the analysis of this type of data. This project will focus on finding these spurious probes, and on establishing how to correct for the effects of this systematic noise. Estimating the relative abundance of each splice variant in a sample of mRNA is important for the study of the underlying biological function. However, the design and analysis of most GeneChip experiments to date have not accounted for splice variants. Given the prevalence of alternative splicing, this is a major shortcoming, and there is now an urgent need to develop splice-variant-specific analysis algorithms in order to perform accurate gene expression profiling. This project aims to develop such methods. By integrating the bioinformatics of transcriptional regulation with observations of the transcriptome using microarrays, we expect to identify how a genome is able to coordinate its choice of transcripts under different biological conditions.
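As an illustration only of what identifying problem probes within a probe set could look like, the sketch below flags any probe whose intensity profile across arrays disagrees with the consensus of the other probes for the same gene. It is not the method used in this project. The function names, the correlation threshold of 0.5 and the example intensities are all hypothetical assumptions chosen for the illustration.

```python
from math import sqrt
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def flag_outlier_probes(probe_set, min_corr=0.5):
    """Return IDs of probes that correlate poorly with the rest of their probe set.

    probe_set maps a probe ID to its log intensities, one value per array.
    min_corr is an assumed, illustrative threshold, not a published value.
    """
    flagged = []
    for probe_id, values in probe_set.items():
        others = [v for other_id, v in probe_set.items() if other_id != probe_id]
        # Consensus profile: per-array median of all other probes in the set.
        consensus = [median(column) for column in zip(*others)]
        if pearson(values, consensus) < min_corr:
            flagged.append(probe_id)
    return flagged


# Hypothetical log intensities for four probes measured on four arrays.
example = {
    "probe_1": [7.1, 8.0, 9.2, 10.1],
    "probe_2": [6.9, 7.8, 9.0, 10.3],
    "probe_3": [7.3, 8.1, 9.1, 9.9],
    "probe_4": [11.0, 9.5, 8.2, 7.0],  # moves against the other three
}
print(flag_outlier_probes(example))
```

Using the median of the remaining probes as the consensus keeps a single misbehaving probe from distorting the profile it is compared against. A real analysis would need many more arrays and a threshold justified by the data.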
People
Andrew Harrison (Principal Investigator)
Publications
Memon FN (2010) A Comparative Study of the Impact of G-Stack Probes on Various Affymetrix GeneChips of Mammalia. in Journal of Nucleic Acids
Wang X (2021) A general principle for spontaneous genetic symmetry breaking and pattern formation within cell populations. in Journal of Theoretical Biology
Langdon WB (2010) A survey of spatial defects in Homo Sapiens Affymetrix GeneChips. in IEEE/ACM Transactions on Computational Biology and Bioinformatics
Arteaga-Salas JM (2008) An overview of image-processing methods for Affymetrix GeneChips. in Briefings in Bioinformatics
Harrison AP (2007) Establishing a major cause of discrepancy in the calibration of Affymetrix GeneChips. in BMC Bioinformatics
Langdon W (2009) Evolving DNA motifs to predict GeneChip probe performance. in Algorithms for Molecular Biology : AMB
Upton GJ (2008) G-spots cause incorrect expression measurement in Affymetrix microarrays. in BMC Genomics
Memon FN (2010) Identifying the impact of G-quadruplexes on Affymetrix 3' arrays using cloud computing. in Journal of Integrative Bioinformatics
Stalteri MA (2007) Interpretation of multiple probe sets mapping to the same gene in Affymetrix GeneChips. in BMC Bioinformatics
Description | We identified that the widely used GeneChip technology for measuring gene expression has several biases. This potentially impacts upon the results of thousands of scientific papers. |
Exploitation Route | By correcting for the biases we discovered, biologists will produce cleaner data that will aid their interpretation. |
Sectors | Agriculture, Food and Drink,Digital/Communication/Information Technologies (including Software),Healthcare,Manufacturing, including Industrial Biotechnology,Pharmaceuticals and Medical Biotechnology |
Description | The findings on these biases have been used by researchers to improve the analysis in about 40 papers. |
First Year Of Impact | 2008 |
Sector | Agriculture, Food and Drink,Digital/Communication/Information Technologies (including Software),Healthcare,Pharmaceuticals and Medical Biotechnology |
Description | Bioinformatics |
Organisation | Zhejiang University |
Country | China |
Sector | Academic/University |
PI Contribution | I collaborate with Professor Ming Chen. We have visited each other's laboratory, and I have hosted several of his students. |
Collaborator Contribution | They have produced several papers in which I was invited to be a co-author. |
Impact | The collaboration has led to several workshops in China, on "Chips, Computers and Crops". |
Start Year | 2008 |