Systems-biology based approaches for the identification of genes controlling complex phenotypic traits

Lead Research Organisation: King's College London
Department Name: Genetics and Molecular Medicine

Abstract

Next-generation sequencing technologies are revolutionizing the way we do research in molecular biology and genetics. One of the leading companies driving the development of next-generation technologies is Illumina. The cost of DNA sequencing has dropped so much that within the next five years sequencing whole genomes for many individuals will become a standard technique, as is already being undertaken in the Wellcome Trust Sanger Institute's UK10K project. However, we are just at the very beginning of analysing and understanding these massive amounts of data. This project will develop new bioinformatics methodologies for the analysis of next-generation sequence data. Individuals of one species show many differences in DNA sequence, and many of these variants appear to have no phenotypic effect. Nevertheless, recent publications have demonstrated elegantly that analysing the protein-coding sequences of a few individuals is sufficient to identify the gene responsible for a monogenic trait, for example a particular genetically inherited disease (Choi 2009; Ng 2009; Ng 2010). In these cases the strong phenotypic effect of the individual sequence variants made it possible to exclude all previously known sequence variants from the candidate lists. However, most traits are not determined by single genes, but rather depend on many different genes. Sequence variants contributing to such complex traits will be much harder to identify, because individual variants might not have any phenotypic effect unless they occur in combination with other sequence variants; that is, we can no longer exclude previously known sequence variants per se. Predicting the deleterious effects of individual sequence variants on the amino-acid sequence of the protein products can provide further evidence for the identification of causal variants (e.g. Ng 2003), though this approach on its own is not powerful enough to identify the causal gene(s).
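The filtering logic described above for monogenic traits can be illustrated with a minimal sketch. It is not the pipeline used in the cited studies: the function name, the data layout, and the gene and variant identifiers are all hypothetical, and a real analysis would work from annotated variant calls. The sketch discards previously known variants and then keeps the genes that still carry a variant in every affected individual.

```python
from collections import Counter


def candidate_genes(variants_by_individual, known_variants, min_individuals=None):
    """Return genes carrying a novel variant in at least `min_individuals` people.

    variants_by_individual: dict mapping an individual ID to a list of
        (gene, variant_id) tuples (hypothetical layout).
    known_variants: set of variant IDs already catalogued and assumed benign.
    min_individuals: defaults to all individuals (strict monogenic filter).
    """
    if min_individuals is None:
        min_individuals = len(variants_by_individual)

    counts = Counter()
    for calls in variants_by_individual.values():
        # Count each gene at most once per individual, ignoring known variants.
        novel_genes = {gene for gene, variant in calls if variant not in known_variants}
        counts.update(novel_genes)

    return sorted(gene for gene, n in counts.items() if n >= min_individuals)


# Toy data: three affected individuals, hypothetical gene and variant IDs.
variants = {
    "ind1": [("GENE_A", "v1"), ("GENE_B", "v2"), ("GENE_C", "v3")],
    "ind2": [("GENE_A", "v4"), ("GENE_C", "v3"), ("GENE_D", "v5")],
    "ind3": [("GENE_A", "v1"), ("GENE_B", "v6")],
}
known = {"v2", "v3"}

print(candidate_genes(variants, known))
```

For complex traits this strict intersection is expected to fail, because no single gene needs to be hit in every individual and known variants can no longer be discarded up front.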
The aim of this project is to establish a systems approach that combines biological networks with sequence analysis methods to identify, in silico, sequence variants that are likely to be important for complex phenotypic traits. The underlying assumption is that multiple sequence variants that hit different proteins involved in functionally related processes will, in combination, lead to phenotypic effects. This project will use gene networks, protein networks and metabolic networks that we have collected from public data repositories and publications to examine the function and potential impact of sequence variants on the biological system. The approaches developed here will be relevant for the study of biological organisms in general; they will also be instrumental for identifying genetic effects that contribute to complex phenotypes, which could be relevant for the breeding of plants and animals as well as for improving our understanding of complex diseases such as Crohn's disease, psoriasis and cancer. Large-scale sequence data for individuals suffering from these disorders are currently being obtained within the Department and by Illumina and will be available for analysis. The outcomes of this project will benefit researchers in the areas of genetics, bioinformatics, gene and protein networks, systems biology and ultimately disease processes. Bioinformatics software developed as part of this project will be made available free of charge as open source software. Molecular biologists will benefit as users of our software for the analysis of their sequence data and the exploration of biological networks; the project will thus support the design of novel experimental approaches.

Choi, M. (2009). Proc Natl Acad Sci U S A 106(45): 19096-19101.
Ng, P. C. (2003). Nucleic Acids Res 31(13): 3812-3814.
Ng, S. B. (2009). Nature 461(7261): 272-276.
Ng, S. B. (2010). Nat Genet 42(1): 30-35.
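The underlying assumption of the systems approach, that variants in different but functionally linked proteins act in combination, can be sketched in a few lines. This is only an illustrative simplification, not the method the project will develop: the function name, the edge-list representation and the protein identifiers are hypothetical, and a direct interaction stands in for "functionally related". The sketch keeps the network edges between proteins that carry variants and reports each connected group of at least two such proteins.

```python
from collections import defaultdict, deque


def hit_modules(edges, hit_proteins, min_size=2):
    """Group variant-carrying proteins that are directly linked in a network.

    edges: iterable of (protein_a, protein_b) undirected interactions.
    hit_proteins: proteins carrying at least one sequence variant.
    Returns a list of sorted protein lists, one per connected module.
    """
    hit = set(hit_proteins)

    # Build the subnetwork induced by the variant-carrying proteins.
    adjacency = defaultdict(set)
    for a, b in edges:
        if a in hit and b in hit:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen = set()
    modules = []
    for start in sorted(hit):
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in adjacency.get(node, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        if len(component) >= min_size:
            modules.append(sorted(component))
    return modules


# Toy network with hypothetical protein IDs.
network = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P5", "P6")]
hits = {"P1", "P2", "P4", "P5"}

print(hit_modules(network, hits))
```

In this toy case only P1 and P2 form a module; P4 and P5 carry variants but have no variant-carrying neighbour, so under this simplified criterion they would not be prioritised.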
