Statistical bioinformatics and genetics

Lead Research Organisation: MRC Biostatistics Unit

Abstract

New high throughput experimental methods, for example sequencing of whole genomes, genomewide gene expression profiles, or comprehensive collections of genetic markers, have revolutionised biological and medical research. Demand for proper statistical methods and tools to cope with such challenging data is high but has not yet been fully addressed by the statistical community. There might be two reasons for this. First, even preliminary analysis of large scale experimental data requires a thorough understanding of the underlying biological and molecular principles. Second, the interpretation of results itself is a major statistical, mathematical modelling and inference task, since complex models of the underlying biological processes are required in order to understand, predict, and ultimately manipulate the biological system in question. High noise levels of high throughput experiments pose a further challenge and require the integration of several heterogeneous information sources and experimental data.
Our aim is to develop statistical analysis techniques that meet these challenging aspects of modern biology and medicine. The models envisaged in our analysis are not confined to traditional statistical models but comprise also mathematical and computational models which are able to represent the complex dynamics in real biological systems and the interactions among their various components.

Technical Summary

Building on our experience in the area of statistical bioinformatics and genetics, our aim is to continue the development of tools for the analysis of genomic and genetic data in conjunction with cell biological and physiological high throughput data. We propose to focus on gene regulation and biopathways as derived from genetic, phylogenetic, gene expression, and molecular biomarker data. Biological systems of particular interest are stem and cancer cells (human, mouse, and Drosophila), blood cells (platelets involved in atherothrombosis), bacteria (Mycobacterium tuberculosis), and plants (Arabidopsis thaliana). We will extend methods which are successful in the inference of models for cellular regulation to intercellular regulation, in particular to the immune response to parasite infection (Schistosoma mansoni). Statistical modelling of such systems is challenging. Experimental data as well as other sources of information, such as bioinformatics databases, are quite comprehensive and have special storage, normalisation and preliminary analysis requirements. Statistical and mathematical models which are able to represent key features of a biological system, features important for its understanding, prediction, and manipulation, are quite complex. We will explore how to combine statistical inference methods, machine learning algorithms, and mathematical modelling to derive useful representations of the biological systems of interest. Genetics research comprises a distinct sub-programme within this proposal with strong links to epidemiological studies and hence to primary clinical research. Recent technologies allow genetic association studies on very large scales, for which new analysis methods are urgently needed. We will develop methods suitable for whole genome association scans that are sensitive to the presence of multiple interacting genes while respecting the multiple testing problems that arise. 
We will exploit the near-linear arrangement of genes on chromosomes to develop multipoint mapping methods giving improved localisation of genes in association scans. We will improve methodology for genetic epidemiology by applying recent ideas from likelihood theory to existing regression models. We will anticipate new technologies for high throughput whole genome sequencing, by developing methods for direct analysis of DNA sequences, as opposed to genetic markers. Integrating evidence from multiple data sources from different levels of organisation of a biological system will enable the discovery of important functional links and the assessment of the predictive import of molecular biomarkers with respect to phenotypes of direct clinical interest.
Development of application-specific methods will result in the creation of generic computational tools and software for use by a larger community of bioinformaticians and biologists, not necessarily experts in the detailed statistical background.

Publications

Syed Z (2007) An investigation of the neurotrophic factor genes GDNF, NGF, and NT3 in susceptibility to ADHD. in American journal of medical genetics. Part B, Neuropsychiatric genetics : the official publication of the International Society of Psychiatric Genetics

Goudie RJB (2019) Joining and splitting models with Markov melding. in Bayesian analysis

Evangelou M (2014) Two novel pathway analysis methods based on a hierarchical model. in Bioinformatics (Oxford, England)

Reid JE (2016) Pseudotime estimation: deconfounding single cell time series. in Bioinformatics (Oxford, England)

Koohy H (2010) An alignment-free model for comparison of regulatory sequences. in Bioinformatics (Oxford, England)

 
Description EC FP7 funding
Amount £211,000 (GBP)
Organisation European Commission 
Department Seventh Framework Programme (FP7)
Sector Public
Country European Union (EU)
Start 10/2008 
End 03/2012
 
Description FNIH Genetic Association Information Network (GAIN)
Amount £160,000 (GBP)
Organisation Foundation for the National Institutes of Health (FNIH) 
Sector Charity/Non Profit
Country United States
Start 06/2007 
End 05/2008
 
Description Marie Curie Studentship
Amount £140,500 (GBP)
Organisation Marie Sklodowska-Curie Actions 
Sector Charity/Non Profit
Country Global
Start 10/2008 
End 09/2011
 
Description NIH-Genomewide association study of schizophrenia
Amount £259,046 (GBP)
Organisation National Institutes of Health (NIH) 
Sector Public
Country United States
Start 06/2008 
End 05/2010
 
Description Wellcome Trust Programme Grant (Functional genomics of neuronal identity)
Amount £715,111 (GBP)
Organisation Wellcome Trust 
Sector Charity/Non Profit
Country United Kingdom
Start  
End 09/2007
 
Title Boolean network inference 
Description Python library for inference of Boolean networks from qualitative gene activity information 
Type Of Material Data analysis technique 
Year Produced 2010 
Provided To Others? Yes  
Impact Downloaded by research groups for inference of Boolean gene networks 
 
Title Gapped motif finder 
Description Retrieves transcription factor binding motifs. Tool available on the internet 
Type Of Material Improvements to research infrastructure 
Year Produced 2009 
Provided To Others? Yes  
Impact None yet 
 
Title Graphical analysis of epidemiological data 
Description Web interface for analysis and geographical display of epidemiological data 
Type Of Material Data analysis technique 
Year Produced 2010 
Provided To Others? Yes  
Impact Ongoing work PMID: 21457547 
URL http://europepmc.org/abstract/MED/21457547
 
Title Python library 
Description Python library for Gaussian process regression that is available through the sysbio website. See http://www.sys-bio.org/ 
Type Of Material Data analysis technique 
Year Produced 2009 
Provided To Others? Yes  
Impact Research material widely available to other research groups 
URL http://www.sys-bio.org/
 
Title Rwui 
Description Automated statistical web service generator 
Type Of Material Improvements to research infrastructure 
Year Produced 2007 
Provided To Others? Yes  
Impact A visiting worker used this to develop an epidemiological analysis tool 
 
Title STEME 
Description A novel sequence motif finder suitable for large sequence data sets emerging from new sequencing techniques 
Type Of Material Data analysis technique 
Year Produced 2011 
Provided To Others? Yes  
Impact Papers published: PMID: 21785132; PMID: 21047506; PMID: 20696736. Software downloads. 
URL http://europepmc.org/abstract/MED/21785132
 
Title Serotype classifier 
Description Statistical software for analysis of serotyping microarrays, available via a web interface 
Type Of Material Data analysis technique 
Year Produced 2008 
Provided To Others? Yes  
Impact Routine analysis possible in experimental labs by using web interface 
 
Title UNPHASED 
Description The software package UNPHASED was developed within this programme and released under the GPL licence. It performs a range of statistical analyses for genetic data, including novel approaches to missing data in family-based designs. 
Type Of Material Improvements to research infrastructure 
Year Produced 2006 
Provided To Others? Yes  
Impact Successive versions of the software have been referenced in over 800 publications since 2003 
 
Description Analysis of Serotypes 
Organisation St George's University of London
Department Bacterial Microarray Group at St George's (BµG@S)
Country United Kingdom 
Sector Academic/University 
PI Contribution Statistical analysis and analysis software
Collaborator Contribution Experimental data providers
Impact PMID: 21453458. Negotiating commercialization of software.
Start Year 2007
 
Description Biomarkers for prostate cancer 
Organisation Medical Research Council (MRC)
Department MRC Cancer Cell Unit
Country United Kingdom 
Sector Academic/University 
PI Contribution Statistical analysis of expression data, biomarker prediction
Collaborator Contribution Data sharing
Impact Predictions are currently being tested experimentally
Start Year 2011
 
Description Bloodomics 
Organisation University of Cambridge
Department Department of Haematology
Country United Kingdom 
Sector Academic/University 
PI Contribution We performed study design (sample size and selection of subjects) and statistical analysis of data and interpretation of results. We contributed to manuscript writing.
Collaborator Contribution Our role in these collaborations was to provide statistical advice and analysis for projects aiming to identify disease genes. In addition to the intrinsic value of these projects, they provided motivating problems for new statistical projects within our programme.
Impact PMID: 16595075; PMID: 16706959; PMID: 17192395; PMID: 17499550; PMID: 17663743; PMID: 18569861; PMID: 19109564; PMID: 19228925; PMID: 19429868; PMID: 21738480; PMID: 21765411; PMID: 21738486. Publication in press (PLoS Genetics, 2011).
 
Description Expression analysis of M. tuberculosis 
Organisation Public Health England
Department Centre for Emergency Preparedness and Response Unit
Country United Kingdom 
Sector Academic/University 
PI Contribution Computational and statistical analysis
Collaborator Contribution Providing data and research input.
Impact PMID: 20356371; PMID: 20199667
 
Description Genetics of hematopoiesis 
Organisation The Wellcome Trust Sanger Institute
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution Functional bioinformatics analysis
Collaborator Contribution Data sharing
Impact Publication in press (Nature, 2011)
Start Year 2010
 
Description Immune response to Schistosomiasis 
Organisation University of Cambridge
Department Department of Pathology
Country United Kingdom 
Sector Academic/University 
PI Contribution Experimental data
Collaborator Contribution Experimental data provider
Impact Manuscripts in preparation
Start Year 2008
 
Description Molecular Genetics of Schizophrenia 
Organisation Stanford University
Country United States 
Sector Academic/University 
PI Contribution We planned the statistical analysis, contributed to grant writing, contributed to statistical analysis of data and interpretation of results. We contributed to manuscript writing.
Collaborator Contribution Our role in these collaborations was to provide statistical advice and analysis for projects aiming to identify disease genes. In addition to the intrinsic value of these projects, they provided motivating problems for new statistical projects within our programme.
Impact PMID: 16685665; PMID: 19571809
Start Year 2006
 
Description Molecular Genetics of Schizophrenia 
Organisation University of Chicago
Country United States 
Sector Academic/University 
PI Contribution We planned the statistical analysis, contributed to grant writing, contributed to statistical analysis of data and interpretation of results. We contributed to manuscript writing.
Collaborator Contribution Our role in these collaborations was to provide statistical advice and analysis for projects aiming to identify disease genes. In addition to the intrinsic value of these projects, they provided motivating problems for new statistical projects within our programme.
Impact PMID: 16685665; PMID: 19571809
Start Year 2006
 
Description Neuronal development in Drosophila 
Organisation University of Cambridge
Department Department of Physiology, Development and Neuroscience
Country United Kingdom 
Sector Academic/University 
PI Contribution Statistical analysis, mathematical modelling, software
Collaborator Contribution Experimental data provider
Impact PMID: 21785132; PMID: 21047506; PMID: 20696736. Software.
Start Year 2009
 
Description Oesophageal cancer 
Organisation Medical Research Council (MRC)
Department MRC Cancer Cell Unit
Country United Kingdom 
Sector Academic/University 
PI Contribution Statistical analysis, analysis software
Collaborator Contribution Experimental data provider
Impact Manuscripts in preparation, statistical software
Start Year 2009
 
Description 27th International Biometric Conference, Florence, Italy 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? Yes
Geographic Reach International
Primary Audience Health professionals
Results and Impact EG, from LW's research team, presented the talk 'Bayesian multi-task clustering of gene expression time series data from multiple experiments' under the Contributed section 'Microarray and omics data', Florence, 7th July.


This event facilitated communication, interaction and collaboration between scientists.
Raised the profile of the BSU and strengthened its links with international institutions.
The Unit strengthened its reputation as a major centre for knowledge transfer.
Attendees' reactions to the talk were very positive.
Year(s) Of Engagement Activity 2014
URL http://www.ibs-italy.info/IBC_scientific%20programme.pdf
 
Description Armitage Lecture and Workshop 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Other academic audiences (collaborators, peers etc.)
Results and Impact More than 100 participants attended this annual event during which an eminent medical statistician visits for a week and works with members of the Unit. The highlight is the delivery of the Armitage Lecture which is free and open to other health related professionals.
Talk: "Inference of regulation in biological systems". Professor Lorenz Wernisch (7th November 2013)

Raised the profile of the Unit and strengthened its links with international institutions.
BSU strengthened its reputation as a major centre for knowledge transfer.
Attendees' reactions to the lectureships and workshops have been overwhelmingly positive.
Year(s) Of Engagement Activity 2012,2013,2014
URL http://www.mrc-bsu.cam.ac.uk/training/workshops/armitage-lectureships-and-workshops/
 
Description Armitage Lectures 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Annual workshop and lecture created and hosted by the MRC Biostatistics Unit to honour the immense contributions of Professor Peter Armitage, who was at the unit from 1947 to 1961 and whose work is recognised throughout the world as achieving a successful balance between methodological rigour and applied common sense, to which all statisticians aspire. An eminent medical statistician visits for a week and works with members of the unit. The highlight is the Armitage Lecture, which more than 100 delegates attend. This event raises the unit's research profile and creates new collaborations.
Year(s) Of Engagement Activity 2012,2013,2014,2015,2016,2017
URL https://www.mrc-bsu.cam.ac.uk/news-and-events/armitage-lectureships-and-workshops/
 
Description BSU Open Day 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact Unit held open day as part of MRC Festival of Medical Research.

The aim of the open day was to welcome secondary school students and members of the general public to the unit, to find out about the research the unit does, and to take part in activities that illustrate BSU research, with the overall theme being 'Fun with statistics'. An open day of this format was a first for the unit and overall it was a very successful event. There were 40 attendees over a 4-hour event. All attendees pre-booked and were split into 4 groups for a 1-hour session comprising an introduction, participation in hands-on activities, and a brief careers talk. The small groups and length of session allowed for quality engagement between the scientists and the audience.

Feedback from the attendees was very positive, and the wider MRC Festival activities that took place in Cambridge demonstrated the benefits in delivering these types of events.
Year(s) Of Engagement Activity 2016
URL http://www.mrc-bsu.cam.ac.uk/bsu-open-day-2016-why-are-statistics-important/
 
Description Cambridge Science Festival 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? Yes
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact Each year the BSU participates in the Cambridge Science Festival, where members of the general public explore and discuss issues of scientific interest and concern through a series of different events. The event also aims to raise aspirations by encouraging young people to consider a career in science, technology, engineering or mathematics.

BSU take part over two full days - 'Science Saturday' and the 'Cambridge Biomedical Campus' day. The unit presents a stand with 4 - 5 interactive activities that each communicate a basic statistical method or idea, representing one of the four research themes in the unit. Each year a new activity is developed and delivered requiring scientific input from staff across the unit. Over the two days, BSU engage with approximately 500 adults and children who visit the festival.

Raised awareness for the work of the Unit in the local schools and community
Increase in requests for further information
Audience asked for more opportunities for communication and interaction with the public health researchers
This event contributed to raising the profile of Biostatistics in medical research
This event contributed to enhancing the methodological quality of medical research developed by BSU staff
This event contributed to enabling Best Research for Best Health
Year(s) Of Engagement Activity 2007,2008,2009,2011,2012,2013,2014,2015,2016,2017
URL http://www.cam.ac.uk/science-festival
 
Description LSHTM's 23rd Austin Bradford Hill Memorial Lecture, and Clinical Trials: Past, Present and Glorious Future meeting 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? Yes
Geographic Reach National
Primary Audience Health professionals
Results and Impact For the LSHTM's 23rd Austin Bradford Hill Memorial Lecture, and Clinical Trials: Past, Present and Glorious Future meeting, more than two hundred guests gathered at the John Snow Lecture Theatre, London School of Hygiene and Tropical Medicine (LSHTM), and dedicated an afternoon to celebrating the history, achievements and challenges of randomized controlled trials, as part of the joint history of the MRC Biostatistics Unit and Professors of Medical Statistics at the LSHTM.

Raising the profile of the Unit nationally.
Facilitated communication, interaction and collaboration between scientists.
Contributed to enhancing the methodological quality of medical research developed by BSU staff.
Contributed to enabling Best Research for Best Health.
The London School of Hygiene and Tropical Medicine filmed several of the talks and published the videos online through the LSHTM Plus Vimeo channel. See http://vimeo.com/channels/760568
Year(s) Of Engagement Activity 2014
URL http://www.mrc-bsu.cam.ac.uk/bradford-hill-memorial-lecture/
 
Description MPhil lectures 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Health professionals
Results and Impact 6 lectures on network modelling for MPhil Computational Biology students, University of Cambridge, spring 2008

Recognition for the MRC and the unit
Year(s) Of Engagement Activity 2008,2009
 
Description MRC Centenary Open Week (BSU Open Day) 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact 50+ people attended an exhibition about the history, pioneers and discoveries of the MRC Biostatistics Unit, and a series of talks, discussions and lectures, as part of the BSU contribution to MRC Centenary Open Day
Talk: 'Making sense of genomes', Professor Lorenz Wernisch (20th June 2013)

Raising the profile of the Unit in the local region
Increase in requests for further information
Audience asked for more opportunities for communication and interaction with the public health researchers
Contributed to enhancing the methodological quality of medical research developed by BSU staff
Contributed to enabling Best Research for Best Health
Year(s) Of Engagement Activity 2013
URL http://www.mrc-bsu.cam.ac.uk/mrc-bsu-successfully-joined-medical-research-council-open-week/
 
Description Probabilistic modelling of metabolic regulation in prokaryotes 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Type Of Presentation Keynote/Invited Speaker
Geographic Reach Regional
Primary Audience Health professionals
Results and Impact Isaac Newton Institute, Cambridge, "Probabilistic modelling of metabolic regulation in prokaryotes"

Recognition for the MRC and the unit.
Year(s) Of Engagement Activity 2007
 
Description SMPGD 2017: Statistical Methods for Postgenomic Data 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Scientific talk
Year(s) Of Engagement Activity 2009,2010,2011,2012,2013,2014,2015,2016,2017
URL https://smpgd2017.wordpress.com/
 
Description Schizophrenia press release 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Media (as a channel to the public)
Results and Impact A press release was issued with the publication in Nature of a paper describing the discovery of a genetic basis for schizophrenia.

We were interviewed by the Guardian, but they did not publish the interview.
Year(s) Of Engagement Activity 2009
 
Description Statistics and public health talk at Emmanuel College - 'Rowing, statistics and my genes - A cautionary tale' 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Public/other audiences
Results and Impact Talk at Emmanuel College on 'Rowing, statistics and my genes - A cautionary tale' as part of the 'Statistics Meets the Public's Health' seminar series. A non-technical talk illustrating how statistics helps to find answers to a range of public health related questions.

30+ audience members, ranging from academics and students to members of the general public.

Due to the success of the seminar series, further talks have been organised.
Year(s) Of Engagement Activity 2016
 
Description Stochastic modelling for stem cells 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Health professionals
Results and Impact Summer School on Systems Biology, Tenerife, "Stochastic modelling for stem cells"

Recognition for the MRC and the unit
Year(s) Of Engagement Activity 2008
 
Description UCLID 2016 Understanding Complex and Large Industrial Data Lancaster University 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Scientific presentation
Year(s) Of Engagement Activity 2016,2017
URL http://www.lancaster.ac.uk/uclid2016/
 
Description Wittgenstein Symposium 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact 150 participants attended a general talk on the philosophical implications of statistical modelling, followed by questions and discussion

N/A
Year(s) Of Engagement Activity 2010