University of Oxford Big Data Institute: Development & dissemination of efficient analysis methods for large, complex, heterogeneous clinical datasets

Lead Research Organisation: University of Oxford
Department Name: Wellcome Trust Centre for Human Genetics

Abstract

Background: Understanding the determinants of common life-threatening and disabling disease is challenging. Such conditions are typically caused by many different factors (genetic, physiological, behavioural, infectious and environmental), which may occur at different times of life (including developmental factors as well as those that may occur much later; e.g. cigarette smoking, alcohol, co-morbidity, and treatment). Furthermore, the effects may manifest over a range of timescales (immediate to many decades) and time courses (acute, chronic stable, progressive, relapsing/remitting).

Vision: The University of Oxford Big Data Institute (BDI) sets out to be an international centre of excellence for the analysis of very large and complex biomedical data sets. The Institute will combine world-class academic leadership with a core infrastructure of cutting-edge technology and high calibre scientists. Built upon Oxford's internationally leading expertise in epidemiology, genomics, imaging, computer science, and infectious disease surveillance, the BDI will be a national resource for the development of new analytical methods to facilitate the generation, storage, analysis and sharing of data in very large clinical studies. This work will provide transformative advances in the speed and range of research into the causes and consequences, prevention and treatment of disease, and will be particularly relevant to understanding the role of genetic and environmental influences on common life-threatening and disabling diseases such as cancer, cardiac disease, stroke, and dementia.

Knowledge Transfer: In addition to primary research, the BDI will play a major role in capacity building through training, dissemination of research methods, and stakeholder engagement:
- A new MRC Big Data Training Academy will deliver an extensive, flexible and outward-looking portfolio of training and career development opportunities accessible to researchers of all levels within and outside Oxford, including a new doctoral training programme, support for post-doctoral training fellows, visitor and exchange programmes, and a broad range of short courses suitable for graduate training and continuing professional development.
- Effective collaboration with local, national and international experts (including partnerships with academia, healthcare, and pharmaceutical and IT industries) will enhance the development, evaluation and adoption of new research methods and tools.
- The Institute will work with regulators and other key stakeholders to develop standards for data storage, sharing and analysis; and to promote appropriate and proportionate regulatory and governance approaches that are fit-for-purpose in areas such as privacy, consent, information security, data access and sharing, intellectual property, and the intersection between research and routine practice.
- Public engagement activities will promote understanding, address concerns, and develop trust in "Big Data" approaches to biomedical research.

Technical Summary

Advances in Big Data offer new ways to conduct health research, with advantages in speed, cost and scope of scientific enquiry. In particular, major advances will come from the development of new methods to integrate and interrogate multiple, complementary datasets simultaneously. The University of Oxford Big Data Institute (BDI) will develop flexible and efficient systems, tools and methods for generating and analysing research-optimized data sets that incorporate relevant clinical information (including bespoke research data and routine clinical information), extensive phenotypic measurements (including imaging, physical and function assessments) and genomic and other laboratory data.

Evaluation of clinical phenotype: The BDI will bring together bioinformaticians, clinical specialists and research users to develop validated algorithms that combine multiple data sources to confirm, clarify and classify clinically and biologically relevant phenotypes in very large studies.

Interpretation of genotype: Current statistical methodologies for assessment of genetic influences on disease (many of which were developed in Oxford) fail in regions of high structural or sequence diversity (e.g. around human HLA and KIR loci, and in many bacterial and eukaryotic pathogens). The BDI will develop novel methods that integrate information from multiple reference sequences, thus extending the ability of researchers to investigate genetic influences on disease pathways.

Combining multi-dimensional phenotypic and genomic data: Building on our experience in clinical informatics and genetic analyses, we will address gaps in the current standards and ontologies. We will develop appropriate statistical methods for the analysis of high-dimensional and highly correlated data in order to allow biologically meaningful associations to be identified robustly. We will also establish an infrastructure that supports appropriate sharing and analysis of biomedical Big Data.

Planned Impact

The University of Oxford Big Data Institute (BDI) seeks to promote Innovation in Medical Science to produce Transformation in Human Health.

The Institute aims to have broad-ranging influence:

Improved methods and technologies for research: The BDI will bring together existing expertise in large-scale epidemiology, imaging, genomic medicine, bioinformatics, and computer science to create an internationally leading centre of excellence for the analysis of large, complex, heterogeneous data sets for research into the causes and consequences, prevention and treatment of disease. The work of the Institute has the potential to radically extend the boundaries of these individual disciplines (e.g. large-scale epidemiology will require new tools and methodologies to analyse even larger, more complex and rich heterogeneous data), but the greatest impact is likely to be achieved by close collaboration and working between disparate disciplines (e.g. computer science and epidemiology to develop new analytical tools for routine health data; clinical, imaging and genetic science to better understand the development of dementia). The creation of the Institute will act as a focal point for fostering collaborative engagement with both academic and industrial partners, nationally and internationally.

New advances in science and health: The expertise, methods and systems developed will provide big advances in the ability to understand disease mechanisms and discover new treatment approaches, and will yield significant benefits for clinical care in areas such as infectious disease surveillance & management, and the translation of genomics into routine medical practice. We intend to make our data, methods & tools widely available for scientific researchers.

Capacity building: A new MRC Big Data Training Academy will provide an extensive portfolio of training and development opportunities, suitable for scientists at all stages of their career, and readily accessible to external researchers. These will include a new doctoral training programme in Big Data for Biomedicine, post-doctoral fellowships, a visitor and exchange programme, and a large number of short courses suitable for both graduate training and continuing professional development. Much of the training material will be available through a remote learning environment, and the annual Oxford-Stanford Conference on Big Data will provide an opportunity for cross-fertilisation of ideas among the many stakeholders in this research. Thus the multidisciplinary environment of the new Institute will provide substantial opportunities for the development of a new generation of research scientists, equipped to tackle the major medical bioinformatic challenges of Big Data research for the improvement of human health.

Commercial and economic impact: The work seeks to generate new methods, tools and systems for analysing Big Data in biomedicine. These tools will be made widely available through commercial entities spun out of the University; engagement with large IT companies such as Oracle, Microsoft and Google; and strong relationships with pharma who, like academia, are dependent on this type of large-scale analysis. We anticipate that the Institute's work will generate substantial investment into UK bioinformatics and health research, and opportunities for spinouts.

Societal impact: The Institute will create a community for engagement, dialogue and debate with government, regulators, policy makers and the public. Developing public trust, understanding and confidence through ongoing active engagement will be a key activity of the new Institute.

Publications

Bull S (2015) Best Practices for Ethical Sharing of Individual-Level Health Research Data From Low- and Middle-Income Settings. in Journal of empirical research on human research ethics : JERHRE

Lehtinen S (2017) Evolution of antibiotic resistance is linked to any genetic mechanism affecting bacterial duration of carriage. in Proceedings of the National Academy of Sciences of the United States of America

 
Description Department of Health, Digital Health Forum
Geographic Reach National 
Policy Influence Type Membership of a guideline committee
 
Description External Reference Group for the Ministerial Industry Strategy Group Research through Health Data Programme
Geographic Reach National 
Policy Influence Type Membership of a guideline committee
 
Description Global Alliance for Genomics and Health, Data Working Group Summit
Geographic Reach Multiple continents/international 
Policy Influence Type Membership of a guideline committee
 
Description National Health and Biomedical Informatics Institute Academic Workshop, Farr Institute
Geographic Reach National 
Policy Influence Type Membership of a guideline committee
 
Description Office of the Chief Scientist
Geographic Reach National 
Policy Influence Type Membership of a guideline committee
 
Description Provided data to Vector Control group of the WHO Global Malaria Programme
Geographic Reach Multiple continents/international 
Policy Influence Type Gave evidence to a government review
 
Description WHO Global Vector Control Response
Geographic Reach Multiple continents/international 
Policy Influence Type Gave evidence to a government review
Impact Refined estimates for the global population at risk from key vector-borne diseases (leishmaniasis, American trypanosomiasis, human African trypanosomiasis, dengue, malaria, Japanese encephalitis, lymphatic filariasis, onchocerciasis and yellow fever) for the Global Vector Control Response being developed by the WHO
 
Description Accelerating Medicines Partnership: Type 2 Diabetes Knowledge Portal Federated Node Implementation
Amount £2,000,000 (GBP)
Funding ID FLIC16AMP 
Organisation Foundation for the National Institutes of Health (FNIH) 
Sector Charity/Non Profit
Country United States
Start 12/2015 
End 12/2018
 
Description BEEHIVE - Advanced Grant
Amount £599,997 (GBP)
Organisation European Commission 
Sector Public
Country European Union (EU)
Start 07/2016 
End 03/2019
 
Description BHF NDPH Pump-priming
Amount £36,000 (GBP)
Organisation British Heart Foundation (BHF) 
Sector Charity/Non Profit
Country United Kingdom
Start  
 
Description BRC3 Sub-theme co-lead with 3 themes: Big Data, Cancer and Antimicrobial Resistance
Amount £114,000,000 (GBP)
Organisation National Institute for Health Research 
Department NIHR Biomedical Research Centre
Sector Academic/University
Country United Kingdom
Start  
 
Description CR-UK Prostate International Cancer Genome Consortium
Amount £110,362 (GBP)
Organisation Cancer Research UK 
Sector Charity/Non Profit
Country United Kingdom
Start 03/2016 
End 12/2018
 
Description Collaborative Award: Using parasite population genomics to improve understanding of malaria epidemiology
Amount £4,000,000 (GBP)
Organisation Wellcome Trust 
Sector Charity/Non Profit
Country United Kingdom
Start  
 
Description Development & dissemination of efficient analysis methods for large, complex, heterogeneous clinical datasets
Amount £6,000,000 (GBP)
Funding ID MR/L016265/1 
Organisation Medical Research Council (MRC) 
Sector Academic/University
Country United Kingdom
Start 01/2014 
End 12/2018
 
Description Genetic Analysis of Populations
Amount £1,900,000 (GBP)
Organisation Wellcome Trust 
Sector Charity/Non Profit
Country United Kingdom
Start 01/2014 
End 12/2018
 
Description Geospatial Analyses of Pneumonia, Diarrhea, Malaria, HIV/AIDS, TB and Selected Eradicable Neglected Tropical Diseases
Amount $2,600,000 (USD)
Organisation Bill and Melinda Gates Foundation 
Sector Charity/Non Profit
Country United States
Start  
 
Description Geospatial Analysis
Amount £530,001 (GBP)
Organisation Bill and Melinda Gates Foundation 
Sector Charity/Non Profit
Country United States
Start 09/2016 
End 10/2020
 
Description Geospatial Modelling
Amount £1,179,016 (GBP)
Organisation Bill and Melinda Gates Foundation 
Sector Charity/Non Profit
Country United States
Start 09/2016 
End 10/2020
 
Description Geospatial modelling for malaria risk stratification and intervention targeting
Amount $2,200,000 (USD)
Organisation Bill and Melinda Gates Foundation 
Sector Charity/Non Profit
Country United States
Start  
 
Description Grant to establish Wellcome Centre for Ethics, Innovation, Globalisation and Medicine
Amount £2,991,157 (GBP)
Organisation Wellcome Trust 
Sector Charity/Non Profit
Country United Kingdom
Start  
 
Description ISSF
Amount £67,082 (GBP)
Organisation Wellcome Trust 
Sector Charity/Non Profit
Country United Kingdom
Start 10/2014 
End 11/2018
 
Description Li Ka Shing Foundation
Amount £20,000,000 (GBP)
Organisation Li Ka Shing Foundation 
Sector Charity/Non Profit
Country Hong Kong
Start 04/2013 
End 12/2023
 
Description Malaria Vectors
Amount £669,719 (GBP)
Organisation Wellcome Trust 
Sector Charity/Non Profit
Country United Kingdom
Start 01/2016 
End 12/2018
 
Description NDPH pump priming
Amount £23,000 (GBP)
Organisation University of Oxford 
Department Nuffield Department of Population Health
Sector Academic/University
Country United Kingdom
Start  
 
Description NIAID grant Genome based diagnostics for mapping, monitoring and management of insecticide resistance in African malaria vectors
Amount $494,803 (USD)
Organisation National Institute of Allergy and Infectious Diseases (NIAID) 
Sector Public
Country United States
Start  
 
Description NIH/NICHD P50 Grant Project 2: Common Complex Trait Genetics of Reproductive Phenotypes
Amount $5,000,805 (USD)
Funding ID 2P50HD028138-26 
Organisation National Institutes of Health (NIH) 
Sector Public
Country United States
Start  
 
Description NIHR Oxford Biomedical Research Centre
Amount £4,000,000 (GBP)
Organisation National Institute for Health Research 
Sector Public
Country United Kingdom
Start 04/2017 
 
Description PopART
Amount £71,846 (GBP)
Organisation National Institutes of Health (NIH) 
Sector Public
Country United States
Start 11/2016 
End 11/2018
 
Description PopART Phylo
Amount £240,761 (GBP)
Organisation National Institutes of Health (NIH) 
Sector Public
Country United States
Start 08/2016 
End 11/2018
 
Description ROADMAP II
Amount £1,551,985 (GBP)
Organisation Bill and Melinda Gates Foundation 
Sector Charity/Non Profit
Country United States
Start 04/2014 
End 05/2019
 
Description Robertson Foundation
Amount £10,000,000 (GBP)
Organisation Robertson Foundation 
Sector Academic/University
Country Unknown
Start 07/2012 
End 05/2018
 
Description Special Training Fellowship in Biomedical Informatics
Amount £270,497 (GBP)
Funding ID MR/N015355/1 
Organisation Medical Research Council (MRC) 
Sector Academic/University
Country United Kingdom
Start  
 
Description Strategic Award: A systematic approach to understanding the biology underpinning GWAS hits
Amount £2,500,000 (GBP)
Organisation Wellcome Trust 
Sector Charity/Non Profit
Country United Kingdom
Start 01/2015 
End 12/2018
 
Description The changing endemicity and disease burden of Plasmodium falciparum in Africa since 2000
Amount £1,052,664 (GBP)
Funding ID MR/K00669X/1 
Organisation Medical Research Council (MRC) 
Sector Academic/University
Country United Kingdom
Start 01/2013 
End 12/2017
 
Description UK infrastructure for large-scale genomics research
Amount £23,988,316 (GBP)
Organisation Medical Research Council (MRC) 
Sector Academic/University
Country United Kingdom
Start  
 
Description WIDENLIFE
Amount £113,334 (GBP)
Organisation European Commission 
Sector Public
Country European Union (EU)
Start 01/2016 
End 02/2017
 
Description WIDENLIFE Widening the Scientific Excellence for Maternal Female Reproductive and Fetal Health and Wellbeing
Amount € 1,064,608 (EUR)
Organisation European Commission 
Sector Public
Country European Union (EU)
Start  
 
Description Wellcome Strategic Award The Global Health Bioethics Network
Amount £1,994,122 (GBP)
Organisation Wellcome Trust 
Sector Charity/Non Profit
Country United Kingdom
Start  
 
Description Spatiotemporal patterns of insecticide resistance 
Organisation Liverpool School of Tropical Medicine
Country United Kingdom 
Sector Academic/University 
PI Contribution Joint academic partnership
Collaborator Contribution Joint academic partnership
Impact n/a
Start Year 2016
 
Description AHSN Alumni Summit. Genomics, Proteomics and Data Session 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact The session will explore some of the challenges facing academics, clinicians and healthcare providers developing new types of data, connectivity, interpretation and scale. The promise of deeply-enriched genotypic and phenotypic datasets signals expanded horizons for the identification of disease risk, diagnosis and treatment. The data pathway starts with measurements of genotype and phenotype which capture patient data of sufficient breadth and depth. Biobanks are integral to this approach, as are data extraction processes which need to be long-term and based on robust patient engagement. Datasets need to be sufficiently large to ensure that any underlying signal can be pulled out, particularly in a real-time setting.

Subsequent data integration is both a challenge and opportunity requiring a boundary shift from research to the clinic, ultimately making different datasets function within varied healthcare and commercial settings. Translation of the very best algorithms, and making them work in a high throughput, commercial environment, will require new pathways from research to clinical service provision, which will be dependent on data interpretation and integration at scale. These new pathways will require strong partnerships between diverse organisations that may have different goals and ambitions.
Year(s) Of Engagement Activity 2015
 
Description BBC Online interview 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Media (as a channel to the public)
Results and Impact In this BBC article, professors Martin Landray and Harry Hemingway highlighted the benefits to health from large data sets.
Year(s) Of Engagement Activity 2016
 
Description Big Data and Drug Discovery 2013 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact The Osler Discussion
Year(s) Of Engagement Activity 2013
 
Description Big Data in Biomedicine 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Bringing together thought leaders in large-scale data analysis and technology to transform the way we diagnose, treat and prevent disease. Learn more: http://stanford.io/1M8v9ra
Martin Landray
Year(s) Of Engagement Activity 2015
 
Description Big Data in Biomedicine 2014 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Interview for the Big Data in Biomedicine Conference, Stanford
Year(s) Of Engagement Activity 2014
 
Description Festival of Genomics 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact N/A
Year(s) Of Engagement Activity 2016
 
Description Isis Innovation & Oxford AHSN Technology Showcase 2015 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact eHealth & Big Data - Innovation with Impact, Martin Landray
Year(s) Of Engagement Activity 2015
 
Description Life Sciences Strategy Workshop 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Life Sciences Strategy Workshop, Office for Life Sciences (Dept for Health, BEIS)
Year(s) Of Engagement Activity 2017
 
Description Oxford Mail interview 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Media (as a channel to the public)
Results and Impact Interview for 'City is at the forefront of disease prevention', July 2014
Year(s) Of Engagement Activity 2014
 
Description Oxford Martin Lecture 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact A wealth of new and advancing technologies are changing the way we approach research in healthcare. The use of big data sets, precision medicine and machine learning means that research studies can be bigger, cheaper and wider reaching than ever before. In this lecture, Professor Martin Landray, Deputy Director of the Big Data Institute, and Professor of Medicine and Epidemiology at the University of Oxford, considered how recent advancements in healthcare technologies have radically changed how we go about medical research, and looked at how future innovations could further shape the field.
Year(s) Of Engagement Activity 2016
 
Description Presentation at National Academies of Sciences, Engineering and Medicine 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact 10/20/2015 - Session II: Developing Capabilities to Integrate and Use Data from Very Large Data Sets - Martin Landray
Year(s) Of Engagement Activity 2015
 
Description Quality by Design for Clinical Trials 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Clinical Trials Transformation Initiative, Bethesda
Year(s) Of Engagement Activity 2015
 
Description The Alumni Summit 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact AHSN, The Alumni Summit July 2015, Gil McVean and Martin Landray presented
Year(s) Of Engagement Activity 2015
 
Description The Economist article 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Media (as a channel to the public)
Results and Impact Interview for the article 'Testing, Testing', July 2014
Year(s) Of Engagement Activity 2014