MICA: Medical Bioinformatics: Data-Driven Discovery for Personalised Medicine

Lead Research Organisation: University College London
Department Name: Chemistry

Abstract

We will improve patient health and medical research by maximising the use of vast amounts of human data being generated in the NHS. But there are two obstacles: (i) inter-related clinical and research datasets are dispersed across numerous computer systems making them hard to integrate; (ii) there is a serious shortage of computational expertise as applied to clinical research.
As part of the UK's healthcare strategy to overcome these limitations, we have assembled a world-class consortium of institutions and scientists, including UCL Partners (containing NHS Trusts treating >6 million patients), Francis Crick Institute, Sanger Institute and European Bioinformatics Institute. Close links with the NHS (through Farr and Genomics England) will allow information exchange for health and disease progression. We have also engaged leading companies like GSK and Intel.
We will use the MRC funds for two purposes:
1. Create a powerful eMedLab data centre. We will build a computer cluster that allows us to store, integrate and analyse genetic, patient and electronic health records. By co-locating in a single centre, we eliminate delays and security risks that occur when information is transmitted. Research Technologists supplied by the partners will install and maintain the infrastructure and software environment.
2. Expand scientific and technical expertise in UK Medical Bioinformatics through a Research & Training Academy. Basic and clinical scientists, and bioinformaticians will be trained to perform world-leading computational biomedical science. We will train in the whole range of skills involved in medical bioinformatics research with taught courses, seminars, workshops and informal discussion. To coordinate research activities across partners, we will establish Academy Labs, which are flexible, semi-overlapping groupings of academic and industrial researchers to share insights and plan activities in areas of common analytical challenges. The Academy will provide a mechanism for information and skills exchange across the traditional boundaries of disease types.
These two initiatives will enable existing projects in 3 disease domains in which we have unique strengths: rare diseases, cardiovascular diseases and cancer.
Rare: We house 31 of the 70 Nationally Commissioned Highly Specialised Services; ~0.5M of our 6M patients have a rare disease, including >50% of those treated at Great Ormond Street Hospital. More than 200 research teams generate large quantities of genetic, imaging (eg, 3D facial reconstructions) and clinical information (eg, patient records).
Cardiovascular: We also lead genomic, imaging and health informatics programmes in cardiovascular disease, with contributions to projects like UK10K, and we host multiple national cardiovascular registries through the National Institute for Cardiovascular Outcomes Research. These are linked to primary and hospital clinical care records through Farr@UCLP, with current cohort sizes of ~2M people.
Cancer: We also have particular clinical expertise in some of the most difficult-to-treat cancer types, and we host major international data resources. These include individuals recruited to the TRACERx study of lung cancer, 8,500 women with abnormal cervical smears in whom methylation patterns of the HPV16 genome predict progression to high-grade precursor disease, and one of the largest sarcoma biobanks in the world.

Ultimately, this bid will allow us to use new computational approaches to (i) link patient records and research data in order to understand the pathogenesis of disease, (ii) use genomic, imaging and clinical data to identify diagnostic, prognostic and predictive biomarkers to guide therapy, predict outcome and increase recruitment to clinical trials based on stratified populations and (iii) translate new IP by engagement with the pharmaceutical industry.

Technical Summary

We request funding for: 1. a co-located, large-scale data storage and compute facility (eMedLab) and 2. a Medical Bioinformatics Research and Training Academy. These strategically vital infrastructures will enable research into new approaches to understand cancer, rare and cardiovascular diseases; establish stratified clinical trials; and discover diagnostic and prognostic biomarkers for clinical practice.
1. We have a team of 10 Research Technologists to develop and maintain eMedLab (>9,000 cores, 4PB, JANET6 network). We shall consolidate dispersed datasets in a single facility, eliminating delays and security risks involved in data transfer. Direct links to NHS Trusts will allow integration of clinical data for >6M patients. We will interface with industry-derived data and the Global Alliance for secure data sharing. We will deploy a knowledge management platform such as tranSMART, and sensitive information will be placed in secure areas complying with NHS requirements. Rapid developments make long-term planning difficult, but we have designed eMedLab to be flexible for future growth.
2. The Academy will be a forum for basic and clinical scientists, and bioinformaticians to train in medical bioinformatics techniques, and to interact across the traditional boundaries of disciplines and disease areas. (i) We will support leaders in the field by funding 4 Career Development Fellowships to establish new laboratories. (ii) We will coordinate training to co-supervise and mentor scientists, provide courses, seminars, and workshops. We will leverage existing Doctoral Training Programmes across partners, and we will align with the Europe-wide strategies of the ELIXIR-UK training node. (iii) We will coordinate research activities through Academy Labs - analogous to the Crick "Interest Groups" - that comprise flexible, semi-overlapping groupings of academics and industrial partners to share and plan activities in areas of common analytical challenges.

Planned Impact

The benefits of the proposed initiative are extensive, ranging from the establishment of the UK as a world leader in biomedical research to increased innovation and improvement of competitiveness.

1. Impact on clinical outcomes
This project will enable NHS data to be used for innovative biomedical research. We are building on current advances in linking standard primary healthcare data and merging it with secondary care data on individual patients utilising UCLP's sizeable population of 6.3 million. Rapid advances in high throughput low-cost sequencing will enable omics data to be integrated, providing insights into the genetic basis of disease. Imaging data, at the molecular and organ levels, will also be integrated, refining medical interventions to improve outcomes, helping to strengthen the new field of systems medicine. This richness of data will support stratification of treatments, eventually enabling tailor-made treatments for individuals, reducing the burden on the UK health system. In the longer term, we expect our bid to revolutionise 21st century biomedicine in the UK by fundamentally altering the basis for disease diagnosis and treatment. It will transform healthcare into a personalised, predictive, participatory and preventative process.

2. Impact on healthcare policy
Our bid will influence future policy on research use of medical data, and in turn the use of genomic and imaging data in clinical settings. Our work will transform guidelines for treatment trials, and provide strategies for accurate assessment of different interventions and prevention methods; for instance, analysing mutation and methylation patterns in cancer and viral (in the case of cervical cancer) genomes will improve predictions of the likelihood of cancers or infections progressing to high-grade disease.

3. Impact on industry
Our bid builds on strong existing partnerships with the healthcare, pharmaceutical and computing industries to enhance capacity in commercial medical bioinformatics. Through joint projects and workshops, we will establish cooperation between academia and industry, reducing barriers in what are traditionally seen to be disparate domains. We will enable existing SMEs and start-ups to take advantage of cutting-edge data analytics to advance commercial translation of discoveries. We will also collaborate with international partners, using the Global Alliance as one of the mechanisms, to integrate our infrastructure, to define interoperability and data-sharing standards, and to promote a vision of a global e-health informatics platform with enormous healthcare benefits.

4. Impact on capacity
Our bid will also educate the next generation of scientists in medically based computational science and data analytics. There is a shortage of the expertise required to manipulate large datasets, and researchers have traditionally resorted to learning about computational techniques "on the fly" in response to specific research needs. The Academy is a vital component of our bid, as education and training are fundamental to making rapid advances in the age of big data.

5. Impact on the public
Our bid will also educate the public about the benefits of personalised medicine, through public dissemination and outreach activities, and partnership with aligned initiatives (eg, Personal Genome Project). We will take every care to protect the privacy of healthcare records, thereby seeking to reduce an obstacle to the widespread uptake and support of personalised medicine within the UK and more widely.

Publications

 
Description New Investigator Award
Amount £326,406 (GBP)
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 04/2018 
End 03/2021
 
Description The Generation Trust
Amount £160,000 (GBP)
Organisation The Generation Trust 
Sector Charity/Non Profit
Country United Kingdom
Start 10/2016 
End 09/2020
 
Description WHRI Academy (COFUND Marie Curie) Fellowship
Amount £87,712 (GBP)
Organisation Marie Sklodowska-Curie Actions 
Sector Academic/University
Country Global
Start 12/2015 
End 11/2018
 
Description Article Scientific Computing World, February 2016 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Tom Wilkie interviews Dr Jacky Pallas, UCL, about the eMedLab private cloud and how it changes the way we do scientific computing
Year(s) Of Engagement Activity 2016
URL http://www.scientific-computing.com/news/news_story.php?news_id=2781
 
Description Article, Computer Weekly Jan 2016 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Antony Adshead interviews Dr Bruno Silva, Crick, Dr. Jacky Pallas, UCL and Prof. Nick Luscombe, UCL/Crick about the eMedLab private cloud cluster and use of open source software to run it
Year(s) Of Engagement Activity 2016
URL http://www.computerweekly.com/news/4500270830/HPC-research-cluster-get-Red-Hat-OpenStack-private-clo...
 
Description Article, Computing Jan 16 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Danny Palmer interviews Dr Bruno Silva, Crick, Dr. Jacky Pallas, UCL and Prof. Nick Luscombe, UCL/Crick about eMedLab cloud computing for Computing magazine
Year(s) Of Engagement Activity 2016
URL http://www.computing.co.uk/ctg/feature/2442219/how-cloud-based-supercomputers-are-helping-scientists...
 
Description Article, Lab News, April 2016 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Article in Lab News about the eMedLab private cloud computing cluster, by Dr Jacky Pallas
Year(s) Of Engagement Activity 2016
URL http://www.labnews.co.uk/features/private-cloud-19-04-2016/
 
Description Capture Hi-C workshop with the Biochemical Society 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact 18 participants learnt about the experimental and bioinformatics procedures involved in a capture Hi-C experiment, with the aim of helping them understand how this chromatin organisation assay works.
Year(s) Of Engagement Activity 2017
URL https://www.biochemistry.org/Events/tabid/379/MeetingNo/TD017/view/Conference/Default.aspx
 
Description EBI course 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Teaching the ChIP-seq module at the Advanced High Throughput Sequencing Course to 30-40 participants.
Year(s) Of Engagement Activity 2011,2012,2013,2014,2015
 
Description EMBO course 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Teaching ChIP-seq module at the EMBO practical course on Analysis of High Throughput Sequencing Data.
Year(s) Of Engagement Activity 2011,2012,2013,2014,2016
 
Description Filming for Travel Channel Documentary 
Form Of Engagement Activity A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Filming for a documentary on the Travel Channel in which I spoke about mitochondrial genetics and using variation in the mitochondrial genome for identifying people.
Year(s) Of Engagement Activity 2017
 
Description Lab News article - Make it rain 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A follow-on article by Jacky Pallas with a trade magazine which centred around use of private cloud computing for biomedical research, using eMedLab as the exemplar.
Year(s) Of Engagement Activity 2017
URL https://www.labnews.co.uk/features/make-it-rain-12-10-2017/
 
Description MRC eMedLab Early Career Researcher Workshop in Computational Biology 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Other audiences
Results and Impact Workshop event for early career researchers working in computational biology, to establish relationships for future collaborative outputs and translation opportunities.
Year(s) Of Engagement Activity 2017
 
Description Scientific Computing Weekly, March 2016 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Trade journal article from Dr Bruno Silva, Crick Inst., as follow up to earlier interview about using cloud computing to reduce the time to science
Year(s) Of Engagement Activity 2016
URL http://www.scientific-computing.com/news/news_story.php?news_id=2798
 
Description Scientific Computing World Feb 2017 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact SCW requested interview with Dr Jacky Pallas about eMedLab as an exemplar of a private cloud for research
Year(s) Of Engagement Activity 2017
URL http://content.yudu.com/web/tzly/0A42fue/SCWFEBMAR17/html/index.html?page=20
 
Description Talk at American Society of Human Genetics Meeting, Vancouver, Canada 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Platform talk at international genetics conference. Primary audience was genetic researchers worldwide, but also attended by media, postgraduate students and industry.
Year(s) Of Engagement Activity 2016
 
Description Talk at the 2017 IMB Conference "Gene Regulation by the Numbers: Quantitative Approaches to Study Transcription" 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Gave a scientific talk at this conference.
Year(s) Of Engagement Activity 2017
URL https://www.imb.de/seminars-meetings/meetings/2017-imb-conference-gene-regulation-by-the-numbers/
 
Description Technical presentation, FOSDEM 2017 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Adam Huffman, Crick Institute, gave a talk about the eMedLab cloud cluster at FOSDEM (Free Open Source Software Developers European Meeting) within the HPC, Big Data and Data Science developers room.
Year(s) Of Engagement Activity 2017
URL https://fosdem.org/2017/schedule/event/cloud_hpc_containers/
 
Description Technical presentation, OpenStack summit, Barcelona 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Adam Huffman, Crick Institute, gave a talk about the use of OpenStack on the eMedLab private cloud to a scientific working group at the international OpenStack summit 2016, Barcelona. Adam won an award for Best Talk in the working group at the summit.
Year(s) Of Engagement Activity 2016
 
Description Translational Bioinformatics conference 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Participated in a conference about translational bioinformatics.
Year(s) Of Engagement Activity 2017
URL https://coursesandconferences.wellcomegenomecampus.org/events/item.aspx?e=640
 
Description UCLPartners AHSC Cardiovascular eMedLab and UK Biobank Workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact A one day workshop for researchers which focused on Cardiovascular research using eMedLab and UK Biobank imaging and genomics data
Year(s) Of Engagement Activity 2017
URL https://uclpartners.com/events/cardiovascular-emedlab-workshop/
 
Description eMedLab Launch Symposium 14 Jan 2016 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact We held a one-day symposium at the Wellcome Collection to launch eMedLab. The audience (c. 150 people) consisted of researchers, industry and funder representatives from across the UK. The keynote was given by Prof Sir John Savill, CEO of MRC.
Year(s) Of Engagement Activity 2016
 
Description eMedLab Meeting 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Other audiences
Results and Impact Presentation was to highlight the work being undertaken within the eMedLab group. Audience included regional research groups, postgraduates, individuals from industry and representatives from funding bodies.
Year(s) Of Engagement Activity 2016
 
Description eMedLab Stratified Medicine Symposium, 20 Oct 2016 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact We hosted an eMedLab Stratified Medicine Informatics workshop at the Crick Institute on 20 Oct 2016. It was attended by researchers, students and industry. Talks were given by industry people (Janssen) and the keynote was Simon Anders, Institute for Molecular Medicine Finland. It was attended by the MRC programme manager for Stratified Medicine and resulted in a follow-up meeting organised by MRC for their projects funded under that stream to learn about how eMedLab is supporting the computational and analytical requirements of a number of auto-immune studies.
Year(s) Of Engagement Activity 2016
 
Description eMedLab technical Symposium 23 May 2016 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact We held an eMedLab technical Symposium at the Wellcome Trust, 23 May 2016. The audience of ~120 people comprised researchers, technical specialists and industry people. The keynote was given by Dr Chris Dwan, Broad Institute.
Year(s) Of Engagement Activity 2016