MICA: InterdisciPlInary Collaboration for efficienT and effective Use of clinical images in big data health care RESearch: PICTURES

Lead Research Organisation: University of Dundee
Department Name: Population Health and Genomics

Abstract

Clinical imaging techniques, including X-rays, CT, MRI, ultrasound and nuclear medicine scans, are core diagnostic technologies. These images can support many important areas of research to improve any or all of diagnosis, monitoring of disease progression and response to treatment. Currently, most research using images is based on those collected specifically for a particular research project. These images are of "research" quality, meaning that they are captured at high resolution using standardised procedures to reduce the variability of images. Research data collection is expensive so studies tend to be small, and the people who take part in research studies are different from those seen in normal clinical care. It is therefore often uncertain whether research findings can be translated to "real world" images or patients.

Each year millions of clinical images are generated in Scotland through routine examinations at hospitals and stored in a huge database. The Scottish national imaging database currently has ~23 million different images collected since 2010. Access to these "real world" images would be extremely valuable for research, but there are a number of big challenges. Firstly, it is very important that all data is kept confidential. Secondly, imaging datasets are very large, which makes them technically challenging to handle. Thirdly, the software which manages these images is optimised for retrieval of images by NHS staff specifically for an individual patient's clinical care (e.g. return Mrs Jones' scan taken on the 28th of May 2016) rather than for research (e.g. return all the CT chest scans of smokers between age 55 and 65 where a contrast agent has been used).
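
To illustrate the difference between the two kinds of retrieval, the sketch below runs both over a small in-memory table of scan metadata. The record fields (`patient_id`, `modality`, `body_part`, `age`, `smoker`, `contrast`, `scan_date`) and the sample rows are hypothetical and are not taken from the Scottish national imaging database; a real system would query a database rather than a Python list, and the age range is assumed to be inclusive.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ScanRecord:
    # All field names are illustrative assumptions, not a real schema.
    patient_id: str
    modality: str
    body_part: str
    age: int
    smoker: bool
    contrast: bool
    scan_date: date


# Made-up sample rows for demonstration only.
SCANS = [
    ScanRecord("P001", "CT", "CHEST", 58, True, True, date(2016, 5, 28)),
    ScanRecord("P002", "CT", "CHEST", 61, True, False, date(2015, 3, 2)),
    ScanRecord("P003", "MR", "HEAD", 60, True, True, date(2017, 1, 15)),
    ScanRecord("P004", "CT", "CHEST", 70, True, True, date(2018, 9, 9)),
    ScanRecord("P005", "CT", "CHEST", 55, False, True, date(2019, 11, 30)),
    ScanRecord("P006", "CT", "CHEST", 65, True, True, date(2014, 7, 4)),
]


def clinical_lookup(scans, patient_id, scan_date):
    """Clinical-care style retrieval: one patient, one known date."""
    return [s for s in scans
            if s.patient_id == patient_id and s.scan_date == scan_date]


def research_cohort(scans, modality, body_part, min_age, max_age,
                    smoker, contrast):
    """Research style retrieval: every scan matching the cohort criteria."""
    return [s for s in scans
            if s.modality == modality
            and s.body_part == body_part
            and min_age <= s.age <= max_age
            and s.smoker == smoker
            and s.contrast == contrast]


one_scan = clinical_lookup(SCANS, "P001", date(2016, 5, 28))
cohort = research_cohort(SCANS, "CT", "CHEST", 55, 65,
                         smoker=True, contrast=True)
cohort_ids = [s.patient_id for s in cohort]
```

The clinical lookup needs only a patient identifier and a date, whereas the research query depends on attributes (smoking status, contrast use) that may not be stored alongside the image and have to be linked or derived first.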

What will be delivered? This 5-year programme will enable secure access to routinely collected imaging data for research. Building on the foundations already in place from previous research grants, PICTURES will extend, scale and enhance innovative open source software to query a research copy of the Scottish National imaging database securely hosted by the University of Edinburgh and provide anonymised extracts of hundreds of thousands of images for research. PICTURES will also develop this software to query imaging data linked to genomic data securely hosted by the University of Dundee.

There are 3 main areas of research required within the core programme: (1) Data science research for complex cohort building from real-world, messy data. (2) Engineering required for scaling and handling big data within a Safe Haven environment. (3) Cybersecurity research needed to ensure that the patient data is securely held and de-identified appropriately for research.

PICTURES will support 2 major exemplar research projects to guide and shape the underpinning resources. Exemplar one will develop a method to detect lung nodules and coronary artery calcification using hundreds of thousands of CT chest scans provided by the core programme. It will also predict the risk of getting lung cancer based upon the presence of lung nodules and the risk of cardiovascular disease based upon the presence of coronary artery calcification. This exemplar will work in partnership with an industrial partner, Aidence, to validate and test the method directly in NHS clinical workstations within the course of the programme.
Exemplar 2 will predict individual risk of dementia in people with diabetes using MRI brain scans, genetic data and medical records. It will also identify which variables are most important for the prediction. The predictive tool will be validated on the large image dataset provided by the core programme.

Both of our exemplars will determine new information from routinely collected data that would otherwise have been ignored. Predicting and therefore treating diseases at an early stage improves patient outcomes and reduces the cost to the NHS.

PICTURES is truly interdisciplinary requiring expertise in Radiomics, AI, Cybersecurity, Software Engineering, Data Science, Data Governance and Medicine.

Technical Summary

The core programme focuses on 3 key areas:
(1) Complex cohort building from noisy, heterogeneous data: we will develop algorithms for text mining and standardising imaging metadata. We will use machine learning classification techniques to group images, and natural language processing to pre-process and mine knowledge features from the free-text data and remove the identifiable data. We will utilise image processing algorithms to search for features within the core dataset to build new cohorts based upon pixel data.
(2) Scaling and handling big data: we will develop algorithms and optimisation processes for handling petabytes of imaging data and investigate how to optimise the use of GPUs within a virtual environment.
(3) Cybersecurity: we will ensure that our systems are secure but also meet the requirements of the research community.
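
As a minimal sketch of two steps named under (1), standardising imaging metadata and removing identifiable data from free text, the example below maps inconsistent body-part labels to a single vocabulary and redacts dates and 10-digit identifiers with regular expressions. The synonym table, the patterns and the sample report are assumptions for illustration only; the programme's own software and rules are not described in this summary, and pattern-based redaction alone would not be sufficient for real de-identification.

```python
import re

# Hypothetical synonym table: maps messy labels to one standard term.
BODY_PART_SYNONYMS = {
    "chest": "CHEST",
    "thorax": "CHEST",
    "ct chest": "CHEST",
    "head": "HEAD",
    "brain": "HEAD",
}

# Illustrative patterns only; real de-identification needs far more than this.
DATE_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")
ID_PATTERN = re.compile(r"\b\d{10}\b")


def standardise_body_part(label):
    """Return the standard term for a label, or 'UNKNOWN' if unmapped."""
    key = " ".join(label.lower().split())  # trim and collapse whitespace
    return BODY_PART_SYNONYMS.get(key, "UNKNOWN")


def redact(text):
    """Replace dates and 10-digit identifiers with placeholder tokens."""
    text = DATE_PATTERN.sub("[DATE]", text)
    return ID_PATTERN.sub("[ID]", text)


# Made-up report text and labels for demonstration.
report = "Patient 0101011234 scanned 28/05/2016. No nodules seen."
clean_report = redact(report)
labels = [standardise_body_part(x) for x in ["Thorax", " CT  chest ", "knee"]]
```

Unmapped labels are returned as `UNKNOWN` rather than guessed, so that they can be reviewed and added to the vocabulary instead of being silently misclassified.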

Exemplar 1: Using CT scans provisioned from the national resource, we will use Deep Learning to train an algorithm to detect lung nodules and coronary artery calcification. The algorithm will be implemented as a Medical Device and made available within NHS Clinical PACS reporting workstations for Clinical performance evaluation and validation. To determine the risk of lung cancer and cardiovascular events we will use the nationally available longitudinal health outcomes data linked to the results from the CT scans.

Exemplar 2: To develop a risk score for dementia and an understanding of which data is most important for prediction, we will use voxel based feature selection and support vector machines within a cross-validation framework for MRI brain images. Non-imaging analyses will be primarily cross-sectional and longitudinal including time variable exposure clinical covariates as well as genomic covariates. Risk predictions using image, genetics and clinical data for individual patients will be combined using decision tree methods to create a best overall predictor. Algorithm validation will use the national resource.
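
The point of running feature selection within a cross-validation framework is that features are chosen from the training folds only, so the held-out fold never influences which features are kept. The sketch below shows that structure on synthetic data using only the Python standard library. It is not the programme's method: a nearest-centroid classifier stands in for the support vector machine, a simple difference-of-means score stands in for voxel-based feature selection, and all names and data are hypothetical.

```python
import random
from statistics import mean


def make_data(n_per_class=20, n_features=10, seed=0):
    """Synthetic 'voxel' data: only feature 0 separates the two classes."""
    rng = random.Random(seed)
    rows, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            row[0] += 4.0 * label  # the informative feature
            rows.append(row)
            labels.append(label)
    return rows, labels


def select_features(rows, labels, k):
    """Rank features by absolute difference of class means; keep the top k."""
    scores = []
    for j in range(len(rows[0])):
        m0 = mean(r[j] for r, y in zip(rows, labels) if y == 0)
        m1 = mean(r[j] for r, y in zip(rows, labels) if y == 1)
        scores.append((abs(m1 - m0), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def fit_centroids(rows, labels, features):
    """Nearest-centroid model (a stand-in for an SVM) on chosen features."""
    return {
        y: [mean(r[j] for r, lab in zip(rows, labels) if lab == y)
            for j in features]
        for y in set(labels)
    }


def predict(centroids, row, features):
    """Return the label whose centroid is closest to the row."""
    x = [row[j] for j in features]
    return min(centroids,
               key=lambda y: sum((a - b) ** 2
                                 for a, b in zip(x, centroids[y])))


def cross_validate(rows, labels, n_folds=5, k=2, seed=0):
    """k-fold CV with feature selection repeated inside every fold."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    folds = [order[i::n_folds] for i in range(n_folds)]
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in order if i not in held_out]
        train_rows = [rows[i] for i in train_idx]
        train_labels = [labels[i] for i in train_idx]
        # Selection sees training data only, so no information leaks
        # from the held-out fold into the choice of features.
        features = select_features(train_rows, train_labels, k)
        model = fit_centroids(train_rows, train_labels, features)
        correct = sum(predict(model, rows[i], features) == labels[i]
                      for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return accuracies


rows, labels = make_data()
fold_accuracies = cross_validate(rows, labels)
mean_accuracy = mean(fold_accuracies)
```

Selecting features once on the full dataset and then cross-validating would tend to give optimistic accuracy estimates, which is why the selection step sits inside the loop.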

Planned Impact

Industry: There are a range of companies who will benefit from access to the imaging data e.g. Imaging Equipment Companies (Canon, GE, Siemens, and Philips), Imaging Contrast Media Companies (GE, Guerbet, Bracco) and Medical and Surgical Device Companies (Medtronic, Baxter).

Costs for the academic community: Scalable access to large quantities of routinely collected de-identified images via automated, reproducible processes will reduce the effort of obtaining the data required to answer research questions at scale. These resources will also reduce the effort of obtaining governance approval.

Widening and optimising access to data: Access to such large numbers of real world images has previously been very challenging. These resources should accelerate research in the field.

Patients: Increasing the availability of large scale routinely collected images linked to other forms of health data for both industry and academic use will lead to a greater likelihood of achieving results translatable into diagnoses and treatments.

Policy makers: There are many advantages of using a Safe Haven model for access to sensitive data. The models developed on the safe handling of big data for research may become an example of good practice worldwide, further raising the profile of UK healthcare research.

HDRUK: The UK wishes to be an internationally recognised centre for population based data research. These resources will add to the complement of excellent data available internationally.

Capacity Building and training: This programme will train the team in data science. There is a recognised shortage of expertise in this field (see Life Sciences Industrial Strategy and the ABPI Bridging the Skills Gap report).

Supporting Learning Healthcare Systems: This model has become a widely recognised approach applying a key principle of a feedback loop from the outcomes of research to directly improve clinical care. The work within PICTURES strongly supports this model.

Both exemplars will generate valuable knowledge in the use of routine clinical imaging as a source of potential clinically useful biomarkers. This will have a major impact on the cost effectiveness of clinical imaging in the NHS as potentially more clinically relevant information will be extracted beyond the clinical indication for the image.

Exemplar 1: We anticipate that the existing lung nodule detection tool will interact seamlessly with the coronary artery calcification tool that will be developed. The combined tools should be relatively straightforward to integrate into PACS reporting workstations, allowing for immediate reporting of these findings during chest CT studies. Examples already exist of this and similar software tools, integrated within a variety of PACS vendor workstations. Once this integration is rolled out throughout the NHS, it should significantly improve the efficiency of radiology reporting, reduce reporting errors and allow better individualised risk profiling and management options for patient care.

Exemplar 2: There are 3 significant impacts: (1) Enabling the identification of diabetics at risk of developing dementia, affording clinicians the opportunity to optimise preventative measures. (2) Potentially facilitating the identification and recruitment of high-risk individuals, prior to dementia onset, into clinical trials of prevention. This is an urgent need in dementia, where trials of treating dementia once it has occurred have not been successful. This process will be augmented by integrating with the SHARE programme in Scotland. (3) The use of feature selection techniques will allow the identification of biological targets as disease biomarkers and/or therapeutic surrogate targets. This is in contrast to deep learning approaches, whereby the discrimination is agnostic to biological structure or function.

Publications

 
Description Digital Health Research and Policy in the UK and Switzerland
Geographic Reach Europe 
Policy Influence Type Participation in an advisory committee
Impact Lessons from the Past, Plans for the Future - Invite from the British-Swiss ambassador and the UK Science and Innovation Network to discuss future collaborations between our countries.
 
Description Invited External Advisory Board Member EurOPDX (H2020 Project)
Geographic Reach Europe 
Policy Influence Type Participation in an advisory committee
 
Description Invited Member of MRC Population Health Sciences Group (PHSG)
Geographic Reach National 
Policy Influence Type Participation in an advisory committee
Impact Oversee population health sciences investment across MRC Boards and panels. Advise MRC Strategy Board, boards and panels on development and implementation of strategies and policies. Advise on strategic funding initiatives and partnership activities. Carry out gap analyses and horizon scanning.
 
Description Leading HDR UK short life working group for imaging data interoperability and integration
Geographic Reach National 
Policy Influence Type Participation in an advisory committee
Impact Chairing a series of workshops with leading centres across the UK on behalf of HDR UK and working with Innovate UK. Developing a strategy for creating a UK-wide Imaging AI Ecosystem.
 
Description NIHR Imaging Group Setup
Geographic Reach National 
Policy Influence Type Participation in an advisory committee
 
Description Translational Research Group Workshop: Enabling pathways for AI.
Geographic Reach National 
Policy Influence Type Participation in an advisory committee
 
Description Cambridge Mathematics of Information in Healthcare (CMIH)
Amount £1,275,504 (GBP)
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 06/2020 
End 06/2023
 
Description Centre for Antimicrobial Resistance
Amount £2,253,124 (GBP)
Organisation National Institute for Health Research 
Sector Public
Country United Kingdom
Start 10/2018 
End 10/2020
 
Description Creating a national platform for powerful molecular studies of multiple conditions: the HDRUK multiomics consortium
Amount £1,088,605 (GBP)
Organisation Health Data Research UK 
Sector Private
Country United Kingdom
Start 08/2020 
End 08/2022
 
Description Defining & Redefining Disease Using Multimodal Data on a National Scale: the HDR UK Phenomics Resource
Amount £1,087,168 (GBP)
Organisation Health Data Research UK 
Sector Private
Country United Kingdom
Start 04/2020 
End 04/2023
 
Title Open source software to manage real-world clinical radiology data linked to other health data 
Description Scotland has a central archive of radiological data used to directly provide clinical care to patients. We have developed an architecture and platform to securely extract a copy of that data, link it to other clinical or social data sets, remove personal data to protect privacy, and make the resulting data available to researchers in a controlled Safe Haven environment. We have released the software open source for other groups to use. 
Type Of Material Improvements to research infrastructure 
Year Produced 2019 
Provided To Others? Yes  
Impact New service provided by National Services Scotland to support academics and industry in accessing clinical imaging data at scale. 
URL https://github.com/SMI
 
Title Scottish Medical Imaging 
Description Radiological Images from Scottish Population 
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
Impact This will be available for many research groups to use. 
URL https://github.com/SMI
 
Description HDR UK Multiomics - pan UK 
Organisation Alan Turing Institute
Country United Kingdom 
Sector Academic/University 
PI Contribution Bringing expertise in Safe Havens and clinical data management to the collaboration.
Collaborator Contribution Bringing expertise in management of other omic data.
Impact A newly awarded grant to HDR UK for a pan-UK project: Creating a national platform for molecular studies of multiple conditions: HDRUK multiomics consortium
Start Year 2019
 
Description HDR UK Multiomics - pan UK 
Organisation EMBL European Bioinformatics Institute (EMBL - EBI)
Country United Kingdom 
Sector Academic/University 
PI Contribution Bringing expertise in Safe Havens and clinical data management to the collaboration.
Collaborator Contribution Bringing expertise in management of other omic data.
Impact A newly awarded grant to HDR UK for a pan-UK project: Creating a national platform for molecular studies of multiple conditions: HDRUK multiomics consortium
Start Year 2019
 
Description HDR UK Multiomics - pan UK 
Organisation Imperial College London
Country United Kingdom 
Sector Academic/University 
PI Contribution Bringing expertise in Safe Havens and clinical data management to the collaboration.
Collaborator Contribution Bringing expertise in management of other omic data.
Impact A newly awarded grant to HDR UK for a pan-UK project: Creating a national platform for molecular studies of multiple conditions: HDRUK multiomics consortium
Start Year 2019
 
Description HDR UK Multiomics - pan UK 
Organisation Swansea University
Country United Kingdom 
Sector Academic/University 
PI Contribution Bringing expertise in Safe Havens and clinical data management to the collaboration.
Collaborator Contribution Bringing expertise in management of other omic data.
Impact A newly awarded grant to HDR UK for a pan-UK project: Creating a national platform for molecular studies of multiple conditions: HDRUK multiomics consortium
Start Year 2019
 
Description HDR UK Multiomics - pan UK 
Organisation University College London
Country United Kingdom 
Sector Academic/University 
PI Contribution Bringing expertise in Safe Havens and clinical data management to the collaboration.
Collaborator Contribution Bringing expertise in management of other omic data.
Impact A newly awarded grant to HDR UK for a pan-UK project: Creating a national platform for molecular studies of multiple conditions: HDRUK multiomics consortium
Start Year 2019
 
Description HDR UK Multiomics - pan UK 
Organisation University of Cambridge
Country United Kingdom 
Sector Academic/University 
PI Contribution Bringing expertise in Safe Havens and clinical data management to the collaboration.
Collaborator Contribution Bringing expertise in management of other omic data.
Impact A newly awarded grant to HDR UK for a pan-UK project: Creating a national platform for molecular studies of multiple conditions: HDRUK multiomics consortium
Start Year 2019
 
Description HDR UK Multiomics - pan UK 
Organisation University of Edinburgh
Country United Kingdom 
Sector Academic/University 
PI Contribution Bringing expertise in Safe Havens and clinical data management to the collaboration.
Collaborator Contribution Bringing expertise in management of other omic data.
Impact A newly awarded grant to HDR UK for a pan-UK project: Creating a national platform for molecular studies of multiple conditions: HDRUK multiomics consortium
Start Year 2019
 
Description HDR UK Phenotype Portal - UK wide project 
Organisation EMBL European Bioinformatics Institute (EMBL - EBI)
Country United Kingdom 
Sector Academic/University 
PI Contribution New collaboration between HDR UK groups across the UK to develop a Phenotype portal. Workstream lead for the web portal.
Collaborator Contribution The collaborators bring their expertise in phenotypes and data science to a pan UK project
Impact A collaborative grant to HDR UK which was funded
Start Year 2019
 
Description HDR UK Phenotype Portal - UK wide project 
Organisation King's College London
Country United Kingdom 
Sector Academic/University 
PI Contribution New collaboration between HDR UK groups across the UK to develop a Phenotype portal. Workstream lead for the web portal.
Collaborator Contribution The collaborators bring their expertise in phenotypes and data science to a pan UK project
Impact A collaborative grant to HDR UK which was funded
Start Year 2019
 
Description HDR UK Phenotype Portal - UK wide project 
Organisation Swansea University
Country United Kingdom 
Sector Academic/University 
PI Contribution New collaboration between HDR UK groups across the UK to develop a Phenotype portal. Workstream lead for the web portal.
Collaborator Contribution The collaborators bring their expertise in phenotypes and data science to a pan UK project
Impact A collaborative grant to HDR UK which was funded
Start Year 2019
 
Description HDR UK Phenotype Portal - UK wide project 
Organisation University College London
Country United Kingdom 
Sector Academic/University 
PI Contribution New collaboration between HDR UK groups across the UK to develop a Phenotype portal. Workstream lead for the web portal.
Collaborator Contribution The collaborators bring their expertise in phenotypes and data science to a pan UK project
Impact A collaborative grant to HDR UK which was funded
Start Year 2019
 
Description HDR UK Phenotype Portal - UK wide project 
Organisation University of Birmingham
Country United Kingdom 
Sector Academic/University 
PI Contribution New collaboration between HDR UK groups across the UK to develop a Phenotype portal. Workstream lead for the web portal.
Collaborator Contribution The collaborators bring their expertise in phenotypes and data science to a pan UK project
Impact A collaborative grant to HDR UK which was funded
Start Year 2019
 
Description HDR UK Phenotype Portal - UK wide project 
Organisation University of Cambridge
Country United Kingdom 
Sector Academic/University 
PI Contribution New collaboration between HDR UK groups across the UK to develop a Phenotype portal. Workstream lead for the web portal.
Collaborator Contribution The collaborators bring their expertise in phenotypes and data science to a pan UK project
Impact A collaborative grant to HDR UK which was funded
Start Year 2019
 
Description HDR UK Phenotype Portal - UK wide project 
Organisation University of Edinburgh
Country United Kingdom 
Sector Academic/University 
PI Contribution New collaboration between HDR UK groups across the UK to develop a Phenotype portal. Workstream lead for the web portal.
Collaborator Contribution The collaborators bring their expertise in phenotypes and data science to a pan UK project
Impact A collaborative grant to HDR UK which was funded
Start Year 2019
 
Description HDR UK Phenotype Portal - UK wide project 
Organisation University of Oxford
Country United Kingdom 
Sector Academic/University 
PI Contribution New collaboration between HDR UK groups across the UK to develop a Phenotype portal. Workstream lead for the web portal.
Collaborator Contribution The collaborators bring their expertise in phenotypes and data science to a pan UK project
Impact A collaborative grant to HDR UK which was funded
Start Year 2019
 
Description Imaging Collaboration with Cambridge 
Organisation University of Cambridge
Country United Kingdom 
Sector Academic/University 
PI Contribution Bringing expertise in software development
Collaborator Contribution Expertise in AI using clinical data
Impact EPSRC awarded grant for us to collaborate further
Start Year 2019
 
Title Imaging RDMP 
Description An architecture and platform to securely extract a copy of the radiological data held in Scotland's central archive, link it to other clinical or social data sets, remove personal data to protect privacy, and make the resulting data available to researchers in a controlled Safe Haven environment. 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact New NHS service which can securely provision clinical imaging data for research and innovation 
 
Title Research Data Management Platform (RDMP) 
Description Software platform for managing longitudinal cohorts of research data and clinical records. Secure extraction of cohorts, audit and support for reproducibility. 
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact Other Safe Havens adopting the system 
URL https://www.youtube.com/watch?v=Fgi9-Sdup-Y
 
Description Grand Rounds Seminar Series 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact Scaling Health Informatics: large scale recruitment, BIG data and analytics.
Year(s) Of Engagement Activity 2019
 
Description HDR UK Event - Scotland's Data 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Scotland's Data
Year(s) Of Engagement Activity 2019
 
Description HIC Twitter Feed 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Media (as a channel to the public)
Results and Impact Promotion of the research and services of the Health Informatics Centre. Promoting the secure anonymised access to clinical data for research.
Year(s) Of Engagement Activity 2019,2020
URL https://twitter.com/dataonamission
 
Description HIC Website 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Media (as a channel to the public)
Results and Impact Health Informatics Centre Website. Showing projects supported and safe use of clinical data for research.
Year(s) Of Engagement Activity 2015,2016,2017,2018,2019,2020
 
Description Invited Workshop lead: HDR UK - What does a world leading research infrastructure look like? 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Workshop on "What does a world leading research infrastructure look like? "
Year(s) Of Engagement Activity 2019
 
Description Invited seminar at the University of Edinburgh 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact Talk on "Health Data Science at Scale: Moving from Descriptive to Predictive Analytics"
Year(s) Of Engagement Activity 2019
 
Description Invited speaker - Developing an AI Imaging Ecosystem. Refreshing Radiology in the North 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact Talk on Developing an AI Imaging Ecosystem. Refreshing Radiology in the North (Scotland)
Year(s) Of Engagement Activity 2020
 
Description Invited to give seminar at Cambridge University 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Talk on "Health Data Science at Scale. Moving from Descriptive to Predictive Analytics"
Year(s) Of Engagement Activity 2019
 
Description Invited to present to researchers at Queens University Belfast 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Other audiences
Results and Impact Promote the outputs of the PICTURES programme and HDR UK infrastructure to support research using routinely collected data. Talk was "Data on a Mission".
Year(s) Of Engagement Activity 2020
 
Description PICTURES Twitter Feed 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Media (as a channel to the public)
Results and Impact Set-up to promote the work of the PICTURES Programme
Year(s) Of Engagement Activity 2019,2020
URL https://twitter.com/imageonamission
 
Description Second TV interview covering PICTURES Programme launch 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Media (as a channel to the public)
Results and Impact Interviewed by a regional TV channel covering the PICTURES project. This was then broadcast regionally.
Year(s) Of Engagement Activity 2019
 
Description TV interview covering the launch of the PICTURES Programme 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Media (as a channel to the public)
Results and Impact Interview by a Scottish TV company covering the PICTURES launch.
Year(s) Of Engagement Activity 2019
 
Description Talk at Scotland HDR UK Conference 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Talk on "How can we make the UK leading in AI R&D using real world clinical data?" Edinburgh. HDR UK Scotland Research Day
Year(s) Of Engagement Activity 2019
 
Description Talk to Edinburgh University 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact Talk on Enabling research access to heterogeneous, routinely collected, linked clinical images at scale
Year(s) Of Engagement Activity 2019