Computer-aided CT imaging and integration with molecular endotyping to stratify fibrotic lung disease

Lead Research Organisation: University of Edinburgh
Department Name: Sch of Molecular. Genetics & Pop Health

Abstract

Background:
Interstitial lung diseases (ILDs) account for approximately 7000 deaths in the UK every year, with idiopathic pulmonary fibrosis (IPF) identified as the most common diagnosis. Classification of ILDs is based upon causation, high resolution (HR) CT scan appearance and lung histology, which requires surgical biopsy.
This is problematic for several reasons: 1. There is strong reliance on lung biopsy, an invasive procedure with a 2-7% mortality that many patients will not undergo, leading to a diagnosis of "unclassifiable disease"; 2. Current methods do not reliably inform either prognosis or treatment efficacy; 3. Clinical reporting of CT scans is subjective and not quantitative.
Lung fibrosis is notable for lacking definitive tools to achieve diagnostic precision, resulting in highly heterogeneous disease entities.

Recent work:
CT texture analysis platforms, such as the Adaptive Multiple Features Method (AMFM) and the Computer-Aided Lung Informatics for Pathology Evaluation and Rating (CALIPER), have previously been studied with applications in clinical settings. However, these have not been validated in longitudinal cohorts in which ground truth (survival, time to hospitalisation, rate of decline in lung function, response to treatment) is known.
Over the past few years, advances in diagnostic and prognostic biomarker and genetic profiling in lung fibrosis have been made. Some of these have been validated in several cohorts of patients. However, the vast majority of these studies are confined to IPF. Additionally, the largest studies are based on patients recruited to clinical trials and not 'real-world' subjects.

Resources:
Data is available from established gene-, bio- and image-banks, and a unique, ethically approved, prospectively populated database designed to depict the natural history of lung fibrosis. The cohort consists of >1100 consecutively presenting consented patients with lung fibrosis since 2002, with less than 1% lost to follow-up. All patients have CT scans, and more than 800 patients have serial scans. CT scans are hosted within National Services Scotland (NSS), which is co-located with the Farr network at the Edinburgh Farr node, enabling a safe haven analytic environment for imaging, clinical and 'omic' data.
Furthermore, serum and genomic DNA samples are available for the majority of subjects in the cohort, along with a complete dataset of variables including disease phenotype according to clinical, CT and biopsy category, and serial lung function.

Aims:
To integrate known and novel biomarkers, genetic polymorphisms and quantitative CT imaging (radiogenomics) such that these data can be effectively interrogated through machine learning approaches to define clinically meaningful clusters of disease. The aim is to determine homogeneous subgroups that better define patient prognosis and response to therapy.

Preliminary programme of work:
1. Identify serum biomarkers, which may effectively discriminate between progressors and non-progressors.
2. Genotyping: perform analysis of GWAS and RNA-seq datasets.
3. Quantitative CT analysis with the CALIPER texture analysis platform. Validate the platform on our datasets. Develop and test an interactive protocol for classification of scans into diagnostic groups.
4. Integration and interrogation of molecular, imaging and phenotypical data such that analyses can be performed for the stratification of disease.
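
As an illustration of the integration step in item 4, the following is a minimal sketch of joining per-patient molecular and imaging features on a shared patient identifier. The patient IDs, feature names and the `integrate` helper are hypothetical and chosen only for illustration; they are not taken from the project's pipeline.

```python
def integrate(molecular, imaging):
    """Inner-join two {patient_id: {feature: value}} tables on patient_id.

    Only patients present in both tables are kept. Feature names are
    prefixed so that the modality of each column stays visible.
    """
    shared = sorted(molecular.keys() & imaging.keys())
    return {
        pid: {
            **{f"mol_{name}": value for name, value in molecular[pid].items()},
            **{f"ct_{name}": value for name, value in imaging[pid].items()},
        }
        for pid in shared
    }


# Toy data (hypothetical IDs and features, not real measurements).
molecular = {"P001": {"analyte_a": 12.5}, "P002": {"analyte_a": 8.1}}
imaging = {
    "P001": {"reticular_fraction": 0.18},
    "P003": {"reticular_fraction": 0.05},
}
merged = integrate(molecular, imaging)
```

An inner join is used here so that downstream models see complete rows; whether to keep patients with only one modality is a design choice this sketch does not address.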

Ultimately, an automated and assistive tool would be developed for personalised predictions of diagnosis, prognosis, rate of decline and response to treatment in lung fibrosis, based on a diverse set of pre-defined variables.

Studentship Projects

Project Reference | Relationship | Related To | Start | End | Student Name
MR/N013166/1 | | | 01/10/2016 | 30/09/2025 |
1940067 | Studentship | MR/N013166/1 | 01/09/2017 | 30/06/2021 | Alexander Przybylski
 
Title Imbio Quantitative CT Dataset 
Description Serial chest CT scans for all interstitial lung disease (ILD) subjects in our subcohort (those with molecular Luminex measurements, N>650) were processed by Imbio Lung Texture Analysis software, v1.3.3 (Imbio, Minneapolis, MN). For each CT scan, the output comprises a segmentation of the lungs into six regions, with each voxel classified into one of five textures (lung parenchymal patterns). The textures are: 'Normal', 'Hyperlucent', 'Ground Glass', 'Reticular', 'Honeycomb'. Our inclusion criteria permitted a range of different scans for processing, with multiple time-points for many patients. In total, over 1700 scans were passed through our pipeline. 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? No  
Impact With our diverse set of CT scans processed, we outline inclusion and exclusion criteria necessary for subsequent data analysis. Our study thus sheds light on the feasibility of using a retrospective cohort for quantitative CT scan processing, and on some of the limitations and challenges for such software applications. This dataset forms the basis of the quantitative imaging component of my research project, part of the larger aim of an integrative and multi-modal data approach. All subsequent analysis will involve this dataset, with the overarching aims of predictive modelling of ILD patient outcomes, prognostication, and subtype discovery. 
URL https://www.imbio.com/lung-texture-analysis
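
The per-scan output described for this dataset (voxels labelled with one of five textures within lung regions) can be reduced to per-region texture fractions, a common form of quantitative CT feature. The sketch below assumes a simple list of (region, texture) pairs; the region names and this input format are hypothetical and do not represent the Imbio output format.

```python
from collections import Counter

# Texture names are taken from the dataset description above.
TEXTURES = ("Normal", "Hyperlucent", "Ground Glass", "Reticular", "Honeycomb")


def texture_fractions(voxels):
    """Return {region: {texture: fraction}} from (region, texture) pairs."""
    counts = {}
    for region, texture in voxels:
        if texture not in TEXTURES:
            raise ValueError(f"unknown texture: {texture!r}")
        counts.setdefault(region, Counter())[texture] += 1

    fractions = {}
    for region, counter in counts.items():
        total = sum(counter.values())
        # Textures that never occur in a region get a fraction of 0.0.
        fractions[region] = {t: counter[t] / total for t in TEXTURES}
    return fractions


# Toy input: four voxels in one hypothetical region, two in another.
example = [
    ("upper_left", "Normal"),
    ("upper_left", "Normal"),
    ("upper_left", "Reticular"),
    ("upper_left", "Honeycomb"),
    ("lower_right", "Ground Glass"),
    ("lower_right", "Normal"),
]
result = texture_fractions(example)
```

Fractions rather than raw voxel counts are used so that scans with different lung volumes or resolutions remain comparable.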
 
Title Luminex Molecular Dataset 
Description All interstitial lung disease (ILD) subjects in our cohort with a banked serum sample were selected for molecular assays. This subcohort spans over 650 subjects, in a range of ILD diagnosis groups, with the largest being Idiopathic Pulmonary Fibrosis. We identified a set of 61 analytes of interest and for each subject serum sample, performed the assays via Luminex technology. 
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? No  
Impact This dataset forms the basis of the molecular component of my research project, part of the larger aim of an integrative and multi-modal data approach. All main subsequent analysis involves this dataset, with the overarching aims of identifying biomarkers that associate with important clinical patient outcomes (such as survival, lung function decline, hospitalisation) and identifying potential novel patient subgroups. For example, prognostic modelling has been carried out using the molecular measurements as covariates alongside clinical variables. The dataset has also raised further research questions, particularly surrounding missing and censored data. This has led to research on technical, machine learning-based approaches for the imputation of such values. All future analysis will use these molecular variables as covariates in any models developed. 
URL https://clinicaltrials.gov/ct2/show/study/NCT04016181
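
To make the censored-data problem mentioned above concrete, here is a minimal sketch of a simple baseline for left-censored assay readings: substituting the limit of detection (LOD) divided by the square root of two. This is a widely used convention, shown only as a point of comparison; it is not the machine learning-based imputation researched in the project. The encoding of censored readings as `None` and the function name are assumptions made for illustration.

```python
import math


def impute_left_censored(values, lod):
    """Replace readings below the limit of detection with lod / sqrt(2).

    `values` is a list of floats, where None marks a reading reported as
    below the LOD (a hypothetical encoding for this sketch).
    """
    fill = lod / math.sqrt(2)
    return [fill if v is None or v < lod else v for v in values]


# Toy readings for one analyte with an assumed LOD of 2.0.
readings = [None, 4.0, 1.0, 7.5]
imputed = impute_left_censored(readings, lod=2.0)
```

Single-value substitution of this kind shrinks the variance of the imputed analyte, which is one reason model-based approaches may be preferred.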
 
Description Imbio partnership 
Organisation Imbio, LLC
Country United States 
Sector Private 
PI Contribution Using the Lung Texture Analysis software supplied by Imbio, we were able to process CT scans for a unique, retrospective cohort of interstitial lung disease patients. This in turn allows us to submit reports on the cases analysed, evaluate the use of the software, and report on subsequent analysis that makes use of the generated data.
Collaborator Contribution Imbio has made their Lung Texture Analysis software available for our use, and provided technical support.
Impact The partnership has allowed for the generation of our quantitative CT dataset, using the Imbio software on data from our cohorts. Subsequently, we will be able to assess the utility of such automated software for retrospective medical image analysis, and use the resulting dataset for our research surrounding interstitial lung diseases. The partnership is multi-disciplinary and combines primarily Informatics and Medicine.
Start Year 2016
 
Description A Hands-On Introduction to Data Science in Health workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact This was an interactive workshop as part of the DataFest in Edinburgh, held as a 'fringe event'. It was open to the public. The aims were to provide an introduction to data science in healthcare through group discussions, case studies, presentations, and practical programming tasks.

My role as a workshop assistant involved answering any questions about the material, and helping groups solve problems during the practical sessions. I also gave a presentation on my PhD project ('Computer-aided CT imaging and integration with molecular endotyping to stratify fibrotic lung disease'), as an example of data science in healthcare.

The event served to stimulate interest in the application of data science in healthcare settings, increased familiarity with data science challenges and opportunities among healthcare and biomedicine professionals, and helped to bridge the gap between disciplines.
Year(s) Of Engagement Activity 2019
 
Description Health Informatics in Action workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact Participant in Health Informatics in Action: Building a University-Wide Research Community. The aim was to bring together professionals across the Medical School and the School of Informatics to foster multidisciplinary collaboration opportunities.
Year(s) Of Engagement Activity 2018
 
Description Local presentation, Centre for Medical Informatics meeting (Edinburgh) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact Presentation on my research, titled "Stratifying fibrotic lung disease via molecular endotyping and integration with quantitative CT". This led to questions, discussion and constructive feedback from a cross-disciplinary audience of Principal Investigators and other researchers.
Year(s) Of Engagement Activity 2019
 
Description Local presentation, Inflammation & Immunity meeting (QMRI, Edinburgh) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact Repeated presentations at the local Inflammation & Immunity meeting within the research centre I work at (QMRI, Edinburgh University). The audience consists of Principal Investigators and other researchers, mainly from the Centre for Inflammation Research (CIR). The outcomes are research dissemination and constructive discussion and evaluation of on-going research projects.

15/06/18: "RNA-seq analysis of alveolar macrophage subpopulations: preliminary results"

02/11/18: "Molecular endotyping and integration with quantitative CT to stratify fibrotic lung disease"

22/03/19: "Stratifying fibrotic lung disease: prognostic modelling and molecular endotyping"

06/03/20: "Predicting ILD diagnosis using molecular and quantitative CT data: preliminary exploration"
Year(s) Of Engagement Activity 2018,2019,2020
 
Description Medical student co-supervision 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Undergraduate students
Results and Impact I was involved in the co-supervision of six (as of March 2020) medical student SSC (student selected component) projects. All the projects were data-based and involved datasets that are part of my research as well. My role included data extraction, analysis planning, and discussions. This served as a learning opportunity in cross-disciplinary collaboration, dissemination of my research, and exercises in the use of data and statistical or computational analysis for tackling medical questions.
Year(s) Of Engagement Activity 2018,2019,2020
 
Description One HealthTech: Evaluation of Digital and Data-driven Solutions in Healthcare 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact Attendee at a workshop on "evaluation of digital and data-driven solutions in healthcare" (Edinburgh). This gave the opportunity to attend the presentations and discuss my research with a wide range of other attendees.
Year(s) Of Engagement Activity 2019
 
Description Poster presentation: Dealing with Data Conference 2017 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact Presented a poster at the 2017 Dealing with Data Conference (Edinburgh University) on "Challenges of Data Integration: Molecular, Imaging and Phenotypic Data". This led to questions and discussion surrounding my research topic and the challenges of working with multi-modal biomedical datasets.
Year(s) Of Engagement Activity 2017
 
Description Poster presentation: Futuristic Medicine Symposium, 2019 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact Presented a poster at the Futuristic Medicine Symposium (Edinburgh) on "Stratification of Fibrotic Lung Disease: Integration of Molecular Endotyping and Quantitative CT via Machine Learning". This led to questions and discussions about my research.
Year(s) Of Engagement Activity 2019
 
Description Precision Medicine Beyond Cancer Congress 2018 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Attendee at the Precision Medicine Beyond Cancer Congress 2018, Munich, which involved presentations, discussions and debate sessions.
Year(s) Of Engagement Activity 2018
 
Description Teaching Assistant 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact I was employed as a Teaching Assistant for the Edinburgh University Coursera MOOC (Massive Open Online Course), "Data Science in Stratified Healthcare and Precision Medicine". My role involved answering questions and leading discussions in the online discussion forum. I also participated in several YouTube live Hangout sessions where, together with the course organiser and other teaching staff, we discussed the course, answered questions, interviewed other researchers, and discussed recent developments related to data science in healthcare and precision medicine. In one of these sessions I also discussed my current research. This led to dissemination of my research, engagement with a broad international audience, and the creation of learning opportunities.
Year(s) Of Engagement Activity 2019
URL https://www.coursera.org/learn/datascimed