Computer-aided CT imaging and integration with molecular endotyping to stratify fibrotic lung disease
Lead Research Organisation:
University of Edinburgh
Department Name: Sch of Molecular. Genetics & Pop Health
Abstract
Background:
Interstitial lung diseases (ILDs) account for approximately 7000 deaths in the UK every year, with idiopathic pulmonary fibrosis (IPF) identified as the most common diagnosis. Classification of ILDs is based upon causation, high resolution (HR) CT scan appearance and lung histology, which requires surgical biopsy.
This is problematic for several reasons: 1. There is strong reliance on lung biopsy, an invasive procedure with 2-7% mortality that many patients will not undergo, leading to a diagnosis of "unclassifiable disease"; 2. Current methods do not reliably inform either prognosis or treatment efficacy; 3. Clinical reporting of CT scans is subjective rather than quantitative.
Lung fibrosis is notable for lacking definitive tools to achieve diagnostic precision, resulting in highly heterogeneous disease entities.
Recent work:
CT texture analysis platforms, such as the Adaptive Multiple Features Method (AMFM) and the Computer-Aided Lung Informatics for Pathology Evaluation and Rating (CALIPER), have previously been studied in clinical settings. However, they have not been validated in longitudinal cohorts in which ground truth (survival, time to hospitalisation, rate of decline in lung function, response to treatment) is known.
Over the past few years, advances in diagnostic and prognostic biomarker and genetic profiling in lung fibrosis have been made. Some of these have been validated in several cohorts of patients. However, the vast majority of these studies are confined to IPF. Additionally, the largest studies are based on patients recruited to clinical trials and not 'real-world' subjects.
Resources:
Data are available from established gene-, bio- and image-banks, and from a unique, ethically approved, prospectively populated database designed to depict the natural history of lung fibrosis. The cohort consists of >1100 consecutively presenting, consented patients with lung fibrosis recruited since 2002, with less than 1% lost to follow-up. All patients have CT scans, and more than 800 patients have serial scans. CT scans are hosted within National Services Scotland (NSS) and this is co-located with the Farr network in the Edinburgh Farr node, enabling a safe haven analytic environment for imaging, clinical and 'omic' data.
Furthermore, serum and genomic DNA samples are available from the majority of subjects in the cohort, along with a complete dataset of variables including disease phenotype according to clinical, CT and biopsy category, and serial lung function.
Aims:
To integrate known and novel biomarkers, genetic polymorphisms and quantitative CT imaging (radiogenomics) such that these data can be effectively interrogated through machine learning approaches to define clinically meaningful clusters of disease. The aim is to determine homogeneous subgroups that better define patient prognosis and response to therapy.
Preliminary programme of work:
1. Identify serum biomarkers that may effectively discriminate between progressors and non-progressors.
2. Genotyping: perform analysis of GWAS and RNA-seq datasets.
3. Quantitative CT analysis with the CALIPER texture analysis platform. Validate the platform on our datasets. Develop and test an interactive protocol for classification of scans into diagnostic groups.
4. Integration and interrogation of molecular, imaging and phenotypical data such that analyses can be performed for the stratification of disease.
Ultimately, an automated and assistive tool would be developed for personalised predictions of diagnosis, prognosis, rate of decline and response to treatment in lung fibrosis, based on a diverse set of pre-defined variables.
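As a rough illustration of step 4 above, the sketch below combines a few features per patient into one table, standardises them, and groups patients by k-means clustering. The abstract does not name a clustering method, so k-means is only a stand-in; the feature names, the toy values and the helper functions `standardise` and `kmeans` are all hypothetical.

```python
import math

def standardise(rows):
    """Z-score each column so features on different scales are comparable."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 when a column is constant, to avoid dividing by zero.
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def kmeans(rows, k, iterations=50):
    """Plain k-means with a deterministic start (the first k rows)."""
    centroids = [list(r) for r in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assign every patient to the nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(row, centroids[j]))
                  for row in rows]
        # Recompute each centroid as the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:
                new_centroids.append([sum(c) / len(c) for c in zip(*members)])
            else:
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels

# Hypothetical toy data, one row per patient:
# [serum biomarker level, honeycomb fraction on CT, annual FVC decline in ml]
patients = [
    [1.0, 0.02, 50.0],
    [9.0, 0.30, 250.0],
    [1.2, 0.03, 60.0],
    [8.5, 0.28, 240.0],
    [0.9, 0.01, 55.0],
    [9.3, 0.33, 260.0],
]

labels = kmeans(standardise(patients), k=2)
print(labels)
```

Standardising first matters here because the three features sit on very different scales; without it the lung function column would dominate the distance calculation.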
Publications
Mills R
(2019)
Intrapulmonary Autoantibodies to HSP72 Are Associated with Improved Outcomes in IPF.
in Journal of immunology research
Studentship Projects
Project Reference | Relationship | Related To | Start | End | Student Name |
---|---|---|---|---|---|
MR/N013166/1 | | | 30/09/2016 | 29/09/2025 | |
1940067 | Studentship | MR/N013166/1 | 31/08/2017 | 29/06/2021 | Alexander Przybylski |
Title | Imbio Quantitative CT Dataset |
Description | Serial chest CT scans for all interstitial lung disease (ILD) subjects in our subcohort (those with molecular Luminex measurements, N>650), were processed by Imbio Lung Texture Analysis software, v1.3.3 (Imbio, Minneapolis, MN). For each CT scan, the output involves the segmentation of the lungs into six regions, with voxels classified into one of five textures (lung parenchymal patterns). The textures are: 'Normal', 'Hyperlucent', 'Ground Glass', 'Reticular', 'Honeycomb'. Our inclusion criteria permitted a range of different scans for processing, with multiple time-points for many patients. In total, over 1700 scans were passed through our pipeline. |
Type Of Material | Database/Collection of data |
Year Produced | 2019 |
Provided To Others? | No |
Impact | Having processed our diverse set of CT scans, we outline the inclusion and exclusion criteria necessary for subsequent data analysis. Our study thus sheds light on the feasibility of using a retrospective cohort for quantitative CT scan processing, and on some of the limitations and challenges for such software applications. This dataset forms the basis of the quantitative imaging component of my research project, part of the larger aim of an integrative and multi-modal data approach. All subsequent analysis will involve this dataset, with the overarching aims of predictive modelling of ILD patient outcomes, prognostication, and subtype discovery. |
URL | https://www.imbio.com/lung-texture-analysis |
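To make the dataset description above concrete, the following sketch turns per-voxel texture labels into texture fractions, a typical quantitative summary of a scan or lung region. The actual output format of the Imbio software is not described in this record, so the flat list of labels and the helper `texture_fractions` are assumptions made purely for illustration; only the five texture names come from the description.

```python
from collections import Counter

# The five lung parenchymal textures named in the dataset description.
TEXTURES = ("Normal", "Hyperlucent", "Ground Glass", "Reticular", "Honeycomb")

def texture_fractions(voxel_labels):
    """Return the fraction of labelled voxels assigned to each texture."""
    counts = Counter(voxel_labels)
    unknown = set(counts) - set(TEXTURES)
    if unknown:
        raise ValueError(f"unexpected texture labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no voxels supplied")
    # Counter returns 0 for absent textures, so every texture gets an entry.
    return {texture: counts[texture] / total for texture in TEXTURES}

# Hypothetical toy region with ten labelled voxels.
toy_region = (["Normal"] * 6 + ["Ground Glass"] * 2
              + ["Reticular"] + ["Honeycomb"])
fractions = texture_fractions(toy_region)
print(fractions)
```

Computing the same summary for each of the six lung regions and each time-point would give one row of imaging features per scan, which could then sit alongside the molecular and clinical variables.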
Title | Luminex Molecular Dataset |
Description | All interstitial lung disease (ILD) subjects in our cohort with a banked serum sample were selected for molecular assays. This subcohort spans over 650 subjects, in a range of ILD diagnosis groups, with the largest being Idiopathic Pulmonary Fibrosis. We identified a set of 61 analytes of interest and for each subject serum sample, performed the assays via Luminex technology. |
Type Of Material | Database/Collection of data |
Year Produced | 2018 |
Provided To Others? | No |
Impact | This dataset forms the basis of the molecular component of my research project, part of the larger aim of an integrative and multi-modal data approach. All main subsequent analysis involves this dataset, with the overarching aims of identifying biomarkers associated with important clinical patient outcomes (such as survival, lung function decline and hospitalisation) and of identifying potential novel patient subgroups. For example, prognostic modelling has been carried out using the molecular measurements as covariates alongside clinical variables. The dataset has also raised further research questions, particularly surrounding missing and censored data. This has led to research on technical, machine learning-based approaches for the imputation of such values. All future analysis will use these molecular variables as covariates in any models developed. |
URL | https://clinicaltrials.gov/ct2/show/study/NCT04016181 |
Description | Imbio partnership |
Organisation | Imbio, LLC |
Country | United States |
Sector | Private |
PI Contribution | Using the Lung Texture Analysis software supplied by Imbio, we were able to process CT scans for a unique, retrospective cohort of interstitial lung disease patients. This in turn allows us to submit reports on the cases analysed, evaluate the use of the software, and report on subsequent analysis that makes use of the generated data. |
Collaborator Contribution | Imbio has made their Lung Texture Analysis software available for our use, and provided technical support. |
Impact | The partnership has allowed for the generation of our quantitative CT dataset, using the Imbio software on data from our cohorts. Subsequently, we will be able to assess the utility of such automated software for retrospective medical image analysis, and use the resulting dataset for our research surrounding interstitial lung diseases. The partnership is multi-disciplinary and combines primarily Informatics and Medicine. |
Start Year | 2016 |
Description | A Hands-On Introduction to Data Science in Health workshop |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Professional Practitioners |
Results and Impact | This was an interactive workshop held as a 'fringe event' of DataFest in Edinburgh. It was open to the public. The aims were to provide an introduction to data science in healthcare through group discussions, case studies, presentations, and practical programming tasks. My role as a workshop assistant involved answering questions about the material and helping groups solve problems during the practical sessions. I also gave a presentation on my PhD project ('Computer-aided CT imaging and integration with molecular endotyping to stratify fibrotic lung disease') as an example of data science in healthcare. The event stimulated interest in the application of data science in healthcare settings, increased familiarity with data science challenges and opportunities among healthcare and biomedicine professionals, and helped to bridge the gap between disciplines. |
Year(s) Of Engagement Activity | 2019 |
Description | Health Informatics in Action workshop |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Professional Practitioners |
Results and Impact | Participant in Health Informatics in Action: Building a University-Wide Research Community. The aim was to bring together professionals across the Medical School and the School of Informatics to foster multidisciplinary collaboration opportunities. |
Year(s) Of Engagement Activity | 2018 |
Description | Local presentation, Centre for Medical Informatics meeting (Edinburgh) |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Professional Practitioners |
Results and Impact | Presentation on my research, titled "Stratifying fibrotic lung disease via molecular endotyping and integration with quantitative CT". This led to questions, discussion and constructive feedback from a cross-disciplinary audience of Principal Investigators and other researchers. |
Year(s) Of Engagement Activity | 2019 |
Description | Local presentation, Inflammation & Immunity meeting (QMRI, Edinburgh) |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Professional Practitioners |
Results and Impact | Repeated presentations at the local Inflammation & Immunity meeting within the research centre where I work (QMRI, Edinburgh University). The audience consists of Principal Investigators and other researchers, mainly from the Centre for Inflammation Research (CIR). The outcomes are research dissemination and constructive discussion and evaluation of ongoing research projects. Presentations: 15/06/18: "RNA-seq analysis of alveolar macrophage subpopulations: preliminary results"; 02/11/18: "Molecular endotyping and integration with quantitative CT to stratify fibrotic lung disease"; 22/03/19: "Stratifying fibrotic lung disease: prognostic modelling and molecular endotyping"; 06/03/20: "Predicting ILD diagnosis using molecular and quantitative CT data: preliminary exploration". |
Year(s) Of Engagement Activity | 2018,2019,2020 |
Description | Medical student co-supervision |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Undergraduate students |
Results and Impact | I was involved in the co-supervision of six (as of March 2020) medical student SSC (student selected component) projects. All the projects were data-based and involved datasets that are part of my research as well. My role included data extraction, analysis planning, and discussions. This served as a learning opportunity in cross-disciplinary collaboration, dissemination of my research, and exercises in the use of data and statistical or computational analysis for tackling medical questions. |
Year(s) Of Engagement Activity | 2018,2019,2020 |
Description | One HealthTech: Evaluation of Digital and Data-driven Solutions in Healthcare |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Professional Practitioners |
Results and Impact | Attendee at a workshop on "evaluation of digital and data-driven solutions in healthcare" (Edinburgh). This gave the opportunity to attend the presentations and discuss my research with a wide range of other attendees. |
Year(s) Of Engagement Activity | 2019 |
Description | Poster presentation: Dealing with Data Conference 2017 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Professional Practitioners |
Results and Impact | Presented a poster at the 2017 Dealing with Data Conference (Edinburgh University) on "Challenges of Data Integration: Molecular, Imaging and Phenotypic Data". This led to questions and discussion surrounding my research topic and the challenges of working with multi-modal biomedical datasets. |
Year(s) Of Engagement Activity | 2017 |
Description | Poster presentation: Futuristic Medicine Symposium, 2019 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Professional Practitioners |
Results and Impact | Presented a poster at the Futuristic Medicine Symposium (Edinburgh) on "Stratification of Fibrotic Lung Disease: Integration of Molecular Endotyping and Quantitative CT via Machine Learning". This led to questions and discussions about my research. |
Year(s) Of Engagement Activity | 2019 |
Description | Precision Medicine Beyond Cancer Congress 2018 |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Attendee at Precision Medicine Beyond Cancer Congress 2018, Munich. The congress involved presentations, discussions and debate sessions. |
Year(s) Of Engagement Activity | 2018 |
Description | Teaching Assistant |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Public/other audiences |
Results and Impact | I was employed as a Teaching Assistant for the Edinburgh University Coursera MOOC (Massive Open Online Course), "Data Science in Stratified Healthcare and Precision Medicine". My role involved answering questions and leading discussions in the online discussion forum. I also participated in several YouTube live Hangout sessions where, together with the course organiser and other teaching staff, we discussed the course, answered questions, interviewed other researchers, and discussed recent developments related to data science in healthcare and precision medicine. In one of these sessions I also discussed my current research. This led to dissemination of my research, engagement with a broad international audience, and the creation of learning opportunities. |
Year(s) Of Engagement Activity | 2019 |
URL | https://www.coursera.org/learn/datascimed |