A Molecular Pathological Epidemiology Approach Towards Pancreatic Cancer

Lead Research Organisation: Queen Mary University of London
Department Name: Barts Cancer Institute

Abstract

Pancreatic Cancer (PaC) is projected to be one of the leading causes of cancer-related death by 2030, second only to lung cancer. PaC presents the poorest prognosis of all major solid tumours with a five-year survival rate of just 5%. The poor survival rate is largely associated with late-stage diagnosis when surgical resection, the only current hope for cure, becomes infeasible. Therefore, earlier prediction of PaC onset, progression and response to treatment may help to improve treatment strategy and patient outcomes.

In recent years, the molecular pathological epidemiology (MPE) approach has contributed to a better understanding of several cancers and promises to revolutionise clinical practice through precision detection, prevention and treatment of cancers. By connecting putative etiological factors (commonly referred to as exposures or risk factors, including treatments) to specific molecular signatures across tumour phenotypes, the MPE approach can yield more accurate measures regarding PaC diagnosis, prognosis and response to treatment.

Various studies have implicated a number of potential risk factors for PaC, such as age, diabetes, smoking, alcohol and being overweight. The project aims to refine and analyse the utility of the known risk factors for PaC compared with other, more common confounding diagnoses. The project will then focus on a deeper characterisation of PaC through integration of genetic data with information on environmental risk factors and clinical prognostic factors. Understanding how endogenous and exogenous risk factors are connected to molecular changes, and how they contribute to the onset and progression of PaC, is a critical step in informing evidence-based clinical practice. Further, the work has important practical implications, as it will provide a gene-environment interaction map across various pancreatic diseases to guide efforts towards early diagnosis of PaC. The work can also provide evidence linking risk factors, treatment and disease outcomes, thereby supporting the precision medicine initiative. Similarly, identifying any connection between genetic make-up, medication and disease outcome will aid efforts in potential drug repurposing.

Technical Summary

Objective 1:
A population-based comparative cohort study will be conducted between patients diagnosed with pancreatic cancer (PaC) and other patients matched on appropriate demographics, focusing on various epidemiological factors and clinical data such as demography, lifestyle, physiology, symptoms, diagnoses, medication, blood and urine tests and healthcare utilisation markers. Patients will be classified by a data-driven phenotyping algorithm applied to their healthcare trajectory to date. Patients' linked electronic health records (EHR), including primary and secondary care data as well as socio-economic and mortality data, will be collected from diverse sources such as Barts Health Data Warehouse, NHS Discovery Project East London, CPRD and NCRAS. The rates of PaC events (incidence, recurrence, death), and their comparison with rates in matched controls, will be estimated using appropriate statistical models. Associations between various exposures and outcomes will be measured in terms of odds ratios, with length and dose of exposure to known risk factors as stratification categories.
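As an illustration of the odds-ratio measure mentioned above, the following minimal sketch computes an unadjusted odds ratio with an approximate 95% confidence interval (Woolf's log method) for each exposure-duration stratum against a never-exposed reference group. The function name, the stratum labels and all counts are hypothetical and are not study data. The sketch also ignores matching and confounder adjustment, which the statistical models used in the project would need to handle.

```python
import math


def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls, z=1.96):
    """Return (odds ratio, lower, upper) from a 2x2 table using Woolf's log method."""
    a, b, c, d = exposed_cases, unexposed_cases, exposed_controls, unexposed_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this estimator")
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) under Woolf's approximation
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts: never-exposed reference group and three duration strata
reference = {"cases": 80, "controls": 320}
levels = {
    "<10 years": {"cases": 30, "controls": 90},
    "10-19 years": {"cases": 40, "controls": 80},
    ">=20 years": {"cases": 50, "controls": 50},
}

results = {
    name: odds_ratio(lv["cases"], reference["cases"], lv["controls"], reference["controls"])
    for name, lv in levels.items()
}

for name, (estimate, lower, upper) in results.items():
    print(f"{name}: OR={estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Comparing each stratum with a common reference group makes a dose-response pattern visible as a trend in the odds ratios across strata.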

Objective 2:
The -omics data available through cancer-specific initiatives (TCGA, GENIE) as well as molecular data repositories (GEO, ArrayExpress) will be analysed in combination with relevant literature mining to collate existing genomic characterisations of PaC in terms of gene/mutational signatures. The molecular characteristics of PaC associated with clinical history will be assessed on data derived from the PCRF Tissue Bank. A consensus clustering algorithm will be applied to obtain new stratifications of PaC along the themes of disease onset, progression, survival and response to treatment. The association between the expression/mutational status of specific PaC genes and risk/prognostic factors, longitudinal data and response to treatment will be tested. All the data obtained through the project will be published in a mine-able web-based bioinformatics infrastructure.
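The general idea behind consensus clustering can be sketched as follows: cluster many random subsamples of the items, then record how often each pair of items lands in the same cluster when both were sampled. The record does not say which base clustering method or parameters the project uses, so the plain k-means routine, the subsampling fraction, the number of runs and the toy two-dimensional points below are all assumptions made for illustration.

```python
import random


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means on lists of floats; returns one cluster label per point."""
    centres = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centre (squared Euclidean distance)
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centres[c])))
            for p in points
        ]
        # Move each centre to the mean of its members; keep it if the cluster is empty
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def consensus_matrix(points, k, n_runs=50, frac=0.8, seed=0):
    """Fraction of runs in which items i and j co-clustered, among runs that sampled both."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    size = max(k, int(frac * n))
    for _ in range(n_runs):
        idx = rng.sample(range(n), size)
        labels = kmeans([points[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


# Toy data: two well-separated groups of three items each (not study data)
toy = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
matrix = consensus_matrix(toy, k=2)
for row in matrix:
    print(" ".join(f"{value:.2f}" for value in row))
```

Entries close to 1 indicate pairs that are grouped together consistently across subsamples. A final stratification is usually obtained by clustering this matrix, and the stability of the entries can guide the choice of the number of clusters.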
 
Description Sharing research materials
Geographic Reach Multiple continents/international 
Policy Influence Type Influenced training of practitioners or researchers
Impact The main impact would be for the other researchers in a similar field to reuse the developed materials for their own research, hence reducing duplication of effort.
URL https://pac-epidem-el.bcc.qmul.ac.uk/analysis/
 
Description Data-driven risk and prognosis prediction from linked electronic health care records: Pancreatic cancer as an exemplar
Amount £50,000 (GBP)
Funding ID MGU0555 
Organisation Barts Charity 
Sector Charity/Non Profit
Country United Kingdom
Start 05/2021 
End 05/2022
 
Description Data-driven risk and prognosis prediction from linked electronic health care records: Pancreatic cancer as an exemplar
Amount £50,000 (GBP)
Organisation Barts Health NHS Trust 
Sector Public
Country United Kingdom
Start 05/2021 
End 11/2022
 
Description LSI one-off award for consumables and travel
Amount £5,000 (GBP)
Funding ID MIMB1C1S 
Organisation Queen Mary University of London 
Sector Academic/University
Country United Kingdom
Start 07/2018 
End 02/2021
 
Title EHR based phenotyping algorithm development 
Description Rule-based phenotyping algorithms were developed to characterise patients, integrating information from multiple sources (where available) to counteract bias. Information from longitudinal electronic health records (EHR) data in various formats (e.g., ICD-10 codes, SNOMED CT codes, Read codes, CTV3 codes, structured and semi-structured text) is integrated to derive a patient's phenotype for particular demographic or clinical attributes. The data categories for which phenotyping algorithms have been developed so far include: ethnicity, HPB diseases, common medical conditions (e.g., diabetes, hypertension, cardiovascular disease, respiratory disease, etc.), lifestyle factors (e.g., smoking, drinking, substance misuse, obesity), and regular use of prescribed medication groups. 
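A rule of this kind could look roughly like the sketch below, which flags a patient as having a condition if any coded event from any source matches a per-terminology list of code prefixes, and keeps the earliest matching date. The class, function and rule names are hypothetical, and the code prefixes are illustrative placeholders rather than the project's actual vocabulary. Prefix matching suits hierarchical code systems such as ICD-10 and Read; SNOMED CT identifiers are not hierarchical strings and would need exact matching or a lookup table instead.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CodedEvent:
    patient_id: str
    system: str      # coding system of the source, e.g. "ICD-10" or "Read"
    code: str
    recorded: date


# Illustrative rule only: code prefixes per coding system for diabetes
DIABETES_RULE = {
    "ICD-10": ("E10", "E11", "E12", "E13", "E14"),
    "Read": ("C10",),
}


def apply_rule(events, rule):
    """Map patient_id -> earliest date on which any source matched the rule."""
    first_seen = {}
    for ev in events:
        prefixes = rule.get(ev.system, ())
        if prefixes and ev.code.startswith(prefixes):
            earlier = first_seen.get(ev.patient_id)
            if earlier is None or ev.recorded < earlier:
                first_seen[ev.patient_id] = ev.recorded
    return first_seen


# Hypothetical records: P001 is coded in both hospital and GP data, P002 has no match
events = [
    CodedEvent("P001", "ICD-10", "E11.9", date(2015, 3, 1)),
    CodedEvent("P001", "Read", "C10F.", date(2012, 6, 10)),
    CodedEvent("P002", "ICD-10", "I10", date(2018, 1, 5)),
]

diabetes = apply_rule(events, DIABETES_RULE)
print(diabetes)
```

Combining sources in this way means a condition recorded only in primary care is not missed when hospital coding is incomplete, and the reverse.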
Type Of Material Improvements to research infrastructure 
Year Produced 2020 
Provided To Others? Yes  
Impact Patient characteristics derived from applying the phenotyping algorithms have been used to conduct a study on the impact of COVID-19 on HPB patients as well as another study on the epidemiology of pancreatic cancer. 
URL https://pac-epidem-el.bcc.qmul.ac.uk/analysis/
 
Title Clinical data vocabulary and electronic phenotypes 
Description All materials developed so far to realise the project (EHR code mapping vocabulary, phenotyping rules) have been documented and are visible through the dedicated project website. The actual implementation code will be uploaded to the QMUL Research IT GitHub resource to be reused by internal researchers, but can also be made available to external researchers upon request. 
Type Of Material Data analysis technique 
Year Produced 2021 
Provided To Others? Yes  
Impact Reducing duplication of effort for other researchers within the group. 
URL https://pac-epidem-el.bcc.qmul.ac.uk/analysis/
 
Title Early distinction of pancreatic cancer from non-malignant pancreatic diseases 
Description Using a machine learning approach, we developed a risk-prediction algorithm that utilises patients' clinical features, including demographics and comorbidities, to distinguish patients with underlying pancreatic ductal adenocarcinoma from patients with benign pancreatic disease. 
Type Of Material Computer model/algorithm 
Year Produced 2022 
Provided To Others? Yes  
Impact The prediction algorithm may become a useful adjunct for primary care physicians when deciding on the appropriate use of an urgent referral or imaging pathway for patients with suspected pancreatic disease. 
 
Title HPB clinical dataset schema 
Description A MySQL-based relational database schema has been designed and implemented for organising the raw EHR data and data files. An event-oriented data model is deployed to capture a patient's clinical journey, where each patient is identified by a study-specific ID and each data point is represented by a tuple. The data points are accompanied by appropriate generic derived variables (e.g., occurrence of two events within a temporal range, age at diagnosis, age at first occurrence of symptoms, disease-free period, survival period from specific events) to add value. Much of the participant history is envisaged to be derived from the already-structured data in HES (Hospital Episode Statistics) and GP records. Such data coming from heterogeneous sources requires standardisation. For example, most of the historical HES data uses ICD-9 or ICD-10 codes, whereas GP data uses different versions of Read Codes (and SNOMED CT codes in the near future) for presenting diagnoses and medical conditions; these data are standardised into ICD-10 codes. Similarly, data received from different providers using different clinical terminologies are standardised into OPCS-4 codes for operative procedures, PBCL codes for pathology and dm+d codes for medicines. Ad-hoc mapping/translation rules are being prepared to achieve the standardisation. A fully functional automated data processing workflow has been developed to cleanse, harmonise and integrate clinical data obtained from multiple sources to derive a well-annotated, non-redundant master dataset. 
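The derived variables listed above can be illustrated with a small in-memory sketch. The fields of the event tuple are not specified in this record, so the four fields used here (patient ID, event kind, code, date), the event kinds, the helper names and the example patient are all assumptions; the real schema is implemented in MySQL, not Python.

```python
from datetime import date
from typing import NamedTuple


class Event(NamedTuple):
    patient_id: str   # study-specific ID
    kind: str         # hypothetical event kinds: "birth", "symptom", "diagnosis", "death"
    code: str         # standardised code, empty where not applicable
    when: date


def first_event(events, patient_id, kind):
    """Earliest date of the given kind of event for a patient, or None."""
    dates = [e.when for e in events if e.patient_id == patient_id and e.kind == kind]
    return min(dates) if dates else None


def age_at(events, patient_id, kind):
    """Age in completed years at the first event of the given kind."""
    born = first_event(events, patient_id, "birth")
    when = first_event(events, patient_id, kind)
    if born is None or when is None:
        return None
    return when.year - born.year - ((when.month, when.day) < (born.month, born.day))


def days_between(events, patient_id, kind_a, kind_b):
    """Days from the first kind_a event to the first kind_b event."""
    start = first_event(events, patient_id, kind_a)
    end = first_event(events, patient_id, kind_b)
    if start is None or end is None:
        return None
    return (end - start).days


def within_window(events, patient_id, kind_a, kind_b, max_days):
    """True if kind_b first occurs no more than max_days after kind_a."""
    gap = days_between(events, patient_id, kind_a, kind_b)
    return gap is not None and 0 <= gap <= max_days


# Hypothetical patient journey (not study data)
journey = [
    Event("P001", "birth", "", date(1950, 5, 20)),
    Event("P001", "symptom", "", date(2019, 11, 1)),
    Event("P001", "diagnosis", "C25.9", date(2020, 3, 15)),
    Event("P001", "death", "", date(2021, 1, 10)),
]

print("age at diagnosis:", age_at(journey, "P001", "diagnosis"))
print("days from symptom to diagnosis:", days_between(journey, "P001", "symptom", "diagnosis"))
print("survival in days from diagnosis:", days_between(journey, "P001", "diagnosis", "death"))
```

Because every data point shares the same shape, one set of generic helpers serves diagnoses, procedures and prescriptions alike, which is the querying benefit an event-oriented model is meant to provide.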
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? No  
Impact Ease of querying across different types of data (e.g., diagnosis, procedure, treatment, prescriptions) as a result of the underlying homogenised data model. 
URL https://pac-epidem-el.bcc.qmul.ac.uk/
 
Description Barts Health Commissioning Data Sets and Unstructured data for Pancreatic cancer epidemiology 
Organisation Barts Health NHS Trust
Country United Kingdom 
Sector Public 
PI Contribution The risk factors for PaC in multi-ethnic populations have not been well defined. This study takes the unique opportunity to study PaC risk factors in the truly diverse, multi-ethnic population of East London, whose secondary health care is mostly provided by the hospitals within Barts Health NHS Trust. In particular, the Royal London Hospital has a renowned tertiary care establishment for hepato-pancreatico-biliary cancer, which is expected to provide the majority of the study participants. Barts Health is also a linked research partner of the host institute (Queen Mary University of London), and the study is aligned with the strategic focus of the institute. The research is expected to identify potential triggers for targeted screening and for speeding up diagnosis. This is a critical step in informing evidence-based clinical practice for the target population in East London, both those affected by PaC and those at potential risk. The success of this single-site study will also open the door to conducting the study on a broader regional or national scale.
Collaborator Contribution Access to study participants: Given the high prevalence of Pancreatic Cancer in the tertiary care setting (1 in 20), the Trust has the advantage of providing appropriate controls (positive and negative) and case-load for the research. We estimate ~250 new cases of PaC diagnosed each year on average within the estimated 2.5M population served by Barts Health. This gives an estimated ~3750 individual PaC patients over 15 years. A control cohort of double that size will also be drawn from the population served by the Trust. Access to data: The Cerner Millennium System (CMS) Business Intelligence (BI) tool used within Barts Health NHS Trust is a useful resource for obtaining the coded hospital CDS data. The CMS PowerChart tool can also be used to extract a limited set of unstructured data. Having an honorary researcher contract with the Trust, the PI has access to both the CMS BI and PowerChart tools. Study participants, i.e., Cases and Controls, will be identified from the CDS data using the CMS BI tool. A subset of CDS data will then be extracted using the CMS BI tool to obtain the clinical history of the participants. A small set of unstructured hospital data will also be extracted using the CMS PowerChart tool.
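The cohort-size estimate above follows from simple arithmetic, reproduced here as a check. The inputs are the approximate figures quoted in the text; the variable names are hypothetical, and the crude annual incidence per 100,000 is derived from those figures and is not stated in the original.

```python
# Approximate figures quoted in the text
new_cases_per_year = 250
population_served = 2_500_000
study_years = 15
control_ratio = 2  # control cohort of double the case cohort size

expected_cases = new_cases_per_year * study_years
expected_controls = expected_cases * control_ratio
# Derived here for context only: crude annual incidence per 100,000 population
crude_incidence_per_100k = new_cases_per_year / population_served * 100_000

print("expected cases over the study period:", expected_cases)
print("expected controls:", expected_controls)
print("crude incidence per 100,000 per year:", round(crude_incidence_per_100k, 1))
```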
Impact Two journal articles (see publications section).
Start Year 2020
 
Description Clinithink NLP 
Organisation Clinithink Limited
Country United States 
Sector Private 
PI Contribution 1. A use-case for the practical use of the Clinithink tool in cancer research. 2. Potentially enriching the Clinithink annotation library.
Collaborator Contribution 1. Provided access to the Clinithink system 2. Provided technical support 3. Provided advice on query writing and links to other researchers
Impact 1. Annotating tumour staging information of the study cohort from patients' histopathology reports.
Start Year 2022
 
Description Discovery dataset for pancreatic cancer epidemiology 
Organisation East London Health and Care Partnership
Department NHS Discovery East London Programme
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution Discovery East London is a partnership programme across inner north-east London, aiming to establish, deliver and manage a secure data service with linked, combined, identifiable clinical data from all systems supporting direct health care in East London. The clinical dataset is designed to be used by third parties, such as academics, to support research. For the purposes of this research project, an application was submitted to the Discovery Programme Board in January 2019 for a subscription to the identifiable primary care data using Discovery. The application was provisionally approved in February 2019 for the provision of identifiable clinical data, in line with the Data Protection and Security protocols and governance principles applicable to the use of Discovery datasets. Upon receipt of NHS CAG and HRA approval in January 2020 for accessing relevant patients' identifiable clinical data without their consent under this project, Discovery's provisional support was extended to full support. The Discovery Programme Board will be supplied with periodic updates on the way that their data is being used and on the findings of the research. This will aid the Discovery Programme in building demonstrable evidence of how Discovery data has added value in the generation of research insights and ultimately clinical benefits in the health and care system.
Collaborator Contribution Discovery will provide a defined set of clinical data available for patients reported in the GP data (i.e., in East London CCGs) who have been diagnosed with pancreatic cancers and pancreatico-biliary diseases and who attended Barts Health NHS Trust hospitals between 2007 and 2021. The clinical dataset requested includes information on Basic Demography, Diagnosis, Procedures, Medical Conditions, Prescribed Medications, Lifestyle, Family History, Laboratory and Test Results. The project is expected to receive clinical data every five months in six batches between March 2020 and June 2022. The data will provide the basis for a case-control study aimed at discovering and evaluating novel and known risk factors associated with Pancreatic Cancer.
Impact Two journal articles (see publications section).
Start Year 2019
 
Description Public-Patient Involvement 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Patients, carers and/or patient groups
Results and Impact We conducted surveys with patients (without cancer) visiting the Royal London Hospital Hepato-Pancreatico-Biliary (HPB) clinic, providing them with survey materials in a pre-paid envelope so that they could complete the form at their convenience and post it back to us. We started the activity in September 2021. Two specific questions were asked in the survey:
1. How do you feel about the use of electronic health records without patient consent in this research aiming to improve pancreatic cancer management? Do the potential public benefits justify the approach?
2. What are your general concerns about the use of electronic health records in medical research? How do you think those concerns can be addressed?

The Tissue Collection Officer for the Pancreatic Cancer Research Fund (PCRF) Tissue Bank, who is responsible for recruiting Tissue Bank participants from the Royal London Hospital HPB clinic, kindly agreed to identify suitable non-cancer patients visiting the HPB clinic and invite them to take part in the survey. Another colleague, Dr Konstantinos Stasinos, currently working as an NIHR Academic Clinical Fellow at the Cambridge University Hospitals NHS Foundation Trust, also expressed interest in doing the same with patients at the Transplant / Hepatology / Medical Oncology Unit. Between September 2021 and January 2022, they collectively distributed survey materials to 22 patients. By this date, we had received 6 responses.

Responses to survey question 1 were positive, provided the patient data are anonymised or obtained with appropriate approval. One participant offered the interesting observation that some patients, particularly from ethnic minority groups, might refuse to associate themselves with medical research when asked in person; in that case, having a mechanism to utilise their records without consent is beneficial for research purposes. Another participant made the point that an opt-out mechanism is then necessary for those keen not to share their medical history for research purposes.

The main concern about the use of electronic health records in this research (question 2), as expressed by multiple participants, is ensuring the privacy and security of personal data. One participant asked for assurance, with an explanation from the researcher, that "personal data is not breached in any way". He also suggested undertaking regular security audits to ensure patients' privacy and data anonymisation. In relation to the participant's second point about security audits: the host institute's IT department is ISO 27001 certified, as part of which it has to undertake yearly auditing. The security check of the overall computer network and the component data and application-hosting machines is part of the annual audit. The particular machine that hosts the data for this project is also part of a data safe haven with additional physical and virtual security. The dedicated project website has details on how patients' data are collected, stored and utilised, allowing them to provide feedback, as well as options for opting out of the project. The posters distributed at the Royal London Hospital outpatients and wards also provide assurance on preserving the privacy of the study participants.

One participant was aware of medical data being sold to big pharmaceutical companies. The patient-facing materials already state that patients' information will never be sold or passed on for commercial or marketing purposes. All current and future members of the study team will be under a legal obligation to the NHS UK and their data protection policies and procedures. No patient identifiable data will be transferred to or accessed by anyone outside the study team. The completely anonymised clinical dataset will also become an access-controlled resource within the PCRF Tissue Bank for use in future pancreatic cancer research. Finally, one participant raised the philosophical point that treatments potentially developed as an end-product of this kind of research (i.e., utilising data without their consent) should be made available to the target patient group at no extra cost.
Year(s) Of Engagement Activity 2021,2022
 
Description Presenting to health research community 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact CAP-AI is an artificial intelligence and machine learning programme run at Barts Life Sciences, and is London's first AI-enabling programme focused on stimulating growth in the capital's AI cluster. Barts Life Sciences arranged a showcase event in 2021 for the research fellows funded by CAP-AI. The event featured presentations and demos from research teams and industry partners, followed by Q&A sessions. I presented a summary of the findings generated from my project at this event.
Year(s) Of Engagement Activity 2021
 
Description Public-patient Involvement 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Patients, carers and/or patient groups
Results and Impact Several platforms have been utilized to gather opinion from members of the public (including patients, family members of patients, and healthy volunteers) on this study and its use of patients' electronic health records without patient consent. These are:
1. Cancer Research UK (CRUK): Patient Involvement Opportunities
2. Pancreatic Cancer UK (PCUK): Research Involvement Network
3. NIHR Involve: People in Research (PiR)
4. Pancreatic Cancer Research Fund (PCRF)

The opportunity to share opinions was advertised in the relevant sections of the CRUK, PCUK and PiR websites for a month between September and October 2019. The opportunity was also advertised in the CRUK Newsletters emailed to its registered subscribers. Interested members of the public were asked to contact the PI by email to complete a survey. Upon request, the PI sent two documents to the PPI participants: a plain English summary of the activity, and a survey form. In the case of PCRF, the CEO Ms Maggie Blanks sent out the request for the survey to a small group of registered PCRF supporters.

The summary document and survey form were prepared in consultation with the key collaborators (Prof Claude Chelala, Prof Hemant Kocher) and PCRF CEO Ms Maggie Blanks. The summary document outlined the research objectives and study method. It explained the rationale of conducting the study as well as using confidential healthcare records of patients without consent for this purpose. It also explained the need for specialised approvals from regulatory bodies to conduct the research. The survey was designed with a specific focus on measuring the acceptability of the use of confidential healthcare data without patient consent. Two specific questions were asked in the survey:
1. How do you feel about the use of electronic health records without patient consent in this research aiming to improve pancreatic cancer management? Do the potential public benefits justify the approach?
2. What are your general concerns about the use of electronic health records in medical research? How do you think those concerns can be addressed?

The proposed research was also presented to a small focus group during the PPI Research Advisory Group meeting at the CRUK Barts Centre (BCC) on 24th September 2019. The focus group consisted of 2 members of the public and 2 internal academics. Unfortunately, more members of the public could not join due to unexpected inclement weather on the day. The discussion was chaired by Dr Jessica Okosun. The same questions were put to the two members of the public.

In the end, we received responses from 16 members of the public, representing the patient, patient family member and public groups from the viewpoint of this study.

14 out of 16 participants expressed their overwhelming support for the proposed research. They regarded the use of electronic health records without patient consent in this research as a valid idea and urged us to "use every possible available source of information to investigate early signs, correlations, causes, and outcomes". The consensus was that health records should be utilized for research to combat this deadly disease as long as the collected data are properly anonymised so that individuals cannot be identified in future. Some participants implied that it would be a missed opportunity if more research were not conducted using past health records that exist anyway. A couple of participants went on to recommend broadening the scope of such research to other cancers, as well as conducting it at the national level to capture diverse demographic groups.

2 out of 16 participants raised concerns about ensuring the privacy of patients against potential misuse of health records and stood for consent-based research only. One Patient Family representative stated that "Everything must be done [to improve pancreatic cancer] respecting patient privacy." The other participant, representing the Public group, pointed out that advances in technology mean it is practically impossible to avoid data breaches nowadays; it was, therefore, his opinion that Section 251 approval for accessing confidential data without the subject's consent should not be in place for ANY research.

To summarise, we received a broad range of opinions from the survey questionnaires with some concerns raised but the overall responses were overwhelmingly in favour of using patient identifiable data in this project without patient consent.
Year(s) Of Engagement Activity 2019