A Molecular Pathological Epidemiology Approach Towards Pancreatic Cancer

Lead Research Organisation: Queen Mary, University of London
Department Name: Barts Cancer Institute


Pancreatic Cancer (PaC) is projected to be one of the leading causes of cancer-related death by 2030, second only to lung cancer. PaC presents the poorest prognosis of all major solid tumours with a five-year survival rate of just 5%. The poor survival rate is largely associated with late-stage diagnosis when surgical resection, the only current hope for cure, becomes infeasible. Therefore, earlier prediction of PaC onset, progression and response to treatment may help to improve treatment strategy and patient outcomes.

In recent years, the molecular pathological epidemiology (MPE) approach has contributed to a better understanding of several cancers and promises to revolutionise clinical practice through precision detection, prevention and treatment of cancers. By connecting putative etiological factors (commonly referred to as exposures or risk factors, including treatments) to specific molecular signatures across tumour phenotypes, the MPE approach can yield more accurate measures of PaC diagnosis, prognosis and response to treatment.

Various studies have suggested a number of potential risk factors for PaC, such as age, diabetes, smoking, alcohol and being overweight. The project aims to refine and analyse the utility of the known risk factors for PaC compared with other, more common confounding diagnoses. The project will then focus on a deeper characterisation of PaC through the integration of genetic data with information on environmental risk factors and clinical prognostic factors. Understanding how endogenous and exogenous risk factors are connected to molecular changes, and how they contribute to the onset and progression of PaC, is a critical step in informing evidence-based clinical practice. Further, the work has important practical implications, as it will provide a gene-environment interaction map across various pancreatic diseases to guide efforts towards early diagnosis of PaC. The work can also provide evidence linking risk factors, treatment and disease outcomes, thereby supporting the precision medicine initiative. Similarly, identifying any connection between genetic make-up, medication and disease outcome will aid efforts in potential drug repurposing.

Technical Summary

Objective 1:
A population-based comparative cohort study will be conducted between patients diagnosed with pancreatic cancer (PaC) and other patients matched on appropriate demographics, focusing on various epidemiological factors and clinical data such as demography, lifestyle, physiology, symptoms, diagnoses, medication, blood and urine tests and healthcare utilisation markers. Patients will be classified by a data-driven phenotyping algorithm applied to their healthcare trajectory to date. Patients' linked electronic health records (EHR), including primary and secondary care data as well as socio-economic and mortality data, will be collected from diverse sources such as the Barts Health Data Warehouse, NHS Discovery Project East London, CPRD and NCRAS. The rates of PaC events (incidence, recurrence, death), and their comparison with rates in matched controls, will be estimated using appropriate statistical models. Associations between various exposures and outcomes will be measured as odds ratios, with length and dose of exposure to known risk factors as stratification categories.
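
The stratified odds-ratio measure described above can be sketched as follows. This is a minimal illustration, not the project's analysis code: the counts, the stratum labels and the function name `odds_ratio` are all invented for the example, and it uses a simple unmatched 2x2 calculation with a Wald interval, whereas a matched design would normally call for conditional methods.

```python
import math


def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Return (odds ratio, lower, upper) with a 95% Wald interval on the log scale."""
    a, b, c, d = exposed_cases, exposed_controls, unexposed_cases, unexposed_controls
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) (Woolf's method); requires all four cells > 0.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - 1.96 * se)
    upper = math.exp(math.log(estimate) + 1.96 * se)
    return estimate, lower, upper


# Hypothetical counts, invented for illustration only.
reference = {"cases": 100, "controls": 300}  # never exposed
strata = {
    "under 10 years": {"cases": 30, "controls": 60},
    "10 years or more": {"cases": 40, "controls": 40},
}

# Each exposure-duration stratum is compared against the same unexposed reference group.
results = {
    label: odds_ratio(s["cases"], s["controls"], reference["cases"], reference["controls"])
    for label, s in strata.items()
}

for label, (estimate, lower, upper) in results.items():
    print(f"{label}: OR={estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Comparing every stratum with one shared reference group makes a dose-response trend across exposure lengths easy to read off, which is the purpose of using exposure length and dose as stratification categories.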

Objective 2:
The -omics data available through cancer-specific initiatives (TCGA, GENIE) and molecular data repositories (GEO, ArrayExpress) will be analysed, in combination with relevant literature mining, to collate existing genomic characterisations of PaC in terms of gene and mutational signatures. The molecular characteristics of PaC associated with clinical history will be assessed on data derived from the PCRF Tissue Bank. A consensus clustering algorithm will be applied to obtain new stratifications of PaC along the themes of disease onset, progression, survival and response to treatment. The association between the expression or mutational status of specific PaC genes and risk or prognostic factors, longitudinal data and response to treatment will be tested. All data obtained through the project will be published in a mineable web-based bioinformatics infrastructure.
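
The idea behind consensus clustering can be sketched as below: cluster many random subsamples and record how often each pair of samples lands in the same cluster. This is a toy illustration under stated assumptions, not the project's pipeline: the base clusterer (plain k-means), the resampling settings, the function names and the eight two-gene "expression profiles" are all hypothetical.

```python
import random
from itertools import combinations


def kmeans(points, k, rng, iterations=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


def consensus_matrix(points, k, n_resamples=50, fraction=0.75, seed=0):
    """Fraction of co-sampled runs in which each pair of points shared a cluster."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    size = max(k, int(n * fraction))
    for _ in range(n_resamples):
        idx = rng.sample(range(n), size)
        labels = kmeans([points[i] for i in idx], k, rng)
        for (i, li), (j, lj) in combinations(zip(idx, labels), 2):
            sampled[i][j] += 1
            sampled[j][i] += 1
            if li == lj:
                together[i][j] += 1
                together[j][i] += 1
    return [
        [1.0 if i == j else (together[i][j] / sampled[i][j] if sampled[i][j] else 0.0)
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical two-gene expression profiles for eight samples (invented values).
profiles = [
    (1.0, 0.9), (1.1, 1.0), (0.9, 1.2), (1.2, 1.1),  # samples 0-3
    (5.0, 5.1), (5.1, 4.9), (4.9, 5.2), (5.2, 5.0),  # samples 4-7
]
consensus = consensus_matrix(profiles, k=2)
```

Pairs with consensus values near 1 are stably grouped across resamples, while values near 0 indicate stable separation; intermediate values flag samples whose cluster membership depends on which subsample was drawn.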


Title HPB clinical dataset schema 
Description A MySQL-based relational database schema has been designed and implemented for organising the raw EHR data and data files. An event-oriented data model is deployed to capture a patient's clinical journey, where each patient is identified by a study-specific ID and each data point is represented by a tuple. The data points are accompanied by appropriate generic derived variables (e.g., occurrence of two events within a temporal range, age at diagnosis, age at first occurrence of symptoms, disease-free period, survival period from specific events) to add value. Much of the participant history is envisaged to be derived from the already-structured data in HES (Hospital Episode Statistics) and GP records. Such data, coming from heterogeneous sources, require standardisation. For example, most of the historical HES data use ICD-9 or ICD-10 codes, whereas GP data use different versions of Read Codes (and, in the near future, SNOMED CT codes) to record diagnoses and medical conditions; these data are standardised into ICD-10 codes. Similarly, data received from different providers using different clinical terminologies are standardised into OPCS-4 codes for operative procedures, PBCL codes for pathology and dm+d codes for medicines. Ad hoc mapping and translation rules are being prepared to achieve the standardisation. A fully functional automated data processing workflow is under development to cleanse, harmonise and integrate clinical data obtained from multiple sources into a well-annotated, non-redundant master dataset. An alternative approach to standardisation will also be considered, in which the clinical data derived from the heterogeneous clinical terminology systems are mapped into all-encompassing SNOMED CT codes and terms.
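
The event-oriented model, code standardisation and derived variables described above might look roughly like the following sketch. It is written in Python rather than MySQL purely for illustration. The fields of `Event`, the helper names and the tiny mapping table are assumptions (the record does not specify the tuple's contents), and the mapping is not a validated clinical crosswalk.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Illustrative mapping only; a real crosswalk would come from curated mapping rules.
TO_ICD10 = {
    ("ICD-9", "157"): "C25",      # malignant neoplasm of pancreas
    ("LOCAL", "PANC-CA"): "C25",  # hypothetical provider-specific code
}


@dataclass(frozen=True)
class Event:
    """One data point in a patient's journey (field choice is an assumption)."""
    patient_id: str   # study-specific ID
    event_date: date
    system: str       # coding system, e.g. "ICD-9", "ICD-10"
    code: str


def standardise(event: Event) -> Optional[Event]:
    """Return the event re-coded to ICD-10, or None if no mapping rule exists."""
    if event.system == "ICD-10":
        return event
    mapped = TO_ICD10.get((event.system, event.code))
    if mapped is None:
        return None
    return Event(event.patient_id, event.event_date, "ICD-10", mapped)


def age_at(birth: date, when: date) -> int:
    """Derived variable: age in completed years on a given date."""
    return when.year - birth.year - ((when.month, when.day) < (birth.month, birth.day))


def within_days(first: Event, second: Event, max_days: int) -> bool:
    """Derived variable: did `second` occur within `max_days` after `first`?"""
    return 0 <= (second.event_date - first.event_date).days <= max_days


# Hypothetical patient, invented for illustration.
birth = date(1950, 6, 15)
symptom = Event("P001", date(2014, 12, 1), "ICD-10", "R10")  # abdominal and pelvic pain
diagnosis = standardise(Event("P001", date(2015, 3, 1), "ICD-9", "157"))

age_at_diagnosis = age_at(birth, diagnosis.event_date)
symptom_before_diagnosis = within_days(symptom, diagnosis, max_days=180)
```

Keeping every data point in one event shape, with the coding system stored next to the code, is what allows a single query or function to work across diagnoses, procedures and prescriptions once the codes have been harmonised.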
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? No  
Impact Ease of querying across different types of data (e.g., diagnosis, procedure, treatment, prescriptions) as a result of the underlying homogenised data model. 
Description Barts Health Commissioning Data Sets and Unstructured data for Pancreatic cancer epidemiology 
Organisation Barts Health NHS Trust
Country United Kingdom 
Sector Public 
PI Contribution The risk factors for PaC in multi-ethnic populations have not been well defined. This study takes the unique opportunity to study PaC risk factors in the truly diverse, multi-ethnic population of East London, whose secondary health care is mostly provided by the hospitals within Barts Health NHS Trust. In particular, the Royal London Hospital has a renowned tertiary care establishment for hepato-pancreatico-biliary cancer, which is expected to provide the majority of the study participants. Barts Health is also a linked research partner of the host institute (Queen Mary University of London), and the study is aligned with the strategic focus of the institute. The research is expected to identify potential triggers for targeted screening and for speeding up diagnosis. This is a critical step in informing evidence-based clinical practice for the target population in East London, both those affected by PaC and those at potential risk. The success of this single-site study will also open the door to conducting the study on a broader regional or national scale.
Collaborator Contribution Access to study participants: Given the high prevalence of pancreatic cancer in the tertiary care setting (1 in 20), the Trust has the advantage of providing appropriate controls (positive and negative) and case-load for the research. We estimate that ~250 new cases of PaC are diagnosed each year on average within the estimated 2.5M population served by Barts Health. This gives an estimated ~3750 individual PaC patients over 15 years. A control cohort of double that size will also be extrapolated from the population served by the Trust. Access to data: The Cerner Millennium System (CMS) Business Intelligence (BI) tool used within Barts Health NHS Trust is a useful resource for obtaining the coded hospital CDS data. The CMS PowerChart tool can also be used to extract a limited set of unstructured data. Holding an honorary researcher contract with the Trust, the PI has access to both the CMS BI and PowerChart tools. Study participants, i.e., cases and controls, will be identified from the CDS data using the CMS BI tool. A subset of CDS data will then be extracted using the CMS BI tool to obtain the clinical history of the participants. A small set of unstructured hospital data will also be extracted using the CMS PowerChart tool.
Impact No outcome yet.
Start Year 2020
Description Discovery dataset for pancreatic cancer epidemiology 
Organisation East London Health and Care Partnership
Department NHS Discovery East London Programme
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution Discovery East London is a partnership programme across inner north-east London that aims to establish, deliver and manage a secure data service with linked, combined, identifiable clinical data from all systems supporting direct health care in East London. The clinical dataset is designed for use by third parties, such as academics, to support research. For the purposes of this research project, an application was submitted to the Discovery Programme Board in January 2019 for a subscription to the identifiable primary care data using Discovery. The application was provisionally approved in February 2019 for the provision of identifiable clinical data, in line with the Data Protection and Security protocols and governance principles applicable to the use of Discovery datasets. Upon receipt of NHS CAG and HRA approval in January 2020 for accessing relevant patients' identifiable clinical data without their consent under this project, Discovery's provisional support was extended to full support. The Discovery Programme Board will be supplied with periodic updates on how its data are being used and on the findings of the research. This will help the Discovery Programme build demonstrable evidence of how Discovery data have added value in generating research insights and, ultimately, clinical benefits in the health and care system.
Collaborator Contribution Discovery will provide a defined set of clinical data available for patients reported in the GP data (i.e., in East London CCGs) who have been diagnosed with pancreatic cancers and pancreatico-biliary diseases and attended Barts Health NHS Trust hospitals between 2007 and 2021. The clinical dataset requested includes information on Basic Demography, Diagnosis, Procedures, Medical Conditions, Prescribed Medications, Lifestyle, Family History, Laboratory and Test Results. The project is expected to receive clinical data every five months in six batches between March 2020 and June 2022. The data will provide the basis for a case-control study aimed at discovering and evaluating novel and known risk factors associated with pancreatic cancer.
Impact No outcome yet.
Start Year 2019
Description Public-patient Involvement 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Patients, carers and/or patient groups
Results and Impact Several platforms have been utilised to gather opinions from members of the public (including patients, family members of patients, and healthy volunteers) on this study and its use of patients' electronic health records without patient consent. These are:
1. Cancer Research UK (CRUK): Patient Involvement Opportunities
2. Pancreatic Cancer UK (PCUK): Research Involvement Network
3. NIHR Involve: People in Research (PiR)
4. Pancreatic Cancer Research Fund (PCRF)

The opportunity to share opinions was advertised in the relevant sections of the CRUK, PCUK and PiR websites for a month between September and October 2019. The opportunity was also advertised in the CRUK newsletters emailed to its registered subscribers. Interested members of the public were asked to contact the PI by email to complete a survey. Upon request, the PI sent two documents to the PPI participants: a plain English summary of the activity and a survey form. In the case of PCRF, the CEO Ms Maggie Blanks sent out the request for the survey to a small group of registered PCRF supporters.

The summary document and survey form were prepared in consultation with the key collaborators (Prof Claude Chelala, Prof Hemant Kocher) and PCRF CEO Ms Maggie Blanks. The summary document outlined the research objectives and study method. It explained the rationale for conducting the study and for using confidential healthcare records of patients without consent for this purpose. It also explained the need for specialised approvals from regulatory bodies to conduct the research. The survey was designed with a specific focus on measuring the acceptability of the use of confidential healthcare data without patient consent. Two specific questions were asked in the survey:
1. How do you feel about the use of electronic health records without patient consent in this research aiming to improve pancreatic cancer management? Do the potential public benefits justify the approach?
2. What are your general concerns about the use of electronic health records in medical research? How do you think those concerns can be addressed?

The proposed research was also presented to a small focus group during the PPI Research Advisory Group meeting at the CRUK Barts Centre (BCC) on 24th September 2019. The focus group consisted of 2 members of the public and 2 internal academics. Unfortunately, more members of the public could not join due to unexpected inclement weather on the day. The discussion was chaired by Dr Jessica Okosun. The same questions were put to the two members of the public.

In the end, we received responses from 16 members of the public, representing the patient, patient family member and public groups relevant to this study.

14 out of 16 participants expressed overwhelming support for the proposed research. They regarded the use of electronic health records without patient consent in this research as a valid idea and urged that the study "use every possible available source of information to investigate early signs, correlations, causes, and outcomes". The consensus was that health records should be utilised for research to combat this deadly disease as long as the collected data are properly anonymised so that individuals cannot be identified in future. Some participants suggested that it would be a missed opportunity if more research were not conducted using past health records that exist anyway. A couple of participants went on to recommend broadening the scope of such research to other cancers, as well as conducting it at the national level to capture diverse demographic groups.

2 out of 16 participants raised concerns about ensuring the privacy of patients against potential misuse of health records and favoured consent-based research only. One Patient Family representative stated that "Everything must be done [to improve pancreatic cancer] respecting patient privacy." The other participant, representing the Public group, pointed out that advances in technology mean it is practically impossible to avoid data breaches nowadays; it was, therefore, his opinion that Section 251 approval for accessing confidential data without the subject's consent should not be in place for ANY research.

To summarise, we received a broad range of opinions from the survey questionnaires, with some concerns raised, but the overall responses were overwhelmingly in favour of using patient-identifiable data in this project without patient consent.
Year(s) Of Engagement Activity 2019