Diagnosing Disease with Shopping Data

Lead Research Organisation: University of Nottingham
Department Name: Nottingham University Business School

Abstract

The aim:

To create a framework for "Personal Data Donation" by investigating the issues surrounding individuals "donating" personal transactional data to public health research projects.

The question:

How can personal transactional data be collected and analysed for the purposes of health research in a way that is acceptable to society, works for infectious and chronic disease, and can be successfully implemented in a clinical setting?

Overview:

This PhD is connected to a wider project by partners ALSPAC at the University of Bristol (2020) and the Alan Turing Institute (2020): "Donating personal transactional data for research: investigating the public acceptability of using commercial transactional data in public health research".

Personal commercial transactional data is the information stored when an exchange occurs between an individual and a business, including customer shopping data. This research will connect loyalty card data (customer shopping information held by a retailer) to Covid-19 incidence data and to information from women with ovarian cancer. Connecting these datasets will make it possible to investigate whether shopping data can help women with ovarian cancer to be diagnosed earlier, and/or whether it can help inform public health decisions during a pandemic.

A series of studies will be carried out to iteratively build machine learning (ML) models (a method of programming computers to learn from data) whose predictions could support the earlier diagnosis of ovarian cancer and/or a better understanding of influenza-like illness (ILI) outbreaks.

The studies will use a mixed-methods approach, collecting and analysing both qualitative and quantitative data for integrated interpretation. They will inform the models' schema design and feature engineering, and will help to understand and validate the models' outputs and any interpretations drawn from them. The iterative design will allow the models to be adjusted for successful implementation in a clinical setting.

The survival rate for ovarian cancer is low: with no UK national screening programme, women are predominantly diagnosed at a late stage (Cancer Research UK 2020). At the same time, the world is currently experiencing the Covid-19 pandemic (WHO 2020). A framework tool built on this research will help medical researchers to assess, and access, the potential of using shopping data to investigate disease.

Planned Impact

We will collaborate with over 40 partners drawn from across FMCG and Food; Creative Industries; Health and Wellbeing; Smart Mobility; Finance; Enabling Technologies; and Policy, Law and Society. These partners will benefit from engagement with our CDT through the following established mechanisms:

- Training multi-disciplinary leaders. Our partners will benefit from being able to recruit highly skilled individuals who are able to work across technologies, methods and sectors and in multi-disciplinary teams. We will deliver at least 65 skilled PhD graduates into the Digital Economy.

- Internships. Each Horizon student undertakes at least one industry internship or exchange at an external partner. These internships benefit students by developing their appreciation of how their PhD relates to the wider societal and industrial context. They benefit the external partner through engagement with our students and their multidisciplinary skill sets, together with the students' ability to help develop new ideas and approaches with minimal long-term risk. Internships are a compulsory part of our programme, taking place in the summer of the first year. We will deliver at least 65 internships with partners.

- Industry-led challenge projects. Each student participates in an industry-led group project in their second year. Our partners benefit from being able to commission focused research projects to help them answer a challenge that they could not normally fund from their core resources. We will deliver at least 15 such projects (3 a year) throughout the lifetime of the CDT.

- Industry-relevant PhD projects. Each student delivers a PhD thesis project in collaboration with at least one external partner. Partners benefit from being able to engage in deeper, longer-term research than they would normally be able to undertake, especially those without their own dedicated R&D labs. We will deliver at least 65 such PhDs over the lifetime of this CDT renewal.

- Public engagement. All students receive training in public engagement and learn to communicate their findings through press releases and media coverage.

This proposal introduces two new impact channels in order to further the impact of our students' work and help widen our network of partners.

- The Horizon Impact Fund. Final year students can apply for support to undertake short impact projects. Industry partners, public and third sector partners, academic partners and the wider public will benefit from targeted activities that deepen the impact of individual students' PhD work. The fund will support activities such as developing plans for spin-outs and commercialisation; establishing an IP position; preparing and documenting open-source software or datasets; and developing tourable public experiences.

- ORBIT as an impact partner for RRI. Students will embed findings and methods for Responsible Research and Innovation into the national training programme delivered by ORBIT, the Observatory for Responsible Research and Innovation in ICT (www.orbit-rri.org). Through our direct partnership with ORBIT, all Horizon CDT students will be encouraged to write up their experience of RRI as contributions to ORBIT, so that their PhD research not only gains visibility but also informs future RRI training and education. PhD projects that are predominantly in the area of RRI are expected to contribute to new training modules, online tools or other ORBIT services.

Studentship Projects

Project Reference | Relationship | Related To | Start | End | Student Name
EP/S023305/1 | | | 01/10/2019 | 31/03/2028 |
2274214 | Studentship | EP/S023305/1 | 01/10/2019 | 30/12/2023 | Elizabeth Dolan
 
Description Avon Longitudinal Study of Parents and Children PhD Partner 
Organisation University of Bristol
Department Avon Longitudinal Study of Parents and Children (ALSPAC)
Country United Kingdom 
Sector Academic/University 
PI Contribution I have worked on collaborative research to connect with and support my PhD partner's project "Donating personal transactional data for research: Investigating the public acceptability of using commercial transactional data in public health research". https://www.turing.ac.uk/research/research-projects/donating-personal-transactional-data-research I work with the PI and research team on this project. I have completed an internship with ALSPAC in which I conducted interviews and a thematic analysis of the interview data collected. One of the outputs from this internship was the paper currently under review for publication "Public attitudes towards sharing loyalty card data for academic health research: a qualitative study". I am currently working with the PI on two research studies investigating the collection of individual loyalty card data for health research on ovarian cancer and Covid-19. I meet with the research team weekly to work collaboratively and share expertise and knowledge.
Collaborator Contribution Opportunity for an internship with the organisation. Expertise from experienced researchers; shared knowledge from, and the opportunity to network with, researchers working in the field; and collaborative research opportunities. Supervisory support from the PI.
Impact Paper currently under review for publication "Public attitudes towards sharing loyalty card data for academic health research: a qualitative study"
Start Year 2019
 
Description Cancer Loyalty Card Study (CLOCS) 
Organisation Imperial College London
Department Department of Surgery and Cancer
Country United Kingdom 
Sector Academic/University 
PI Contribution I am conducting a machine learning analysis of the loyalty card data collected by the CLOCS research team from those with and without ovarian cancer to investigate the use of self-medication and shopping habits prior to a diagnosis of ovarian cancer.
Collaborator Contribution Opportunity to join the CLOCS research team and learn from their expertise in cancer genetics and epigenetics, epidemiology and behavioural psychology, and from their ongoing work on CLOCS. Access to data collected as part of the CLOCS study. Opportunity to join Imperial College London as an occasional student.
Impact At the Cancer Loyalty Card Study (CLOCS) Annual Meeting 2021 (9 December), I presented "CLOCS Machine learning analysis plans".
Start Year 2021
 
Description NHSX Internship and Project: Value of Commercial Product Sales Data in Healthcare Prediction 
Organisation NHS England
Country United Kingdom 
Sector Public 
PI Contribution Using and sharing expertise and intellectual input for a joint research project. Informing and presenting to NHS data analyst staff. Providing the ability, through unique access, to link commercial datasets to healthcare datasets.
Collaborator Contribution Shared healthcare and NHS expertise for the research project. Provided access to, and expertise on, healthcare datasets. Provided project support through regular meetings, feedback, and practical and technical support from NHS staff.
Impact Technical report and open-source code on project: Value of Commercial Product Sales Data in Healthcare Prediction
Start Year 2021
 
Description The Alan Turing Institute Special Interest Group Novel data linkages for health and wellbeing 
Organisation Alan Turing Institute
Country United Kingdom 
Sector Academic/University 
PI Contribution I have become a co-organiser of The Alan Turing Institute Special Interest Group Novel data linkages for health and wellbeing, through my PhD partner's main contact, Anya Skatova, who is a Turing Fellow. I helped to arrange and facilitate a one-day workshop at the Alan Turing Institute in London for stakeholders from academia, industry, healthcare, funders and government bodies, and to write up a post-event report. I am also helping to arrange further events, including a digital footprint conference on 11 May 2023.
Collaborator Contribution Access to expertise and discussions from a range of researchers, industry, government and third sector members on my PhD topic. Access to the facilities, exposure and networks of the Alan Turing Institute.
Impact Report on the inaugural Novel Data Linkages for Health and Wellbeing Special Interest Group event (26 October 2022), published February 2023: https://digifootprints.co.uk/wp-content/uploads/2023/02/Novel-Data-Linkages-SIG-event-report-FEB2023.pdf
Start Year 2022
 
Description 21 November 2021: Ovacome webinar for women with ovarian cancer on using shopping data to explore diagnosis, and on donating shopping data. "Ovarian cancer, misdiagnosis and shopping for healthcare products, with Lizzie Dolan"
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Patients, carers and/or patient groups
Results and Impact Ten women with ovarian cancer attended my webinar on the results of my work, and on further planned work, using shopping data to investigate the diagnosis of ovarian cancer; third sector workers also attended. The webinar is now available on the charity Ovacome's YouTube channel, where it currently has 73 views.
Year(s) Of Engagement Activity 2021
URL https://www.youtube.com/watch?v=XWB5kakhyBc
 
Description Blog with NHSX: Model Class Reliance for Demonstrating Variable Importance 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Industry/Business
Results and Impact My research on using the Model Class Reliance (MCR) variable importance tool was explained in a blog by the NHSX Analytics Unit. I helped create the blog, which is based on my report for NHSX on the Value of Commercial Sales Data for Health Predictions. The blog is published on the NHSX Analytics Unit's GitHub page.
Year(s) Of Engagement Activity 2022
URL https://nhsx.github.io/AnalyticsUnit/MCR.html
 
Description February 2023 Turing-Roche Knowledge Share Event: Digital Health. Online webinar 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact One of the Turing-Roche Knowledge Share Series events, which aim to bring together members of Roche (a large biotech company, and leading provider of in-vitro diagnostics and innovative solutions across major disease areas https://www.roche.com/about/) and The Alan Turing Institute's networks (the UK's national institute for data science and artificial intelligence https://www.turing.ac.uk), as well as the wider scientific community. The series showcases partnership updates and research, shares knowledge, and presents different academic and industry perspectives on data science topics, to provide insight and help build new connections and collaborations (https://www.turing.ac.uk/events/turing-roche-knowledge-share-series).

The event reached around 150 people, with 80 to 90 viewing live and 75 views on YouTube to date. It was on the theme of Digital Health, exploring how the increasing amount of collected 'footprint' data can be used to develop healthcare research and products, and the considerations this raises. I gave a talk introducing the Turing Special Interest Group (SIG) Novel Data Linkages for Health and Wellbeing (https://www.turing.ac.uk/research/interest-groups/novel-data-linkages-health-and-wellbeing) and explaining how the group's work applies to my own research on using AI in population health surveillance through digital footprint data. I addressed how three themes emerging from the SIG's work, which brings multidisciplinary and multi-sector communities together to discuss linking novel digital footprint data to health outcomes, applied to my own work. These themes were:
• The value to policy makers & healthcare organisations
• Public acceptability
• Industry as data providers
I demonstrated how these themes applied to my studies "Assessing the value of integrating national longitudinal shopping data into respiratory disease forecasting models" (https://www.researchsquare.com/article/rs-2226531/v1), "Public attitudes towards sharing loyalty card data for academic health research: a qualitative study" (https://bmcmedethics.biomedcentral.com/articles/10.1186/s12910-022-00795-8), and a third study which I am currently writing up, "Using shopping data to predict respiratory disease and COVID-19: CIDS (Covid Individual Data Study)".

My presentation generated considerable interest and many questions, a number of which had to be followed up after the event in the Roche-Turing Slack channel due to time constraints. There was much discussion of techniques to increase the inclusion of wider audiences in medical research, of data collection and wrangling, and of public acceptance of using these 'novel' data types.
Year(s) Of Engagement Activity 2023
URL http://www.turing.ac.uk/events/turing-roche-knowledge-share-series-digital-health
 
Description Magazine article on my research with Ovarian Cancer Charity "Ovacome helps find loyalty card potential" 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Patients, carers and/or patient groups
Results and Impact An article in Ovacome's quarterly magazine updating their community on the results of my survey study, which Ovacome supported. The survey study investigated whether shopping purchases were related to the diagnostic pathway for ovarian cancer. See page 3 of the electronic version of the magazine.
The magazine is also printed and sent to Ovacome members.
Year(s) Of Engagement Activity 2021
URL https://www.ovacome.org.uk/Handlers/Download.ashx?IDMF=0d3cfb9d-7ba0-4073-bce5-6df56c697052
 
Description N/LAB website page: Diagnosing Disease with Shopping Data 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact A webpage on my PhD that explains the project and updates the general public on its outcomes.
Year(s) Of Engagement Activity 2020, 2021, 2022
URL https://www.nlab.org.uk/project/shopping-data-disease/
 
Description Presentation at the UKRI Trustworthy Autonomous Systems (TAS) hub showcase day hosted by the MINDS CDT at Southampton 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact I gave a presentation of my work at the UKRI Trustworthy Autonomous Systems (TAS) hub showcase day, hosted by the MINDS CDT at Southampton, titled "The Value Of Using Shopping Data To Make Predictions For Deaths From Respiratory Disease: Using Model Class Reliance for AI Explainability". The audience comprised over 100 postgraduate PhD students from centres for doctoral training linked to the UKRI TAS hub. I also displayed a poster at the event.
Year(s) Of Engagement Activity 2022
URL https://tas.ac.uk