Diagnosing Disease with Shopping Data
Lead Research Organisation:
University of Nottingham
Department Name: Nottingham University Business School
Abstract
The aim:
To create a framework for "Personal Data Donation" by investigating the issues surrounding individuals "donating" personal transactional data to public health research projects.
The question:
How can personal transactional data be collected and analysed for the purposes of health research in a way that is acceptable to society, works for infectious and chronic disease, and can be successfully implemented in a clinical setting?
Overview:
This PhD is connected to a wider project by partners ALSPAC at Bristol University (2020) and the Alan Turing Institute (2020): "Donating personal transactional data for research: Investigating the public acceptability of using commercial transactional data in public health research".
Personal commercial transactional data is the information stored when an exchange occurs between an individual and a business, including customer shopping data. This research will connect loyalty card data (customer shopping information held by a retailer) to Covid-19 incidents and to information from women with ovarian cancer. The linked datasets will be used to investigate whether shopping data can help women with ovarian cancer to be diagnosed earlier, and/or whether it can help inform public health decisions in a pandemic.
A collection of studies will be carried out to iteratively create machine learning (ML) models (a method of programming computers to learn from data) whose predictions could help in the earlier diagnosis of ovarian cancer and/or the understanding of outbreaks of influenza-like illness (ILI).
The methodology is mixed methods: both qualitative and quantitative data will be collected and analysed for integrated interpretation. The studies will be used to inform the models' schema creation and feature engineering, and to understand and validate their outputs and any interpretations made from these. The iterative design will allow the models to be adjusted for successful implementation in a clinical setting.
The survival rate for ovarian cancer is low. With no UK national screening programme, women are predominantly diagnosed in the late stages (Cancer Research UK 2020). The world is also currently experiencing a pandemic of Covid-19 (WHO 2020). Creating a framework tool from this research will help medical researchers assess, and access, the potential of using shopping data to investigate disease.
Planned Impact
We will collaborate with over 40 partners drawn from across FMCG and Food; Creative Industries; Health and Wellbeing; Smart Mobility; Finance; Enabling technologies; and Policy, Law and Society. These will benefit from engagement with our CDT through the following established mechanisms:
- Training multi-disciplinary leaders. Our partners will benefit from being able to recruit highly skilled individuals who are able to work across technologies, methods and sectors and in multi-disciplinary teams. We will deliver at least 65 skilled PhD graduates into the Digital Economy.
- Internships. Each Horizon student undertakes at least one industry internship or exchange at an external partner. These internships have a benefit to the student in developing their appreciation of the relevance of their PhD to the external societal and industrial context, and have a benefit to the external partner through engagement with our students and their multidisciplinary skill sets combined with an ability to help innovate new ideas and approaches with minimal long-term risk. Internships are a compulsory part of our programme, taking place in the summer of the first year. We will deliver at least 65 internships with partners.
- Industry-led challenge projects. Each student participates in an industry-led group project in their second year. Our partners benefit from being able to commission focused research projects to help them answer a challenge that they could not normally fund from their core resources. We will deliver at least 15 such projects (3 a year) throughout the lifetime of the CDT.
- Industry-relevant PhD projects. Each student delivers a PhD thesis project in collaboration with at least one external partner who benefits from being able to engage in longer-term and deeper research that they would not normally be able to undertake, especially for those who do not have their own dedicated R&D labs. We will deliver at least 65 such PhDs over the lifetime of this CDT renewal.
- Public engagement. All students receive training in public engagement and learn to communicate their findings through press releases and media coverage.
This proposal introduces two new impact channels in order to further the impact of our students' work and help widen our network of partners.
- The Horizon Impact Fund. Final year students can apply for support to undertake short impact projects. Industry partners, public and third sector partners, academic partners and the wider public benefit from targeted activities that deepen the impact of individual students' PhD work. The fund will support activities such as developing plans for spin-outs and commercialization; establishing an IP position; preparing and documenting open-source software or datasets; and developing tourable public experiences.
- ORBIT as an impact partner for RRI. Students will embed findings and methods for Responsible Research and Innovation (RRI) into the national training programme that is delivered by ORBIT, the Observatory for Responsible Research and Innovation in ICT (www.orbit-rri.org). Through our direct partnership with ORBIT, all Horizon CDT students will be encouraged to write up their experience of RRI as contributions to ORBIT, so that their PhD research will not only gain visibility but also inform future RRI training and education. PhD projects that are predominantly in the area of RRI are expected to contribute to new training modules, online tools or other ORBIT services.
People | ORCID iD |
James Goulding (Primary Supervisor) | |
Elizabeth Dolan (Student) | |
Publications
Dolan EH
(2022)
Public attitudes towards sharing loyalty card data for academic health research: a qualitative study.
in BMC medical ethics
Dolan E
(2023)
Data donation of individual shopping data to help predict the occurrence of disease: A pilot study linking individual loyalty card and health survey data to investigate COVID-19
in International Journal of Population Data Science
Goulding J
(2023)
Forecasting local COVID-19/Respiratory Disease mortality via national longitudinal shopping data: the case for integrating digital footprint data into early warning systems
in International Journal of Population Data Science
Dolan EH
(2023)
Using Shopping Data to Improve the Diagnosis of Ovarian Cancer: Computational Analysis of a Web-Based Survey.
in JMIR cancer
Dolan E
(2023)
Assessing the value of integrating national longitudinal shopping data into respiratory disease forecasting models
in Nature Communications
Studentship Projects
Project Reference | Relationship | Related To | Start | End | Student Name |
---|---|---|---|---|---|
EP/S023305/1 | | | 01/10/2019 | 31/03/2028 | |
2274214 | Studentship | EP/S023305/1 | 01/10/2019 | 30/12/2023 | Elizabeth Dolan |
Description | Avon Longitudinal Study of Parents and Children PhD Partner |
Organisation | University of Bristol |
Department | Avon Longitudinal Study of Parents and Children (ALSPAC) |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | I have worked on collaborative research to connect and aid my PhD partner's project "Donating personal transactional data for research: Investigating the public acceptability of using commercial transactional data in public health research". https://www.turing.ac.uk/research/research-projects/donating-personal-transactional-data-research I work with the PI and research team on this project. I have completed an internship with ALSPAC where I conducted interviews and a themed analysis of the interview data collected. One of the outputs from this internship was the paper currently under review for publication "Public attitudes towards sharing loyalty card data for academic health research: a qualitative study". I am currently working with the PI on two research studies investigating collecting individual loyalty card data for health research for ovarian cancer and covid-19. I meet with the research team on a weekly basis to work collaboratively and share expertise and knowledge. |
Collaborator Contribution | Opportunity for an internship with the organisation. Expertise from experienced researchers, shared knowledge from and opportunity to network with researchers working within the field, and collaborative research opportunities. Supervisory support from the PI. |
Impact | Paper currently under review for publication "Public attitudes towards sharing loyalty card data for academic health research: a qualitative study" |
Start Year | 2019 |
Description | Cancer Loyalty Card Study (CLOCS) |
Organisation | Imperial College London |
Department | Department of Surgery and Cancer |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | I am conducting a machine learning analysis of the loyalty card data collected by the CLOCS research team from those with and without ovarian cancer to investigate the use of self-medication and shopping habits prior to a diagnosis of ovarian cancer. |
Collaborator Contribution | Opportunity to join the CLOCS research team and learn from their expertise in cancer genetics and epigenetics, epidemiology and behavioural psychology, and from their ongoing work on CLOCS. Access to data collected as part of the CLOCS study. Opportunity to join Imperial College London as an occasional student. |
Impact | 9 December - Cancer Loyalty Card Study (CLOCS) Annual Meeting 2021, I presented: CLOCS Machine learning analysis plans. |
Start Year | 2021 |
Description | NHSX Internship and Project: Value of Commercial Product Sales Data in Healthcare Prediction |
Organisation | NHS England |
Country | United Kingdom |
Sector | Public |
PI Contribution | Using and sharing expertise and intellectual input for joint research project. Informing and presenting to NHS data analyst staff. Ability, due to unique access, to link commercial datasets to healthcare datasets. |
Collaborator Contribution | Shared healthcare and NHS expertise for research project. Provided access and expertise on healthcare datasets. Provided project support through regular meetings, feedback and practical/technical work support with/from NHS staff. |
Impact | Technical report and open-source code on project: Value of Commercial Product Sales Data in Healthcare Prediction |
Start Year | 2021 |
Description | The Alan Turing Institute Special Interest Group Novel data linkages for health and wellbeing |
Organisation | Alan Turing Institute |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | I have become a co-organiser of The Alan Turing Institute Special Interest Group Novel data linkages for health and wellbeing, through my PhD partner's main contact, Anya Skatova, who is a Turing Fellow. I have helped arrange and facilitate a one-day workshop at the Alan Turing Institute in London for stakeholders from academia, industry, healthcare, funders and government bodies, and helped write up a post-event report. I am also helping arrange further events, including a digital footprint conference on May 11th 2023. |
Collaborator Contribution | Access to expertise and discussions from a range of researchers, industry, government and third sector members on my PhD topic. Access to the facilities, exposure and networks of the Alan Turing Institute. |
Impact | Report on the Inaugural Novel Data Linkages for Health and Wellbeing Special Interest Group event of 26th October 2022 (report dated February 2023): https://digifootprints.co.uk/wp-content/uploads/2023/02/Novel-Data-Linkages-SIG-event-report-FEB2023.pdf |
Start Year | 2022 |
Description | 21 November 2021 - Ovacome Webinar to women with Ovarian Cancer on using shopping data to explore diagnosis and donating shopping data. Ovarian cancer, misdiagnosis and shopping for healthcare products, with Lizzie Dolan |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Patients, carers and/or patient groups |
Results and Impact | 10 women with ovarian cancer attended my webinar on the results of my work and further planned work on using shopping data to investigate the diagnosis of ovarian cancer. The event was also attended by third sector workers. The webinar is now available through the charity Ovacome's YouTube site, with 73 views at the time of writing. |
Year(s) Of Engagement Activity | 2021 |
URL | https://www.youtube.com/watch?v=XWB5kakhyBc |
Description | Blog with NHSX: Model Class Reliance for Demonstrating Variable Importance |
Form Of Engagement Activity | Engagement focused website, blog or social media channel |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Industry/Business |
Results and Impact | My research work on using the MCR variable importance tool was explained in a blog by the NHSX Analytics Unit. I helped create the blog which is based on my report for the NHSX on the Value of Commercial Sales Data for Health Predictions. The blog is published on the NHSX AU's GitHub page. |
Year(s) Of Engagement Activity | 2022 |
URL | https://nhsx.github.io/AnalyticsUnit/MCR.html |
Description | February 2023 Turing-Roche Knowledge Share Event: Digital Health. Online webinar |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other audiences |
Results and Impact | One of a knowledge share series aiming to bring together members of Roche (a large biotech company, and leading provider of in-vitro diagnostics and innovative solutions across major disease areas https://www.roche.com/about/) and The Alan Turing Institute's networks (the UK's national institute for data science and artificial intelligence https://www.turing.ac.uk), as well as the wider scientific community, to showcase partnership updates and research, share knowledge and hear different academic and industry perspectives on data science topics, to gain insight and to help build new connections and collaborations (https://www.turing.ac.uk/events/turing-roche-knowledge-share-series). Reached around 150 people, with 80-90 viewing live and 75 views on YouTube to date. The event was on the theme of Digital Health, exploring how the increasing amount of collected 'footprint' data can be used to develop healthcare research and products, and the considerations around this. I gave a talk introducing the Turing Special Interest Group (SIG) Novel Data Linkages for Health and Wellbeing (https://www.turing.ac.uk/research/interest-groups/novel-data-linkages-health-and-wellbeing) and how the group's work applies to my own research in using AI in population health surveillance through digital footprint data. I addressed how three themes apply to my own work. These themes emerge from the work the SIG is doing to bring multidisciplinary and multi-sector communities together to discuss linking novel digital footprint data to health outcomes. The themes were: the value to policy makers and healthcare organisations; public acceptability; and industry as data providers. I demonstrated how they applied to my studies: Assessing the value of integrating national longitudinal shopping data into respiratory disease forecasting models (https://www.researchsquare.com/article/rs-2226531/v1); Public attitudes towards sharing loyalty card data for academic health research: a qualitative study (https://bmcmedethics.biomedcentral.com/articles/10.1186/s12910-022-00795-8); and a third study which I am currently writing up, Using shopping data to predict respiratory disease and COVID-19: CIDS (Covid Individual Data Study). My presentation generated a lot of interest and many questions, a number of which had to be followed up in the Roche-Turing slack challenge post-event due to time constraints. There was a lot of discussion around techniques to increase the inclusion of wider audiences in medical research, data collection and wrangling, and public acceptance of using these 'novel' data types. |
Year(s) Of Engagement Activity | 2023 |
URL | http://www.turing.ac.uk/events/turing-roche-knowledge-share-series-digital-health |
Description | Magazine article on my research with Ovarian Cancer Charity "Ovacome helps find loyalty card potential" |
Form Of Engagement Activity | A magazine, newsletter or online publication |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Patients, carers and/or patient groups |
Results and Impact | A magazine article in Ovacome's quarterly magazine to update their community on the results of my survey study, which Ovacome supported. The survey study investigated whether shopping purchases were related to the pathway of diagnosis of ovarian cancer. See page 3 in the electronic version of the magazine. The magazine is also printed and sent to Ovacome members. |
Year(s) Of Engagement Activity | 2021 |
URL | https://www.ovacome.org.uk/Handlers/Download.ashx?IDMF=0d3cfb9d-7ba0-4073-bce5-6df56c697052 |
Description | N/LAB website page: Diagnosing Disease with Shopping Data |
Form Of Engagement Activity | Engagement focused website, blog or social media channel |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Public/other audiences |
Results and Impact | Webpage on my PhD which explains my project, and updates the general public with outcomes from the PhD |
Year(s) Of Engagement Activity | 2020,2021,2022 |
URL | https://www.nlab.org.uk/project/shopping-data-disease/ |
Description | Presentation at the UKRI Trustworthy Autonomous Systems (TAS) hub showcase day hosted by the MINDS CDT at Southampton |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Postgraduate students |
Results and Impact | Gave a presentation of my work at the UKRI Trustworthy Autonomous Systems (TAS) hub showcase day hosted by the MINDS CDT at Southampton on The Value Of Using Shopping Data To Make Predictions For Deaths From Respiratory Disease: Using Model Class Reliance for AI Explainability. The event was attended by over 100 postgraduate PhD students whose centres for doctoral training are all linked to the UKRI TAS hub. I also displayed a poster at the event. |
Year(s) Of Engagement Activity | 2022 |
URL | https://tas.ac.uk |