Social listening: Applying natural language processing methods to social media data to yield actionable analytics for health and care services

Lead Research Organisation: University of Manchester
Department Name: School of Health Sciences

Abstract

It is vital to understand public opinion and preferences towards the use of information technology and health data as part of designing future models of healthcare and research. Social media platforms (e.g. Twitter), blogs and online discussion forums provide a rich resource of naturally occurring conversations for examining public attitudes and preferences towards (a) information and digital technologies as part of the delivery of healthcare and (b) the secondary use of health data for purposes beyond direct healthcare, such as research.

Public comments and conversations can be analysed manually using established qualitative research techniques. Whilst such methods provide depth and rigour, they are typically labour intensive and cannot be applied rapidly or at scale without significant effort and resource. I propose to investigate promising new techniques from the field of natural language processing (NLP) to rapidly and automatically analyse textual data about public attitudes and preferences towards health and care from publicly available social media data. I will compare the performance of NLP methods against established qualitative approaches and assess how the two approaches can complement each other to gather insights into public opinion for the purposes of ongoing monitoring, research, evaluation and informing public policy.

I will test advanced methods of data visualisation to report my findings. Leveraging my networks, I will explore how to translate my work into wider applications, within Health Data Research UK, healthcare services and internationally. Throughout the project I will adhere to ethical guidelines for using social media data and will involve citizens from relevant communities (online and offline) in shaping the design and delivery of the research.

Technical Summary

Future healthcare delivery and research rely on public acceptance of information technology and health data uses as part of service delivery. Social media platforms (e.g. Twitter), blogs and online discussion forums provide an increasingly ubiquitous, yet under-exploited, source of unstructured textual data for examining public attitudes and preferences relevant to health, information technology, data uses and healthcare delivery. Yet analysing such data manually is labour intensive and cannot be done rapidly or at scale.
This project will investigate the accuracy of newer natural language processing (NLP) techniques for the rapid, automated extraction of public attitudes and preferences from large-scale, social media data. Exemplar datasets relevant to public attitudes towards the commercial use of health data and wearable technologies will be extracted from social media data, cleaned and prepared for analysis. NLP techniques (e.g. sentiment analysis, text mining, machine learning and/or rule-based methods) and qualitative analysis (e.g. framework analysis, discursive psychology) will then be: (a) applied to unstructured textual data in parallel; (b) benchmarked against each other; and (c) tested within an integrated mixed methods approach.
Online open source decision support tools will be developed to guide the selection and application of NLP techniques (alone and/or as part of a mixed methods approach) to unstructured social media data, addressing distinct purposes, such as longitudinal monitoring of public opinion and informing policy development. Advanced data visualisations will be developed, evaluated and optimised for data exploration, presentation and informing decision making. This will include sentiment analysis, clustering, frequency analysis, and high-dimensional representations of big text data. Findings, tools and methods will be disseminated widely, with the aim of enabling public opinion data to inform future policy development.
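To make the benchmarking step concrete, the following is a minimal sketch of how automated labels from a rule-based sentiment method could be compared against labels assigned by a qualitative coder, using Cohen's kappa as the agreement measure. It is an illustration only, not the project's method: the word lists, the example posts and the function names rule_based_sentiment and cohens_kappa are all hypothetical, and a real study would use a validated lexicon or trained model and a much larger coded sample.

```python
from collections import Counter

# Hypothetical toy lexicon; a real study would use a validated resource.
POSITIVE = {"love", "helpful", "great", "trust"}
NEGATIVE = {"worried", "creepy", "hate", "unsafe"}


def rule_based_sentiment(text):
    """Label a post by counting positive and negative lexicon words."""
    tokens = [t.strip(".,!?;:").lower() for t in text.split()]
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two equal-length label lists."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical category throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Invented example posts with labels from a (hypothetical) human coder.
posts = [
    "I love my fitness tracker, it is really helpful",
    "Selling health data to companies feels creepy",
    "My watch counts steps",
    "Worried about who sees my records",
]
human_codes = ["positive", "negative", "neutral", "negative"]

automated = [rule_based_sentiment(p) for p in posts]
kappa = cohens_kappa(automated, human_codes)
print(automated, round(kappa, 2))
```

Kappa is used here rather than raw accuracy because it corrects for agreement expected by chance, which matters when one category (often neutral) dominates social media samples.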

Publications

Hassan L (2021) Automated detection and reduction of stigma in online discussions about TB. in The international journal of tuberculosis and lung disease : the official journal of the International Union against Tuberculosis and Lung Disease

 
Description EPSRC Healtex Feasibility Funding
Amount £29,583 (GBP)
Funding ID EP/N027280/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 05/2018 
End 10/2018
 
Description Towards the real-time detection of topics of concern among people living with chronic kidney disease during COVID-19
Amount £10,000 (GBP)
Organisation Economic and Social Research Council 
Sector Public
Country United Kingdom
Start 07/2020 
End 12/2020
 
Description UKRI Global Challenges Research Fund QR allocation
Amount £6,490 (GBP)
Funding ID P122809LH 
Organisation United Kingdom Research and Innovation 
Sector Public
Country United Kingdom
Start 06/2020 
End 07/2020
 
Description IBM e-cigarette study 
Organisation IBM
Country United States 
Sector Private 
PI Contribution Leading on research conception and design; collecting and analysing social media data about e-cigarettes; domain expertise in public health, data ethics and social media research.
Collaborator Contribution Contribution towards research design, data analysis, training in the application of natural language processing techniques (up to 4 hours per month) and provision of office space (3 days per week for 6 months).
Impact Paper: https://doi.org/10.18653/v1/2021.smm4h-1.16 Datasets: more than 1 million tweets on e-cigarettes, collected in the wake of vaping deaths and in the lead-up to the election.
Start Year 2019
 
Title TweetClip 
Description TweetClip is a command line tool that helps researchers to work with tweet data in JSON format. It clips complex tweet data down to the data fields of interest while maintaining relevant data structures, generating either a JSON or .csv file as the output. 
Type Of Technology Webtool/Application 
Year Produced 2021 
Open Source License? Yes  
Impact None yet. 
URL https://github.com/Republicof1/TweetClip
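As an illustration of the general idea of clipping tweet JSON down to selected fields, the sketch below keeps only the requested fields of a nested record while preserving their nesting, and can also flatten the same fields into CSV. It is not TweetClip's implementation or interface: the functions get_path, clip_tweet and to_csv, the dotted-path convention and the sample record are all hypothetical.

```python
import csv
import io

_MISSING = object()  # sentinel so that legitimate None values are kept


def get_path(record, path):
    """Follow a dotted path such as 'user.screen_name' through nested dicts."""
    current = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def clip_tweet(tweet, paths):
    """Return a copy of the tweet holding only the requested fields."""
    clipped = {}
    for path in paths:
        value = get_path(tweet, path)
        if value is _MISSING:
            continue  # skip fields this tweet does not have
        keys = path.split(".")
        target = clipped
        for key in keys[:-1]:
            target = target.setdefault(key, {})  # rebuild the nesting
        target[keys[-1]] = value
    return clipped


def to_csv(tweets, paths):
    """Flatten the requested fields into CSV text, one row per tweet."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(paths)
    for tweet in tweets:
        values = [get_path(tweet, p) for p in paths]
        writer.writerow(["" if v is _MISSING else v for v in values])
    return buffer.getvalue()


# Invented sample record; the field names are assumptions for illustration.
sample = {
    "id": 1,
    "text": "hello",
    "lang": "en",
    "user": {"id": 7, "screen_name": "x", "followers_count": 3},
}

print(clip_tweet(sample, ["id", "text", "user.screen_name"]))
print(to_csv([sample], ["id", "user.screen_name"]))
```

Keeping the nested structure in the JSON output while flattening only for CSV mirrors the trade-off the description mentions: JSON can carry structure, whereas CSV needs one column per field.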
 
Description KCUK patient workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Patients, carers and/or patient groups
Results and Impact Discussion with 4 people living with chronic kidney disease, arranged in collaboration with the charity Kidney Care UK. Discussion focused on how they use social media to interact with charities, peers and other sources of information during the Covid-19 pandemic. Researchers and patients discussed the acceptability of proposals for using text mining methods to help charities like KCUK track and analyse patient concerns in real-time, as they arise.
Year(s) Of Engagement Activity 2020
 
Description Manchester Medical Society Schools Xmas Lecture 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact Gave the 2020 Hon Dorothy Wedgwood OBE Annual Christmas Lecture for Young People (delivered via Zoom and live streamed to schools on YouTube this year due to Covid), alongside Prof Niels Peek. Our talk was entitled: 'AI: Hope, Hype or Horror' and included interactive polls, live chatbots and a Q&A. Feedback indicated the talk was well received, increased awareness of the use of AI in medicines (and its promises and pitfalls), and gave useful information on career pathways in health data science.
Year(s) Of Engagement Activity 2020
URL https://www.youtube.com/watch?v=0Ma5usIL_l4
 
Description NICE data analytics event 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Policymakers/politicians
Results and Impact Invited talk to an event on data analytics held by the Evidence Synthesis network, a collaboration between the National Institute for Health and Care Excellence (NICE) and the University of Manchester, open to both organisations and interested clinicians, students and patients across the North West.
I gave a talk on the application of social media analytics and NLP to improve health and care. The talk has led to further discussions and invitations to collaborate with NICE on how they can use social media analytics as part of their work.
Year(s) Of Engagement Activity 2019
 
Description PHG Foundation Roundtable on Citizen Generated Data 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Policymakers/politicians
Results and Impact Personal invitation from the PHG Foundation (a non-profit think tank) to participate in a round-table of selected experts and cross-sector representatives with a shared interest in citizen-generated data and its potential for informing individual health and healthcare. Resulted in a PHG policy briefing on citizen generated data: https://www.phgfoundation.org/briefing/an-opportunity-for-public-health
Year(s) Of Engagement Activity 2019,2020
URL https://www.phgfoundation.org/briefing/an-opportunity-for-public-health
 
Description Psychology Division Zoom seminar 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Undergraduate students
Results and Impact Approximately 90 people attended an online Zoom seminar discussing my research on using social media data to unlock insights about health, with attention to wider ethical issues. Afterwards the organisers commented that the number of questions and post-talk comments was unusually high, indicating a significant level of interest in the talk.
Year(s) Of Engagement Activity 2021
URL https://novseminar.eventbrite.co.uk/
 
Description Schools engagement 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Schools
Results and Impact 40 students aged 16/17 attended a schools event to promote careers in health data science. As part of this I ran a health data science-themed game and talked about my research. Highly positive feedback as a result, indicating students increased their awareness of health data science as a study/career option - see teachers' comments below:

"I thought it all worked really well and it made for a very enjoyable day for us as well as the kids. Thanks for all your efforts both developing the materials and also then delivering them so fantastically well on the day."

"Lots of the S6 have come to find me this afternoon to tell me how much they enjoyed the day and are also very keen to fill in feedback sheets... Thanks again for offering this great experience, the kids seem to have really benefited and are all raving about it."
Year(s) Of Engagement Activity 2019
URL http://intheloop.newsweaver.com/intheloop/qzlc4so882ajn8av8q7r72?email=true&a=1&p=427063&t=18030
 
Description Spotlight on... blog 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Produced a short blog introducing the topic of my fellowship project, versions of which appeared as a news item on our department and Faculty webpages. It was also widely retweeted on Twitter. These news pieces are widely read by those who subscribe to departmental updates and Twitter followers, including fellow academics, NHS/healthcare practitioners, interested patients and the public. This piece helped to publicise my new award and has led to further speaking invitations and opportunities to supervise students.
Year(s) Of Engagement Activity 2018
URL https://www.herc.ac.uk/2018/02/27/spotlight-lamiece-hassan/
 
Description Turing free text public engagement event 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Patients, carers and/or patient groups
Results and Impact This was a free workshop aimed at patients and the general public, funded by Healtex and the Turing Institute. It focused on gaining feedback from the public on proposals for a range of safeguards which could be put in place to enable medical data to be shared for research, whilst maintaining privacy and confidentiality.

Outputs included: feedback into a national framework for data governance and safeguards for the acceptable use of medical free-text data from patient records within research for public benefit, as reported in a peer-reviewed paper accepted for publication in JMIR (Jones et al, In Press); highly positive feedback from attendees about the activities, which informed and engaged patients, e.g. this quote from Twitter: "I was lucky enough to attend this session and found the presentations by @LamieceHassan and @DrElizabethFord some of the most entertaining presentations I've watched"; a blog about the event: https://www.hdruk.ac.uk/news/free-text-to-share-or-not-to-share/.
Year(s) Of Engagement Activity 2019
URL https://www.turing.ac.uk/events/sharing-your-healthcare-data-safely
 
Description Yellow card project public involvement group 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Patients, carers and/or patient groups
Results and Impact A group of 5 people who use social media to discuss long term conditions (mainly arthritis) have been participating in a public involvement group to shape the design of research to test the acceptability of using natural language processing techniques to automatically detect adverse drug reactions reported on social media and link them with the MHRA's 'Yellow Card' reporting system. This group has reviewed the interview schedules and helped with recruitment for a qualitative study, involving 6 focus groups. This study is one of the test cases I am using as part of my overall project and is being done in collaboration with several other researchers from UoM. The plan is to write this up to inform the development of a future grant application as a separate study, on which I would be a Co-Investigator.
Year(s) Of Engagement Activity 2018,2019