Social listening: Applying natural language processing methods to social media data to yield actionable analytics for health and care services
Lead Research Organisation:
University of Manchester
Department Name: School of Health Sciences
Abstract
It is vital to understand public opinion and preferences towards the use of information technology and health data as part of designing future models of healthcare and research. Social media platforms (e.g. Twitter), blogs and online discussion forums provide a rich resource of naturally occurring conversations for examining public attitudes and preferences towards (a) information and digital technologies as part of the delivery of healthcare and (b) the secondary use of health data for purposes beyond direct healthcare, such as research.
Public comments and conversations can be analysed manually using established qualitative research techniques. Whilst such methods provide depth and rigour, they are typically labour intensive and cannot be applied rapidly or at scale without significant effort and resource. I propose to investigate promising new techniques from the field of natural language processing (NLP) to rapidly and automatically analyse textual data about public attitudes and preferences towards health and care from publicly available social media data. I will compare the performance of NLP methods against established qualitative approaches and assess how the two approaches can complement each other to gather insights into public opinion for the purposes of ongoing monitoring, research, evaluation and informing public policy.
I will test advanced methods of data visualisation to report my findings. Leveraging my networks, I will explore how to translate my work into wider applications, within Health Data Research UK, healthcare services and internationally. Throughout the project I will adhere to ethical guidelines for using social media data and will involve citizens from relevant communities (online and offline) in shaping the design and delivery of the research.
Technical Summary
Future healthcare delivery and research rely on public acceptance of information technology and health data uses as part of service delivery. Social media platforms (e.g. Twitter), blogs and online discussion forums provide an increasingly ubiquitous, yet under-exploited, source of unstructured textual data for examining public attitudes and preferences relevant to health, information technology, data uses and healthcare delivery. Yet analysing such data manually is labour intensive and cannot be done rapidly or at scale.
This project will investigate the accuracy of newer natural language processing (NLP) techniques for the rapid, automated extraction of public attitudes and preferences from large-scale, social media data. Exemplar datasets relevant to public attitudes towards the commercial use of health data and wearable technologies will be extracted from social media data, cleaned and prepared for analysis. NLP techniques (e.g. sentiment analysis, text mining, machine learning and/or rule-based methods) and qualitative analysis (e.g. framework analysis, discursive psychology) will then be: (a) applied to unstructured textual data in parallel; (b) benchmarked against each other; and (c) tested within an integrated mixed methods approach.
Online open source decision support tools will be developed to guide the selection and application of NLP techniques (alone and/or as part of a mixed methods approach) to unstructured social media data, addressing distinct purposes, such as longitudinal monitoring of public opinion and informing policy development. Advanced data visualisations will be developed, evaluated and optimised for data exploration, presentation and informing decision making. This will include sentiment analysis, clustering, frequency analysis, and high-dimensional representations of big text data. Findings, tools and methods will be disseminated widely, with the aim of enabling public opinion data to inform future policy development.
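The benchmarking step described above (automated NLP labels compared against manual qualitative coding) can be sketched as follows. This is a minimal illustration only, not the project's method: the word lists, the example posts and their "human" labels are all invented for the sketch, and a real study would use a validated lexicon or a trained classifier together with independently coded data.

```python
from collections import Counter

# Hypothetical toy lexicon (an assumption for this sketch, not a validated resource).
POSITIVE = {"helpful", "great", "trust", "love", "useful"}
NEGATIVE = {"worried", "creepy", "hate", "risk", "unsafe"}


def rule_based_sentiment(text: str) -> str:
    """Label a post by counting lexicon hits; ties and no hits are 'neutral'."""
    tokens = [t.strip(".,!?#@\"'").lower() for t in text.split()]
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def cohen_kappa(a: list[str], b: list[str]) -> float:
    """Chance-corrected agreement between two label sequences of equal length."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in set(a) | set(b)) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)


# Invented example posts paired with invented manual codes.
posts = [
    ("My fitness tracker is really helpful", "positive"),
    ("Worried about companies selling my health data", "negative"),
    ("Got a new smartwatch today", "neutral"),
    ("I love that my GP can see my data", "positive"),
    ("Sharing records with insurers feels fine to me", "positive"),
]

human = [label for _, label in posts]
auto = [rule_based_sentiment(text) for text, _ in posts]

accuracy = sum(h == a for h, a in zip(human, auto)) / len(posts)
kappa = cohen_kappa(human, auto)
print(f"accuracy={accuracy:.2f} kappa={kappa:.2f}")
```

The last post shows the kind of disagreement such a benchmark surfaces: the manual code is positive, but the lexicon contains none of its words, so the automated label falls back to neutral.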
People |
ORCID iD |
Lamiece Hassan (Principal Investigator / Fellow) |
Publications
Bulcock A
(2021)
Public Perspectives of Using Social Media Data to Improve Adverse Drug Reaction Reporting: A Mixed-Methods Study.
in Drug safety
Elkaref M.
(2021)
A Joint Training Approach to Tweet Classification and Adverse Effect Extraction and Normalization for SMM4H 2021
in Social Media Mining for Health, SMM4H 2021 - Proceedings of the 6th Workshop and Shared Tasks
Ford E
(2020)
Should free-text data in electronic medical records be shared for research? A citizens' jury study in the UK.
in Journal of medical ethics
Ford E
(2020)
Toward an Ethical Framework for the Text Mining of Social Media for Health Research: A Systematic Review.
in Frontiers in digital health
Hassan L
(2021)
A Social Media Campaign (#datasaveslives) to Promote the Benefits of Using Health Data for Research Purposes: Mixed Methods Analysis.
in Journal of medical Internet research
Hassan L
(2022)
Text mining tweets on e-cigarette risks and benefits using machine learning following a vaping related lung injury outbreak in the USA.
in Healthcare analytics (New York, N.Y.)
Hassan L
(2021)
Automated detection and reduction of stigma in online discussions about TB.
in The international journal of tuberculosis and lung disease : the official journal of the International Union against Tuberculosis and Lung Disease
Jones KH
(2020)
Toward the Development of Data Governance Standards for Using Clinical Free-Text Data in Health Research: Position Paper.
in Journal of medical Internet research
Vivekanantham, A.
(2019)
Patient discussions of glucocorticoid-related side effects within an online community health forum
Description | EPSRC Healtex Feasibility Funding |
Amount | £29,583 (GBP) |
Funding ID | EP/N027280/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 04/2018 |
End | 10/2018 |
Description | Towards the real-time detection of topics of concern among people living with chronic kidney disease during COVID-19 |
Amount | £10,000 (GBP) |
Organisation | Economic and Social Research Council |
Sector | Public |
Country | United Kingdom |
Start | 06/2020 |
End | 12/2020 |
Description | UKRI Global Challenges Research Fund QR allocation |
Amount | £6,490 (GBP) |
Funding ID | P122809LH |
Organisation | United Kingdom Research and Innovation |
Sector | Public |
Country | United Kingdom |
Start | 05/2020 |
End | 07/2020 |
Description | IBM e-cigarette study |
Organisation | IBM |
Country | United States |
Sector | Private |
PI Contribution | Leading on research conception and design, collecting and analysing social media data about e-cigarettes, and providing domain expertise in public health, data ethics and social media research. |
Collaborator Contribution | Contribution towards research design and data analysis, training in the application of natural language processing techniques (up to 4 hours per month) and provision of office space (3 days per week for 6 months). |
Impact | Paper: https://doi.org/10.18653/v1/2021.smm4h-1.16. Datasets: more than 1 million tweets on e-cigarettes, collected in the wake of vaping deaths and in the lead-up to the election. |
Start Year | 2019 |
Title | TweetClip |
Description | TweetClip is a command line tool that helps researchers to work with tweet data in JSON format. It clips complex tweet data down to the data fields of interest while maintaining relevant data structures, generating either a JSON or .csv file as the output. |
Type Of Technology | Webtool/Application |
Year Produced | 2021 |
Open Source License? | Yes |
Impact | None yet. |
URL | https://github.com/Republicof1/TweetClip |
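The idea behind the tool described above, clipping nested tweet JSON down to selected fields and emitting JSON or CSV, can be illustrated with a short sketch. This is not TweetClip's code or command line interface; the function names (`get_path`, `clip`, `to_csv`), the dotted-path convention and the sample record are assumptions made for illustration, with field names loosely modelled on Twitter API payloads.

```python
import csv
import io
import json

_MISSING = object()  # sentinel so that stored None values are not confused with absent fields


def get_path(record: dict, path: str):
    """Follow a dotted path such as 'user.screen_name'; return _MISSING if absent."""
    current = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def clip(record: dict, paths: list[str]) -> dict:
    """Keep only the requested fields, preserving their nesting.

    Assumes the paths do not overlap (e.g. 'user' together with 'user.screen_name').
    """
    clipped: dict = {}
    for path in paths:
        value = get_path(record, path)
        if value is _MISSING:
            continue
        *parents, leaf = path.split(".")
        target = clipped
        for key in parents:
            target = target.setdefault(key, {})
        target[leaf] = value
    return clipped


def to_csv(records: list[dict], paths: list[str]) -> str:
    """Flatten the requested fields into CSV text, one column per dotted path."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(paths)
    for record in records:
        values = (get_path(record, p) for p in paths)
        writer.writerow(["" if v is _MISSING else v for v in values])
    return buffer.getvalue()


# Invented sample record for illustration.
tweet = {
    "id_str": "1",
    "text": "hello",
    "user": {"screen_name": "a", "followers_count": 10},
    "entities": {"hashtags": []},
}
fields = ["id_str", "text", "user.screen_name"]

clipped = clip(tweet, fields)
print(json.dumps(clipped))
print(to_csv([tweet], fields), end="")
```

The JSON output keeps `user.screen_name` nested under `user`, while the CSV output flattens each dotted path into its own column, which mirrors the two output shapes the description mentions.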
Description | KCUK patient workshop |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Patients, carers and/or patient groups |
Results and Impact | Discussion with 4 people living with chronic kidney disease, arranged in collaboration with the charity Kidney Care UK. Discussion focused on how they use social media to interact with charities, peers and other sources of information during the Covid-19 pandemic. Researchers and patients discussed the acceptability of proposals for using text mining methods to help charities like KCUK track and analyse patient concerns in real-time, as they arise. |
Year(s) Of Engagement Activity | 2020 |
Description | Manchester Medical Society Schools Xmas Lecture |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Schools |
Results and Impact | Gave the 2020 Hon Dorothy Wedgwood OBE Annual Christmas Lecture for Young People (delivered via Zoom and live streamed to schools on YouTube this year due to Covid), alongside Prof Niels Peek. Our talk was entitled 'AI: Hope, Hype or Horror' and included interactive polls, live chatbots and a Q&A. Feedback indicated the talk was well received, increased awareness of the use of AI in medicine (and its promises and pitfalls), and gave useful information on career pathways in health data science. |
Year(s) Of Engagement Activity | 2020 |
URL | https://www.youtube.com/watch?v=0Ma5usIL_l4 |
Description | NICE data analytics event |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Policymakers/politicians |
Results and Impact | Invited talk at an event on data analytics held by the Evidence Synthesis network, a collaboration between the National Institute for Health and Care Excellence (NICE) and the University of Manchester, open to both organisations and interested clinicians, students and patients across the North West. I gave a talk on the application of social media analytics and NLP to improve health and care. The talk has led to further discussions and invitations to collaborate with NICE on how they can use social media analytics as part of their work. |
Year(s) Of Engagement Activity | 2019 |
Description | PHG Foundation Roundtable on Citizen Generated Data |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Policymakers/politicians |
Results and Impact | Personal invitation from the PHG Foundation (a non-profit think tank) to participate in a round-table of selected experts and cross-sector representatives with a shared interest in citizen-generated data and its potential for informing individual health and healthcare. Resulted in a PHG policy briefing on citizen generated data: https://www.phgfoundation.org/briefing/an-opportunity-for-public-health |
Year(s) Of Engagement Activity | 2019,2020 |
URL | https://www.phgfoundation.org/briefing/an-opportunity-for-public-health |
Description | Psychology Division Zoom seminar |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Undergraduate students |
Results and Impact | Approximately 90 people attended an online Zoom seminar discussing my research on using social media data to unlock insights about health, with attention to wider ethical issues. Afterwards the organisers commented that the number of questions and post-talk comments was unusually high, indicating a significant level of interest in the talk. |
Year(s) Of Engagement Activity | 2021 |
URL | https://novseminar.eventbrite.co.uk/ |
Description | Schools engagement |
Form Of Engagement Activity | Participation in an open day or visit at my research institution |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Schools |
Results and Impact | 40 students aged 16/17 attended a schools event to promote careers in health data science. As part of this I ran a health data science-themed game and talked about my research. Highly positive feedback as a result, indicating students increased their awareness of health data science as a study/career option - see teachers' comments below: "I thought it all worked really well and it made for a very enjoyable day for us as well as the kids. Thanks for all your efforts both developing the materials and also then delivering them so fantastically well on the day." "Lots of the S6 have come to find me this afternoon to tell me how much they enjoyed the day and are also very keen to fill in feedback sheets... Thanks again for offering this great experience, the kids seem to have really benefited and are all raving about it." |
Year(s) Of Engagement Activity | 2019 |
URL | http://intheloop.newsweaver.com/intheloop/qzlc4so882ajn8av8q7r72?email=true&a=1&p=427063&t=18030 |
Description | Spotlight on... blog |
Form Of Engagement Activity | Engagement focused website, blog or social media channel |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | Produced a short blog introducing the topic of my fellowship project, versions of which appeared as a news item on our department and Faculty webpages. It was also widely retweeted on Twitter. These news pieces are widely read by those who subscribe to departmental updates and Twitter followers, including fellow academics, NHS/healthcare practitioners, interested patients and the public. This piece helped to publicise my new award and has led to further speaking invitations and opportunities to supervise students. |
Year(s) Of Engagement Activity | 2018 |
URL | https://www.herc.ac.uk/2018/02/27/spotlight-lamiece-hassan/ |
Description | Turing free text public engagement event |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Patients, carers and/or patient groups |
Results and Impact | This was a free workshop aimed at patients and the general public, funded by Healtex and the Turing Institute. It focused on gaining feedback from the public on proposals for a range of safeguards which could be put in place to enable medical data to be shared for research, whilst maintaining privacy and confidentiality. Outputs included: feedback into a national framework for data governance and safeguards for the acceptable use of medical free-text data from patient records within research for public benefit, as reported in a peer reviewed paper accepted for publication in JMIR (Jones et al, In Press); highly positive feedback from attendees about the activities, which informed and engaged patients e.g. this quote from Twitter: "I was lucky enough to attend this session and found the presentations by @LamieceHassan and @DrElizabethFord some of the most entertaining presentations I've watched"; a blog about the event: https://www.hdruk.ac.uk/news/free-text-to-share-or-not-to-share/. |
Year(s) Of Engagement Activity | 2019 |
URL | https://www.turing.ac.uk/events/sharing-your-healthcare-data-safely |
Description | Yellow card project public involvement group |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Patients, carers and/or patient groups |
Results and Impact | A group of 5 people who use social media to discuss long term conditions (mainly arthritis) have been participating in a public involvement group to shape the design of research to test the acceptability of using natural language processing techniques to automatically detect adverse drug reactions reported on social media and link them with the MHRA's 'Yellow Card' reporting system. This group has reviewed the interview schedules and helped with recruitment for a qualitative study, involving 6 focus groups. This study is one of the test cases I am using as part of my overall project and is being done in collaboration with several other researchers from UoM. The plan is to write this up to inform the development of a future grant application as a separate study, on which I would be a Co-Investigator. |
Year(s) Of Engagement Activity | 2018,2019 |