Centre for Cyberhate Research & Policy: Real-Time Scalable Methods & Infrastructure for Modelling the Spread of Cyberhate on Social Media

Lead Research Organisation: Cardiff University
Department Name: Sch of Social Sciences

Abstract

The UK Government's Hate Crime Action Plan (Home Office 2016) stresses the need to tackle hate speech on social media by bringing together policymakers with academics to improve the analysis and understanding of the patterns and drivers of cyberhate and how these can be addressed. Furthermore, the recent Home Affairs Select Committee Inquiry (2016) 'Hate Crime and its Violent Consequences', to which the proposers were invited to provide evidence, highlighted the role of social media in the propagation of hate speech. This proposal acknowledges that the migration of hate to social media is non-trivial, and that empirically we know very little about the utility of Web-based forms of data for measuring online hate speech and counter hate speech at scale and in real-time. This became particularly apparent following the referendum on the UK's future in the European Union, where an inability to classify and monitor hate speech and counter speech on social media in near-real-time and at scale hindered the use of these new forms of data in policy decision making in the area of hate crime. It was months later that small-scale grey literature emerged providing a 'snap-shot' of the problem (Awan & Zempi 2016, Miller et al. 2016). In partnership with the UK Head of the Cross-Government Hate Crime Programme at the Department for Communities and Local Government (DCLG), and the London Mayor's Office for Policing and Crime's (MOPAC) new Online Hate Crime Hub, the proposed project will co-produce evidence on how social media data, harnessed by new Social Data Science methods and scalable infrastructure, can inform policy decision making.
We will achieve this by taking the social media reaction to the referendum on the UK's future in the European Union as a demonstration study, and will co-develop with the Policy CI transformational New Forms of Data Capability contributions including: (i) semi-automated methods that monitor the production and spread of cyberhate around the case study and beyond; (ii) complementary methods to study and test the effectiveness of counter speech in reducing the propagation of cyberhate, and (iii) a technical system that can support real time analysis of hate and counter speech on social media at scale following 'trigger events', integrated into existing policy evidence-based decision-making processes. The system, by estimating the propagation of cyberhate interactions within social media using machine learning techniques and statistical models, will assist policymakers in identifying areas that require policy attention and better targeted interventions in the field of online hate and antagonistic content.

Planned Impact

In line with the drive behind the call, this project will co-produce a strong evidence base on the utility of social media data to inform policy development, intervention and decision making. The project will provide a case study that will demonstrate how these data, when effectively and efficiently collected, transformed and repurposed using Social Data Science tools and methods, can have a transformative impact on how governments work to address contemporary pressing social problems. We have selected cyberhate in the aftermath of the referendum on the UK's future in the EU as a case study for understanding the relationship between social media data and policymaking.

We will work closely with the Policy CI, the UK Head of the Cross-Government Hate Crime Programme at the Department for Communities and Local Government, and the London Mayor's Office for Policing and Crime Online Hate Crime Hub, to co-produce an evidence base on the utility of social media data for policy and decision making. We will achieve this by:

--Involving the UK Head of the Cross-Government Hate Crime Programme and the MOPAC Online Hate Crime Hub in the design, testing, analysis and implementation phases of the project, to ensure maximum buy-in at a policy level

--Running requirements gathering workshops with policymakers for tool and system development

--Testing the system developed in WP6 in a policy environment and writing a lessons-learned report

--Conducting post-hoc interviews with policymakers to inform an ESRC Policy Evidence Briefing and an Ethics Guide for Policymakers

--Providing free access to new Lab social media hate and counter speech classification tools for not-for-profit use
 
Description In England & Wales, police-recorded hate crimes are at their highest levels since records began. The migration of hate to the Internet requires the police to address the problem on two fronts. The ESRC-funded HateLab (the public name for the Cyberhate project) is the first to address the problem both offline and online, generating vital evidence on prevalence, impact and prevention. Lab technologies have been embedded within HMG's NPCC National Online Hate Crime Hub, allowing policymakers and police to prevent hate crime and speech. The HateLab has:

v) Innovated by combining social science and computer science research techniques to examine online forms of data to develop an evidence base on online hate speech. Findings revealed that anti-Muslim hate speech spiked in the first 24 hours following terror attacks in 2013 and 2017, and rapidly de-escalated, indicating a 'half-life' of 'cyberhate'. In the aftermath of these events, social media information flows from police were the second longest lasting within the first 36 hours, indicating that law enforcement online communications might be an effective channel to inform the public, solicit information, and counter rumour, speculation and hate speech;

vi) Analysed survey and new forms of data to provide evidence showing that hate crime and online hate speech spiked in the final weeks of the Vote Leave and Leave.EU campaigns, following the Brexit vote and at subsequent moments in the Brexit process. These results underpinned the BBC One Panorama documentary 'Hate on the Streets' in 2018;

vii) Provided evidence that Brexit-related information on Twitter linked to the Russian Internet Research Agency was 20-40% more likely to be retweeted than that from UK media, government and public figure/celebrity accounts;

viii) Generated evidence that the online abuse of MPs supposedly working against Brexit (so-called "mutineers") was organised by a clandestine right-wing group based in London, the results of which appeared in an ITV documentary, 'Brexit Online Uncovered', in early 2019;

ix) Created an online dashboard that monitors the spread of online hate speech. Using an innovative blend of machine learning (a form of Artificial Intelligence) and social science statistical modelling techniques, the dashboard automatically classifies hateful content in real-time and at a scale hitherto unrealisable.

The Online Hate Speech Dashboard was integrated into HMG's National Online Hate Crime Hub (2019). The Hub is the point of contact for all victims of online hate crime, and produces intelligence reports (using the dashboard) for police, senior civil servants and MPs. HateLab results on the spread of online hate speech around events allowed the Hub to better understand the dynamics of propagation, leading to improved response times, better support for victims and more effective allocation of resources. The Director of the Hub states that our research has resulted in economic savings of ~£500,000 via the provision of the Dashboard, cloud and data services and an implementation evaluation.

HateLab was invited to the Home Affairs Select Committee's inquiry on Hate Crime and its Violent Consequences in November 2016, set up in response to the murder of Jo Cox MP and the rising levels of hate speech and crimes against the general public and MPs. HateLab evidence was cited in the committee's summary report, showing that online hate speech could be detected at scale and in real-time with AI developed at Cardiff. As a result, the inquiry criticised social media companies for not using such technology to counter the spread of hate.

HateLab and the Silver Circle law firm, Mishcon de Reya, established a partnership in 2018 to publish high-profile reports on the topic of online hate speech, containing legal advice for victims, solicitors and police. The first report was published in early 2019. A co-branded online hate speech 'tracker', available to the public, launched in mid-2019.
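The 'half-life' finding above can be made concrete with a small worked example. The sketch below is illustrative only and is not the HateLab method: it assumes that post-event volume decays roughly exponentially, fits a straight line to the logarithm of hourly counts, and converts the decay rate to a half-life. The function name `estimate_half_life` is hypothetical, and the hourly series is synthetic (generated to halve every 6 hours), not project data.

```python
import math


def estimate_half_life(hours, counts):
    """Fit log(count) = a + slope * hour by ordinary least squares and
    return the half-life in hours, ln(2) / -slope.

    Assumes exponential decay after the peak and strictly positive counts.
    """
    logs = [math.log(c) for c in counts]
    n = len(hours)
    mean_t = sum(hours) / n
    mean_y = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(hours, logs))
    sxx = sum((t - mean_t) ** 2 for t in hours)
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("counts are not decaying")
    return math.log(2) / -slope


# Synthetic series (an assumption for illustration): 800 posts at the
# peak, halving every 6 hours, sampled every 3 hours for two days.
hours = list(range(0, 48, 3))
counts = [800 * 0.5 ** (t / 6) for t in hours]
half_life = estimate_half_life(hours, counts)
print(f"estimated half-life: {half_life:.1f} hours")
```

Real post volumes are noisy and rarely follow a clean exponential, so on real data an estimate of this kind would be indicative at best.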
First Year Of Impact 2017
Sector Digital/Communication/Information Technologies (including Software); Government, Democracy and Justice; Security and Diplomacy
Impact Types Societal,Economic,Policy & public services

 
Title Online Hate Speech Dashboard 
Description The Dashboard

This tool allows users to access all open social media feeds (including the Twitter firehose) using a keyword search to identify variation in hate-orientated text contained within posts. The dashboard currently allows for the classification of posts containing text that is antagonistic or hateful based on race (anti-black), religion (anti-Muslim), sexual orientation (anti-gay male and female), disability (anti-physical disability), and Jewish identity. Once posts are classified they can be visualised via a suite of tools:

a. Real-time and historic modes, allowing the end user to monitor hate speech as it unfolds, and to search back over periods of user data collection for post-hoc analysis
b. An interactive hate line chart displaying frequency of tweets, with customisable scale (raw, percentage, log, etc.)
c. An interactive tool for network analysis of hate tweets (where nodes can be selected for further inspection and the production of sub-networks, such as Twitter @mentions, retweets, followers, etc.)
d. Red/Amber/Green real-time alert system for anomalous spikes in online hate speech above a baseline (defined by the user or inferred from the average number of hate posts in a given time-frame)
e. Tool to identify top N hate hashtags
f. Tool to identify top N hate influencers (e.g. top N accounts responsible for N% of hate speech)
g. Tool to identify when a top hate user's account is deleted/suspended
h. Tool to identify top victim targets (e.g. top N accounts targeted with hate using @mentions)
i. Tool to identify bot accounts, with functionality to remove all suspected bots from the analysis and visualisation
j. Tool to identify links between social media platforms in posts (e.g. frequency of links to far-right open Facebook pages in tweets, far-right posts on Reddit, etc.)
k. Topic clustering tool, displaying topics detected in posted text and proportion of topics over the whole corpus
l. Tool to display simple word clouds of hate tweets (in addition to topic detection)
m. Export tool (sections of the Dashboard can be exported) to PDF, image file, or bespoke format for the end user
n. Demographic estimation of users at an aggregate level (e.g. gender, age)
o. Aggregate (e.g. town, city, PFA) geo-location inference plotted on a scalable map (using Lat/Long, user-specified location, location name specified in bio, etc.; the user can specify which are displayed, with all being selectable at once)

Individual visualisation tools can be resized and 'toggled' in and out of the view, allowing the user to select the preferred Dashboard set-up for the monitoring task. The suite of tools can also be split over multiple screens to provide the most complete Dashboard set-up.
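As a rough illustration of tools e and f above (top N hashtags, and top N accounts with the share of posts they produce), the following sketch counts hashtags and authors over a list of already-classified posts. It is an assumption-laden example, not the Dashboard's implementation: the post structure (`author` and `text` keys), the function names and the placeholder posts are all hypothetical.

```python
import re
from collections import Counter

# Simple hashtag pattern; real platform tokenisation rules are more involved.
HASHTAG = re.compile(r"#(\w+)")


def top_hashtags(posts, n):
    """Return the n most frequent hashtags (case-insensitive) with counts."""
    counts = Counter(
        tag.lower() for post in posts for tag in HASHTAG.findall(post["text"])
    )
    return counts.most_common(n)


def top_accounts(posts, n):
    """Return the n most active accounts and the percentage of all posts
    that those accounts produced."""
    counts = Counter(post["author"] for post in posts)
    total = sum(counts.values())
    top = counts.most_common(n)
    share = 100.0 * sum(c for _, c in top) / total if total else 0.0
    return top, share


# Placeholder posts, assumed to have been flagged by a classifier upstream.
posts = [
    {"author": "acct1", "text": "example #TagA #tagB"},
    {"author": "acct1", "text": "example #taga #tagb"},
    {"author": "acct2", "text": "example #tagA"},
    {"author": "acct3", "text": "example #tagC"},
]

hashtags = top_hashtags(posts, 2)
accounts, share = top_accounts(posts, 1)
print(hashtags)
print(accounts, f"{share:.0f}% of posts")
```

Lower-casing hashtags before counting is a design choice made here so that `#TagA` and `#taga` are treated as one tag.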
Type Of Material Improvements to research infrastructure 
Year Produced 2019 
Provided To Others? Yes  
Impact The Dashboard will be used by HMG's National Online Hate Crime Hub to collect posts from all open social media feeds at set times around predicted and scheduled landmark events, such as the UK's planned exit from the European Union on 29th March 2019.

The Purpose
The purpose of the Dashboard, along with the results and products it produces, is to assist in the identification of 'anomalous' increases in online hate speech in time and space (where geographical information is available) across multiple open social media sources. Results from the Dashboard will be triangulated with other data and intelligence available to the Hub to determine if any increases in online hate speech may be indicative of a rise in community tensions within offline communities or groups. Where offline community tensions can be verified by multiple data sources (including those beyond the Dashboard), the relevant local authorities will be notified.

The Data Collection
The Dashboard does not permit the identification of individual offending or offenders. Information produced by the Dashboard can only be used for analytical purposes. The outputs of this analysis will be used to inform policy, strategy and decision making with the overall aim of promoting community cohesion. The Dashboard cannot be used to collect evidence for the purpose of criminal proceedings, and its use will not be disclosed or used as evidence.
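The Red/Amber/Green alert for 'anomalous' increases above a baseline could work along the lines of the sketch below. This is a minimal illustration under stated assumptions, not the Dashboard's actual rule: the baseline is either supplied by the user or inferred as the mean of recent counts, and the amber and red thresholds (2 and 3 standard deviations above the baseline) are hypothetical values chosen for the example, as are the function name `rag_status` and the sample counts.

```python
from statistics import mean, pstdev


def rag_status(current, history, baseline=None, amber_sd=2.0, red_sd=3.0):
    """Classify the current count of flagged posts as green, amber or red.

    baseline: user-defined level; if None it is inferred as the mean of
    `history` (counts for earlier windows of the same length).
    amber_sd / red_sd: assumed thresholds, in standard deviations of
    `history` above the baseline.
    """
    if baseline is None:
        baseline = mean(history)
    spread = pstdev(history)
    if current > baseline + red_sd * spread:
        return "red"
    if current > baseline + amber_sd * spread:
        return "amber"
    return "green"


# Hypothetical hourly counts of flagged posts for the preceding 8 hours
# (mean 10, population standard deviation about 1.41).
history = [10, 12, 8, 10, 10, 12, 8, 10]

print(rag_status(11, history))               # within normal variation
print(rag_status(13, history))               # above the amber threshold
print(rag_status(20, history))               # above the red threshold
print(rag_status(20, history, baseline=20))  # user-defined baseline
```

If the history has no variation at all, any count above the baseline is reported as red, so a production rule would probably also want a minimum absolute increase.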
 
Title Online Hate Speech Dashboard 
Type Of Technology Webtool/Application 
Year Produced 2019 
 
Description BBC One Panorama 'Hate on the Streets' 
Form Of Engagement Activity A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Media (as a channel to the public)
Results and Impact We participated in BBC One's Panorama 'Hate on the Streets'. The project supplied key evidence on the trends in offline hate crimes following the Brexit vote.
Year(s) Of Engagement Activity 2018
URL https://www.youtube.com/watch?v=yetFgoAkrGE
 
Description ITV Exposure 'Brexit Online Uncovered' 
Form Of Engagement Activity A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Media (as a channel to the public)
Results and Impact We provided key evidence to ITV's Exposure documentary 'Brexit Online Uncovered' that showed the links between Twitter users who were abusing MPs online, and how press headlines were statistically associated with increases in general online hate speech associated with Brexit.
Year(s) Of Engagement Activity 2019
URL https://www.itv.com/hub/exposure-brexit-online-uncovered/2a5966a0001
 
Description Paper Presented at The Web Conference 2019, San Francisco, CA, USA 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Liu, H. et al. 2019. Fuzzy multi-task learning for hate speech type identification. Presented at: The Web Conference 2019, San Francisco, CA, USA, 13-17 May 2019. Proceedings of the 2019 World Wide Web Conference. ACM. (10.1145/3308558.3313546)
Year(s) Of Engagement Activity 2019
 
Description Paper presented at 1st International Conference on Cyber Deviance Detection (CyberDD) in conjunction with 10th ACM International Conference on Web Search and Data Mining (WSDM 2017), Cambridge, UK 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Burnap, P. and Williams, M. L. 2017. Classifying and modeling cyber hate speech: research and opportunities for practical intervention. Presented at: 1st International Conference on Cyber Deviance Detection (CyberDD) in conjunction with 10th ACM International Conference on Web Search and Data Mining (WSDM 2017), Cambridge, UK, 10 Feb 2017.
Year(s) Of Engagement Activity 2017
 
Description Paper presented at Cambridge Institute of Criminology Seminar Series, University of Cambridge 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact Williams, M. L. and Burnap, P. 2017. Social data science & criminology: machine classification and modelling of cyberhate in online social networks. Presented at: Cambridge Institute of Criminology Seminar Series, University of Cambridge, UK, 9 February 2017.
Year(s) Of Engagement Activity 2017
 
Description Paper presented at Data Science and Government Conference, Oxford, UK 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Burnap, P. and Williams, M. L. 2016. Computational human and cyber security analytics for government and policy. Presented at: Data Science and Government Conference, Oxford, UK, 22 June 2016.
Year(s) Of Engagement Activity 2016
 
Description Paper presented at Home Office Crime and Policing Analysis Unit Seminar Series, Westminster, London, UK 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Policymakers/politicians
Results and Impact Williams, M. L. and Burnap, P. 2017. Detecting crime events using social media. Presented at: Home Office Crime and Policing Analysis Unit Seminar Series, Westminster, London, UK, July, 2017.
Year(s) Of Engagement Activity 2017
 
Description Paper presented at International Conference on Machine Learning and Cybernetics, Chengdu, China 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Alorainy, W. et al. 2018. Suspended accounts: A source of Tweets with disgust and anger emotions for augmenting hate speech data sample. Presented at: International Conference on Machine Learning and Cybernetics, Chengdu, China, 15-18 July 2018.
Year(s) Of Engagement Activity 2018
 
Description Paper presented at Internet Leadership Academy, Oxford Internet Institute, University of Oxford 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Williams, M. and Burnap, P. 2017. Online extremism and hate speech: definition, measurement & regulation. Presented at: Internet Leadership Academy, Oxford Internet Institute, University of Oxford, UK, 26 September 2017.
Year(s) Of Engagement Activity 2017
 
Description Paper presented at Jensen Lecture Series, Duke University, NC, US 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Williams, M. L. 2016. Crime sensing with big data: the affordances and limitations of using open source communications to estimate crime patterns. Presented at: Jensen Lecture Series, Duke University, NC, US, 2016.
Year(s) Of Engagement Activity 2016
 
Description Paper presented at UK Government Data Science Community Interest Workshop, ONS Data Science Campus, Newport, Wales, UK 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Williams, M. L. and Burnap, P. 2017. Data science solutions for detecting and monitoring Brexit related online hate speech. Presented at: UK Government Data Science Community Interest Workshop, ONS Data Science Campus, Newport, Wales, UK, 4 September 2017.
Year(s) Of Engagement Activity 2017
 
Description Paper presented at SERENE-RISC Workshop, Université de Montréal, Montreal, QC, Canada 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Williams, M. L. 2017. Big Data and criminology: Research from the UK. Presented at: SERENE-RISC Workshop, Université de Montréal, Montreal, QC, Canada, 26 April 2017.
Year(s) Of Engagement Activity 2017