Responsible AI for Inclusive, Democratic Societies: A cross-disciplinary approach to detecting and countering abusive language online

Lead Research Organisation: University of Sheffield
Department Name: Computer Science

Abstract

Toxic and abusive language threatens the integrity of public dialogue and democracy. Abusive language (such as taunts, slurs, racism, extremism, crudeness, provocation and disguise), generally considered offensive and insulting, has been linked to political polarisation and citizen apathy; to the rise of terrorism and radicalisation; and to cyberbullying. In response, governments worldwide have enacted strong laws against abusive language that leads to hatred, violence and criminal offences against a particular group. These include legal obligations to moderate (i.e., detect, evaluate, and potentially remove or delete) online material containing hateful or illegal language in a timely manner, and social media companies have adopted even more stringent regulations in their terms of use. The last few years, however, have seen a significant surge in such abusive online behaviour, leaving governments, social media platforms, and individuals struggling to deal with the consequences.

The responsible (i.e., effective, fair and unbiased) moderation of abusive language carries significant practical, cultural, and legal challenges. While current legislation and public outrage demand a swift response, we do not yet have effective human or technical processes that can address this need. The widespread deployment of human content moderators is costly and inadequate on many levels: the work is psychologically challenging, and even significant efforts lag behind the deluge of data posted every second. At the same time, the Artificial Intelligence (AI) solutions implemented to address abusive language have raised concerns about automated processes that affect fundamental human rights, such as freedom of expression and privacy, and about the lack of corporate transparency. Tellingly, the first moves to censor Internet content focused on terms used by the LGBTQ community and AIDS activism. It is no surprise, then, that content moderation has been dubbed by industry and media a "billion dollar problem." Thus, this project addresses the overarching question: how can AI be better deployed to foster democracy by integrating freedom of expression, commitments to human rights and multicultural participation into the protection against abuse?

Our project takes on the difficult and urgent issue of detecting and countering abusive language through a novel approach to AI-enhanced moderation that combines computer science with social science and humanities expertise and methods. We focus on two constituencies infamous for toxicity: politicians and gamers. Politicians, because of their public role, are regularly subjected to abusive language. Online gaming and gaming spaces have been identified as private "recruitment sites" for extreme political views and linked to offline violent attacks. Specifically, our team will quantify the bias embedded within current content moderation systems, whose rigid definitions or determinations of abusive language may paradoxically create new forms of discrimination or bias based on identity, including sex, gender, ethnicity, culture, religion, political affiliation or other characteristics. We will offset these effects by producing more context-aware, dynamic systems of detection. Further, we will empower users by embedding these open-source tools within strategies of democratic counter-speech and community-based care and response. Project results will be shared broadly, through open-access white papers, publications and other online materials, with policy, academic, industry, community and public stakeholders. The project will also engage and train the next generation of interdisciplinary scholars, crucial to the development of responsible AI.

With its focus on robust AI methods for tackling online abuse in an effective and legally compliant manner, essential to the vigour of democratic societies, this research has wide-ranging implications and relevance for Canada and the UK.

Planned Impact

Main Beneficiaries:

1) The public: The prevalence of cyber abuse has led to many government and industry attempts to curb its occurrence through prevention and policy; however, these attempts are hindered by the massive, dynamic volume of online content, as well as by the largely ineffective and time-consuming nature of current abuse moderation methods. The project seeks to address these challenges while also considering content moderation biases that tend to disproportionately tag certain individuals' and communities' language as toxic. These biases affect public dialogue, democratic participation and certain legal rights, such as freedom of expression, equality and privacy.

2) Policy makers and NGOs: The results generated by this project will help policymakers (e.g., in economic diversification and innovation, justice, privacy, gender and equality) and NGO/community stakeholders (e.g., Amnesty, Reporters Without Borders) establish guidelines for addressing online abusive language and inform them of its impacts. It will also provide alternative responsible (effective, unbiased and fair) methods for countering abusive language. Research results will contribute to a more balanced and democratic moderation of political dialogue and engagement while protecting politicians and users against abuse.

3) Technology companies: Companies such as Intel are seeking to work with academics and NGOs to address abuse prevention, especially as policies and regulatory frameworks are being developed. Gaming is also an important site for the tech industry, growing at over 4% per year globally. The community of gamers is growing more diverse (around 50% women in Canada in 2018). However, gaming can be a very toxic environment in terms of sexism, racism and other discriminatory forms of abuse, which ultimately limits the size of the gaming market.

4) Law enforcement agencies and social media companies: The responsible NLP methods arising from this project could be incorporated into existing tools, helping law enforcement agencies and social media companies detect and counter online abuse in real time.

5) Media companies and stakeholder engagement: Through previous projects, we have already established, and will leverage, collaborations with Buzzfeed, BBC News, ITV, the Reuters Institute for the Study of Journalism and Google; and we will promote research results through the Centre for Freedom of the Media/UNESCO Journalism Safety Research Network.

6) Early career researchers (ECR)/students: the project will help advance emerging scholars' research trajectories by offering training in interdisciplinary research skills, widening collaborations in the UK, Canada, and the USA, and engaging them in cutting-edge research methods with major social impacts and benefits.

Impact and Outreach Activities:

To achieve maximum impact, project results will be made open source. They will contribute to more responsible AI methods for detecting online abusive language. This in turn contributes to increased user confidence, through platforms' greater compliance with relevant policies, human rights and legal frameworks, and reinforces key socio-economic and Digital Economy areas, namely online gaming, social platform companies, digital journalism, and content moderation technologies and services.

Policy impact will result from knowledge shared in Canada, the UK, and the US (through AI NOW). We will draw on the experience of the UK PI, who has just submitted written evidence on the online abuse of UK MPs to the UK Parliamentary inquiry on Democracy, free speech and freedom of association, and will harness the Industry and Parliament Trust. The Canada PI will share new findings with a network of over 35 collaborating scholars and policy/community/industry partners through the Canada 150 Research Chair/SFU Digital Democracy Group.

Publications


Jin M. (2021) Modeling the Severity of Complaints in Social Media in NAACL-HLT 2021 - 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference

Jin M. (2020) Complaint Identification in Social Media with Transformer Networks in COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference

 
Description The project has been carrying out research on online abuse towards UK MPs and female journalists, as well as developing AI methods for the detection of online abuse.

Our analysis of Twitter datasets over time has demonstrated that temporal bias is a significant challenge for abusive language detection, with models trained on historical data showing a substantial drop in performance on newer data. This work sheds light on the pervasive issue of temporal bias in abusive language detection across languages, offering insights into language evolution and temporal bias mitigation.
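The temporal-bias finding can be illustrated with a minimal sketch (the data and function name here are illustrative, not the project's actual pipeline): a model trained on historical posts is evaluated separately on each later time period, exposing the performance drop.

```python
def accuracy_by_period(examples):
    """Compute classification accuracy separately for each time period.

    `examples` is an iterable of (period, gold_label, predicted_label)
    triples; comparing accuracy across periods reveals how far a model
    trained on historical data degrades on newer posts.
    """
    counts = {}
    for period, gold, pred in examples:
        hits, total = counts.get(period, (0, 0))
        counts[period] = (hits + (gold == pred), total + 1)
    return {p: hits / total for p, (hits, total) in sorted(counts.items())}

# Toy predictions from a model trained on pre-2020 data (1 = abusive):
results = [
    ("2019", 1, 1), ("2019", 0, 0),  # in-period: accurate
    ("2022", 1, 0), ("2022", 0, 0),  # out-of-period: degraded
]
print(accuracy_by_period(results))  # {'2019': 1.0, '2022': 0.5}
```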
Our study of the impact of standard label aggregation strategies on minority opinion representation investigated the quality and value of minority annotations and examined their effect on the class distributions in gold labels, showing that this affects the behaviour of models trained on the resulting datasets. The label aggregation strategy should therefore be chosen carefully, keeping in mind the objective of the task and the use case, and being mindful of minority opinions. Where feasible, we suggest at least analysing how the label distribution changes with the aggregation strategy, and comparing it against minority-preserving aggregation, to ensure that the chosen strategy does not introduce substantial biases against minority voices.
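As a rough sketch of the kind of analysis suggested above (function names and labels are ours, for illustration only), majority-vote gold labels can be compared against the raw annotations to measure how often a minority opinion is discarded:

```python
from collections import Counter

def majority_label(annotations):
    """Aggregate one item's annotator labels by majority vote."""
    return Counter(annotations).most_common(1)[0][0]

def minority_overruled_share(dataset):
    """Fraction of items whose majority-vote gold label discards at
    least one annotator's dissenting (minority) label."""
    overruled = sum(
        1 for anns in dataset if any(a != majority_label(anns) for a in anns)
    )
    return overruled / len(dataset)

# Three annotators per item; 1 = abusive, 0 = not abusive
data = [[1, 1, 0], [0, 0, 0], [1, 0, 0]]
print([majority_label(a) for a in data])  # [1, 0, 0]
print(minority_overruled_share(data))     # ~0.67: most items lose a minority voice
```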
Exploitation Route The outcomes of the funding have demonstrated that bias in both AI tools and the data on which they are trained is a significant concern, in terms of the ways in which the data is collected and annotated, and in terms of the ways in which models are trained on that data. It shows that there is scope for continued research in these areas, as detailed in our research publications. The datasets we have created can also be used by other researchers for further experimentation and training.
Sectors Digital/Communication/Information Technologies (including Software)

Government

Democracy and Justice

 
Description Findings from this award have been used to provide advice to DCMS as part of its College of Experts. They have also been used to compile a handbook of advice on monitoring online abuse against women journalists (see Outputs section), as well as a training workshop for South East Asian NGOs organised by Dr Maynard in collaboration with Free Press Unlimited and UNESCO, setting up a regional CSO monitoring mechanism for violence against journalists in November 2022.
First Year Of Impact 2022
Sector Digital/Communication/Information Technologies (including Software), Government, Democracy and Justice
 
Description Guidelines for monitoring online violence against female journalists
Geographic Reach Multiple continents/international 
Policy Influence Type Influenced training of practitioners or researchers
 
Description Publication of a UNESCO report on online violence
Geographic Reach Multiple continents/international 
Policy Influence Type Contribution to new or improved professional practice
 
Description Monitoring online abuse towards female journalists
Amount £120,000 (GBP)
Organisation Foreign Commonwealth and Development Office (FCDO) 
Sector Public
Country United Kingdom
Start 03/2022 
End 12/2022
 
Description Toolkit for Analysing and Visualising Online Violence Against Female Journalists
Amount £48,000 (GBP)
Organisation Higher Education Funding Council for England 
Sector Public
Country United Kingdom
Start 03/2024 
End 03/2025
 
Title GATE Hate for politics 
Description A service that tags abusive utterances in any text. It includes a "type" feature, indicating the type of abuse if any (e.g. sexist, racist), and a "target" feature that indicates whether the abuse was aimed at the addressee or some other party. It can be run on any English-language text. It will also tag UK members of parliament for the 2015, 2017 and 2019 general elections, and candidates for the 2017 and 2019 elections. Where an individual has run for election or been elected multiple times, multiple "Politician" annotations will appear with different "minorType" features; in this way, a person's recent political career can be tracked. The current parliament is the 58th, with previous parliaments counting down, so that MPs with a minorType feature of "mp55" are those that were MPs before the general election in 2015. The service will also tag a range of politically relevant topics, as well as entities such as persons, locations and organizations, and Twitter entities such as hashtags and user mentions. It is designed to run on tweets in the original Twitter JSON input format, on which it will also produce metadata such as whether the tweet is a reply or a retweet. Upload your own or harvest some with our Twitter Collector; however, the service can be run on any text.
Type Of Material Improvements to research infrastructure 
Year Produced 2020 
Provided To Others? Yes  
Impact Researchers from KCL have been using this service to identify abusive posts on Twitter. Also SFU researchers in Canada from the Digital Democracies Institute. 
URL https://cloud.gate.ac.uk/shopfront/displayItem/gate-hate
 
Title Offensive Language Classifier 
Description This classifier is a fine-tuned Roberta-base model using the simpletransformers toolkit. We use the OLIDv1 dataset from OffensEval 2019 as training data. This dataset contains tweets classified as offensive or non-offensive. 
Type Of Material Improvements to research infrastructure 
Year Produced 2021 
Provided To Others? Yes  
Impact We have only just made this available to other researchers, so impact information will be provided in the next round of Research Fish reporting. 
URL https://cloud.gate.ac.uk/shopfront/displayItem/offensive-classifier
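As context, a minimal sketch of how such a classifier is typically built. The label encoding follows the "OFF"/"NOT" convention of OLID subtask A; the helper name is ours, and the commented-out fine-tuning call assumes the simpletransformers `ClassificationModel` API and would need the package, a GPU and network access:

```python
def encode_olid_labels(rows):
    """Map OLID subtask-A string labels ("OFF"/"NOT") to the 0/1
    integer labels used when fine-tuning a binary classifier."""
    return [(text, 1 if label == "OFF" else 0) for text, label in rows]

train = encode_olid_labels([
    ("you are a wonderful person", "NOT"),
    ("@user you absolute idiot", "OFF"),
])

# Fine-tuning step, sketched only (requires GPU, network, and the
# simpletransformers package):
# import pandas as pd
# from simpletransformers.classification import ClassificationModel
# model = ClassificationModel("roberta", "roberta-base")
# model.train_model(pd.DataFrame(train, columns=["text", "labels"]))
# predictions, _ = model.predict(["some new tweet"])
```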
 
Title Toxic Language Classifier 
Description This classifier is a fine-tuned Roberta-base model using the simpletransformers toolkit. We use the Kaggle Toxic Comments Challenge dataset as training data. This dataset contains Wikipedia comments classified as toxic or non-toxic. 
Type Of Material Improvements to research infrastructure 
Year Produced 2021 
Provided To Others? Yes  
Impact This tool has just been released to the research community. Usage and impact will be reported in the next Research Fish round. 
URL https://cloud.gate.ac.uk/shopfront/displayItem/toxic-classifier
 
Title BA Brexit Geomedia Shared Data 
Description This archive contains shared materials pertaining to the forthcoming paper "Local media and geo-situated responses to Brexit: A quantitative analysis of Twitter, news and survey data" by Genevieve Gorrell, Mehmet E. Bakir, Luke Temple, Diana Maynard, Jackie Harrison, J. Miguel Kanai and Kalina Bontcheva. It contains a folder with a separate document for each of the topic-model-derived topics explored in the paper. The first two columns are topic scores for material from each separate Twitter account in the corpus, along with their Brexit vote intention. After a blank column comes the national newspaper article topic scores. After a further blank column come the local newspaper article scores, along with the NUTS1 region in which they are published. Additionally there is a spreadsheet with entity-based topic scores for each newspaper. Ethics approval was obtained for the Twitter data collection from the University of Sheffield (application number 011934). 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
URL https://figshare.shef.ac.uk/articles/BA_Brexit_Geomedia_Shared_Data/12287498
 
Title Online Hostility towards UK MPs 
Description This is a dataset with tweets from X. Each tweet mentions one or more UK MPs from a subset selected for our study to give a diverse representation of political leanings. Each tweet is labelled for hostility and the identity characteristic it targets (religion, race, gender). Each annotator also provides a confidence score for each label. Three annotators annotate each tweet. Annotators are UK-based students from Computer Science and Politics. 
Type Of Material Database/Collection of data 
Year Produced 2024 
Provided To Others? Yes  
Impact The dataset has just been completed at the end of the project and thus it is too early to report on its impact and take up. 
 
Title Which Politicians Receive Abuse? 
Description The spreadsheets contain aggregate statistics for abusive language found in tweets to UK politicians in 2019. An overview spreadsheet is provided for each of the months of January to November ("per-mp-xxx-2019.csv" where xxx is the abbreviation for the month), with one row per MP, and a spreadsheet with data per day is provided for the campaign period of the UK 2019 general election, with one row per candidate, starting at the beginning of November and finishing on December 15th, a few days after the election ("campaign-period-per-cand-per-day.csv"). These spreadsheets list, for each individual, gender, party, the start and end times of the counts, tweets authored, retweets *by* the individual, replies by the individual, the number of times the individual was retweeted, replies received by the individual ("replyTo"), abusive tweets received in total and abusive tweets received in each of the categories sexist, racist and political. Two additional spreadsheets focus on topics; "topics-of-cands.csv" and "topics-of-replies.csv". In the first, counts of tweets mentioning each of a set of topics are given, alongside counts of abusive tweets mentioning each topic, in tweets *by* each candidate. In the second, the counts are of replies received when a candidate mentions a topic, alongside abusive replies received when they mentioned that topic. The data complement the forthcoming paper "Which Politicians Receive Abuse? Four Factors Illuminated in the UK General Election 2019", by Genevieve Gorrell, Mehmet E Bakir, Ian Roberts, Mark A Greenwood and Kalina Bontcheva. The way the data were acquired is described more fully in the paper. Ethics approval was granted to collect the data through application 25371 at the University of Sheffield. 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact Dataset used by other researchers to replicate the work. 
URL https://figshare.shef.ac.uk/articles/dataset/Which_Politicians_Receive_Abuse_/12340994/1
 
Description Collaboration with ICFJ 
Organisation International Center for Journalists
Country United States 
Sector Charity/Non Profit 
PI Contribution Computational analysis of online abuse towards female journalists worldwide
Collaborator Contribution Qualitative research, journalistic expertise, paper writing, joint discussions and research
Impact All joint publications already listed - see those co-authored with Julie Posetti
Start Year 2021
 
Description Digital Democracies institute, Simon Fraser University 
Organisation Simon Fraser University
Department Digital Democracies Institute
Country Canada 
Sector Academic/University 
PI Contribution We have trained the DDI researchers in using NLP tools for analysing online abuse. We have assisted them by applying ML models to some of their data as well as providing some manual annotation for their data. As a result of this, we have collaborated on a paper (see publications).
Collaborator Contribution They have assisted us with manual annotation of our data and by providing social science expertise to produce joint collaborative research.
Impact This is a multi-disciplinary collaboration involving social and computer scientists. We have jointly published a paper: Canute M, Jin M, Holtzclaw H, Lusoli A, Adams P, Pandya M, Taboada M... Chun W. (2023). Dimensions of Online Conflict: Towards Modeling Agonism.
Start Year 2020
 
Title Shiny app - Which Politicians Receive Abuse During 2019 Election Campaign? 
Description This repository contains the source code and processed data for the shiny app "Which Politicians Receive Abuse During 2019 Election Campaign?". The app is built on the dataset made available by Gorrell, G., Bakir, M., Roberts, I., Greenwood, M., et al. (2020) on Online Research Data.
Type Of Technology Software 
Year Produced 2021 
Impact Demonstration of the research outputs to policy makers, citizens, and other users. 
URL https://figshare.shef.ac.uk/articles/software/Shiny_app_-_Which_Politicians_Receive_Abuse_During_201...
 
Description Invited panel member at the International Journalism Conference 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact I was part of an expert panel on online abuse against women journalists, alongside internationally acclaimed women journalists who had suffered abuse, researchers, and representatives of media and media-related organisations such as the International Center for Journalists. I talked about our research on hate speech detection and the issues around bias and NLP. The International Journalism Festival is a huge annual event with several thousand participants, mostly from media organisations, and the panel was additionally live-streamed to a wider audience. As a result of the panel, I received a number of questions about our research and expressions of interest, including follow-up collaboration invitations and requests to be involved in the ongoing research.
Year(s) Of Engagement Activity 2023
URL https://www.journalismfestival.com/
 
Description Keynote talk at the 2023 BCSWomen Lovelace Colloquium in Sheffield 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Undergraduate students
Results and Impact I gave a keynote speech about our research on bias in hate speech detection and the platform we've developed for online abuse analysis. I was invited as a senior female academic to give this talk to inspire younger female computer scientists. The talk was held at the University of Sheffield as part of a one-day conference, the BCSWomen Lovelace Colloquium organised by the British Computer Society. As a result of the talk, I received a number of questions about our research and expressions of interest, including follow-up talk invitations; many students were inspired to learn about the kinds of research that could be done, and came to understand better a number of issues around online abuse that they hadn't previously considered.
Year(s) Of Engagement Activity 2023
URL https://bcswomenlovelace.bcs.org/?page_id=478#:~:text=The%202023%20BCSWomen%20Lovelace%20Colloquium,...
 
Description Keynote talk at the Language Data and Knowledge conference 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact I gave a keynote speech about our research on bias in hate speech detection and the platform we've developed for online abuse analysis. The talk was held at the Language, Data and Knowledge International Conference in Vienna, aimed primarily at academics (postgraduates, researchers and professors) as well as industrial researchers. As a result of the talk, I received a number of questions about our research and expressions of interest, including follow-up talk invitations; participants were inspired to learn about the kinds of research that could be done, and came to understand better a number of issues around online abuse that they hadn't previously considered.
Year(s) Of Engagement Activity 2023
URL http://2023.ldk-conf.org/invited-speakers/
 
Description Roundtable on media freedom 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Policymakers/politicians
Results and Impact I was invited to be a member of an expert group organised by UNESCO's International Programme for the Development of Communication (IPDC) as part of activities around the global World Press Freedom Day conference held in New York, USA. Along with more than 25 experts from NGOs, academia, media and the tech sector, I brainstormed the issues in a meeting hosted and co-convened with UNESCO and the Danish Mission to the UN.
Year(s) Of Engagement Activity 2023
URL https://www.unesco.org/en/articles/data-makes-difference-world-press-freedom-day-roundtable
 
Description Workshop around our development of an Online Violence Early Warning System as a response to targeted attacks on women journalists. 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This was a workshop with 20 participants, held on May 4 on the sidelines of the World Press Freedom Day conference in New York, based around our pioneering work to develop an Online Violence Early Warning System as a response to targeted attacks on women journalists. The workshop demonstrated and discussed our new prototype interactive tools, developed in concert with a series of big data case studies and a set of research-derived indicators for online violence escalation published by the OSCE.
Year(s) Of Engagement Activity 2023