Responsible AI for Inclusive, Democratic Societies: A cross-disciplinary approach to detecting and countering abusive language online

Lead Research Organisation: University of Sheffield

Department Name: Computer Science

Abstract

Toxic and abusive language threatens the integrity of public dialogue and democracy. Abusive language, such as taunts, slurs, racism, extremism, crudeness, provocation and disguise, is generally considered offensive and insulting, and has been linked to political polarisation and citizen apathy; the rise of terrorism and radicalisation; and cyberbullying. In response, governments worldwide have enacted strong laws against abusive language that leads to hatred, violence and criminal offences against a particular group. This includes legal obligations to moderate (i.e., detect, evaluate, and potentially remove or delete) online material containing hateful or illegal language in a timely manner; and social media companies have adopted even more stringent regulations in their terms of use. The last few years, however, have seen a significant surge in such abusive online behaviour, leaving governments, social media platforms, and individuals struggling to deal with the consequences.

The responsible (i.e. effective, fair and unbiased) moderation of abusive language carries significant practical, cultural, and legal challenges. While current legislation and public outrage demand a swift response, we do not yet have effective human or technical processes that can address this need. The widespread deployment of human content moderators is costly and inadequate on many levels: the nature of the work is psychologically challenging, and significant efforts lag behind the deluge of data posted every second. At the same time, Artificial Intelligence (AI) solutions implemented to address abusive language have raised concerns about automated processes that affect fundamental human rights, such as freedom of expression, privacy and lack of corporate transparency. Tellingly, the first moves to censor Internet content focused on terms used by the LGBTQ community and AIDS activism. It is no surprise then that content moderation has been dubbed by industry and media as a "billion dollar problem." Thus, this project addresses the overarching question: how can AI be better deployed to foster democracy by integrating freedom of expression, commitments to human rights and multicultural participation in the protection against abuse?

Our project takes on the difficult and urgent issue of detecting and countering abusive language through a novel approach to AI-enhanced moderation that combines computer science with social science and humanities expertise and methods. We focus on two constituencies infamous for toxicity: politicians and gamers. Politicians, because of their public role, are regularly subjected to abusive language. Online gaming and gaming spaces have been identified as private "recruitment sites" for extreme political views and linked to off-line violent attacks. Specifically, our team will quantify the bias embedded within current content moderation systems that use rigid definitions or determinations of abusive language that may paradoxically create new forms of discrimination or bias based on identity, including sex, gender, ethnicity, culture, religion, political affiliation or other. We will offset these effects by producing more context-aware, dynamic systems of detection. Further, we will empower users by embedding these open source tools within strategies of democratic counter-speech and community-based care and response. Project results will be shared broadly through open access white papers, publications and other online materials with policy, academic, industry, community and public stakeholders. This project will engage and train the next generation of interdisciplinary scholars, who are crucial to the development of responsible AI.

With its focus on robust AI methods for tackling online abuse in an effective and legally compliant manner, in support of the vigour of democratic societies, this research has wide-ranging implications and relevance for Canada and the UK.

Planned Impact

Main Beneficiaries:

1) The public: The prevalence of cyber abuse has led to many government and industry attempts to curb its occurrence through prevention and policy; however, these attempts are hindered by the massive, dynamic volume of online content, as well as by the largely ineffective and time-consuming nature of current abuse moderation methods. The project seeks to address these challenges while also considering issues of content moderation biases that tend to disproportionately tag certain individuals' and communities' language as toxic. These biases affect public dialogue, democratic participation and certain legal rights, such as freedom of expression, equality and privacy rights.

2) Policy makers and NGOs: The results generated by this project will help policymakers (e.g., economic diversification and innovation, justice, privacy, gender and equality) and NGO/community stakeholders (e.g., Amnesty, Reporters without Borders) establish guidelines for addressing online abusive language and inform them of the impacts. It will also provide alternative responsible (effective, unbiased and fair) methods for countering abusive language. Research results will contribute to a more balanced and democratic moderation of political dialogue and engagement while protecting against abuse of politicians and users.

3) Technology companies: Companies such as Intel are seeking to work with academics and NGOs to address abuse-prevention, especially as policies and regulatory frameworks are being developed. Gaming is also an important site for the tech industry, with a >4% yearly growth globally. The community of gamers is growing more diverse (~50% women in Canada in 2018). However, gaming can be a very toxic environment in terms of sexism, racism and other discriminatory forms of abuse, which ultimately limits the size of the gaming market.

4) Law enforcement agencies and social media companies: The responsible NLP methods arising from this project could be incorporated in existing tools, helping law enforcement agencies and social media companies detect and counter online abuse in real time.

5) Media companies and stakeholder engagement: Through previous projects, we have already established collaborations with Buzzfeed, BBC News, ITV, Reuters Institute for the Study of Journalism and Google. We will leverage these and promote research results through the Centre for Freedom of the Media/UNESCO Journalism Safety Research Network.

6) Early career researchers (ECR)/students: the project will help advance emerging scholars' research trajectories by offering training in interdisciplinary research skills, widening collaborations in the UK, Canada, and the USA, and engaging them in cutting-edge research methods with major social impacts and benefits.

Impact and Outreach Activities:

To achieve maximum impact, project results will be made open-source. Project results will contribute to more responsible AI methods to detect online abusive language. This in turn contributes to increased users' confidence through platforms' greater compliance with relevant policies, human rights and legal frameworks and reinforces key socio-economic and Digital Economy areas, namely online gaming, social platform companies, digital journalism and content moderation technologies and services.

Policy impact will result from knowledge shared in Canada, the UK, and the US (through AI NOW). We will draw on the experience of the UK PI, who has just submitted written evidence on online abuse of UK MPs to the UK Parliamentary inquiry on Democracy, free speech and freedom of association, and harness the Industry and Parliament Trust. The Canada PI will share new findings with a network of over 35 collaborating scholars and policy/community/industry partners with the Canada 150 Research Chair/SFU Digital Democracy Group.

Funded Value:

£508,135

Funded Period:

Feb 20 - Jan 24

Funder:

FIC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

ES/T012714/1

Principal Investigator:

Kalina Bontcheva

Research Subject:

Info. & commun. Technol. (64%)

Media (16%)

Sociology (16%)

Research Topic:

Artificial Intelligence (64%)

Media & Communication Studies (16%)

Media Studies (16%)

Organisations

People	ORCID iD
Kalina Bontcheva (Principal Investigator)
Nikolaos Aletras (Co-Investigator)	http://orcid.org/0000-0003-4285-1965
Wendy Chun (Co-Investigator)
Ahmed Al-Rawi (Co-Investigator)

Publications

Bakir M (2024) Abuse in the time of COVID-19: the effects of Brexit, gender and partisanship in Online Information Review

Canute M (2023) Dimensions of Online Conflict: Towards Modeling Agonism

Farrell T (2021) MP Twitter Engagement and Abuse Post-first COVID-19 Lockdown in the UK: White Paper

Farrell T (2020) Vindication, Virtue and Vitriol: A study of online engagement and abuse toward British MPs during the COVID-19 Pandemic

Farrell T (2020) Vindication, virtue, and vitriol: A study of online engagement and abuse toward British MPs during the COVID-19 pandemic. in Journal of computational social science

Farrell T. (2021) MP Twitter Engagement and Abuse Post-first COVID-19 Lockdown in the UK

Gorrell G (2020) Which politicians receive abuse? Four factors illuminated in the UK general election 2019 in EPJ Data Science

Jin M. (2020) Complaint Identification in Social Media with Transformer Networks in COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference

Jin M. (2021) Modeling the Severity of Complaints in Social Media in NAACL-HLT 2021 - 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference

Maronikolakis A (2020) Analyzing Political Parody in Social Media

Key Findings
Impact Summary
Policy Influence
Further Funding
Research Databases and Models
Research Tools and Methods
Collaboration
Software and Technical Products
Engagement Activities


Description	The project has been carrying out research on online abuse towards UK MPs and female journalists, as well as developing AI methods for detection of online abuse. Our analysis of Twitter datasets over time has demonstrated that temporal bias is a significant challenge for abusive language detection, with models trained on historical data showing a significant drop in performance over time. This analysis sheds light on the pervasive issue of temporal bias in abusive language detection across languages, offering crucial insights into language evolution and temporal bias mitigation. Our study of the impact of standard label aggregation strategies on minority opinion representation investigated the quality and value of minority annotations, and examined their effect on the class distributions in gold labels, showing that this affects the behaviour of models trained on the resulting datasets. The label aggregation strategy should, therefore, be chosen carefully, keeping in mind the objective of the task and use case, and with minority opinions in view. Where feasible, we suggest performing at least an analysis of how the label distribution changes with label aggregation strategy, and comparing it with the minority label aggregation to ensure that the chosen strategy does not introduce substantial biases against minority voices.
Exploitation Route	The outcomes of the funding have demonstrated that bias in both AI tools and the data on which they are trained is a significant concern, in terms of the ways in which the data is collected and annotated, and in terms of the ways in which models are trained on that data. It shows that there is scope for continued research in these areas, as detailed in our research publications. The datasets we have created can also be used by other researchers for further experimentation and training.
Sectors	Digital/Communication/Information Technologies (including Software); Government, Democracy and Justice
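
The label aggregation analysis suggested in the description above can be sketched in a few lines of Python. This is only an illustrative sketch, not the project's own code: the toy annotations are invented, and "minority label aggregation" is assumed here to mean that an item is marked abusive if at least one annotator marked it so. The project's publications may define it differently.

```python
from collections import Counter

# Hypothetical toy annotations (not project data):
# three annotators per item, 1 = abusive, 0 = not abusive.
annotations = [
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [1, 1, 1],
    [0, 0, 0],
]


def majority_vote(labels):
    """Return the most frequent label (three binary votes, so no ties)."""
    return Counter(labels).most_common(1)[0][0]


def minority_aware(labels, positive=1):
    """Assumed strategy: positive if any single annotator chose the positive label."""
    return positive if positive in labels else 0


def label_distribution(gold):
    """Return the share of each label in a list of gold labels."""
    counts = Counter(gold)
    return {label: counts[label] / len(gold) for label in sorted(counts)}


majority_gold = [majority_vote(item) for item in annotations]
minority_gold = [minority_aware(item) for item in annotations]

majority_dist = label_distribution(majority_gold)
minority_dist = label_distribution(minority_gold)

# Number of items whose gold label depends on the aggregation strategy.
changed = sum(a != b for a, b in zip(majority_gold, minority_gold))

print("majority vote:", majority_dist)
print("minority-aware:", minority_dist)
print("items that change label:", changed)
```

A large gap between the two distributions would indicate that majority voting is discarding a substantial share of minority judgements, which is the situation the description warns about.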


Description	Findings from this award have been used to provide advice to DCMS as part of their College of Experts. They have also been used to compile a handbook of advice on monitoring online abuse against women journalists (see Outputs section), and to run a training workshop for South East Asian NGOs on setting up a regional CSO monitoring mechanism for violence against journalists, organised by Dr Maynard in collaboration with Free Press Unlimited and UNESCO in November 2022.
First Year Of Impact	2022
Sector	Digital/Communication/Information Technologies (including Software); Government, Democracy and Justice


Description	Guidelines for monitoring online violence against female journalists
Geographic Reach	Multiple continents/international
Policy Influence Type	Influenced training of practitioners or researchers


Description	Publication of a UNESCO report on online violence
Geographic Reach	Multiple continents/international
Policy Influence Type	Contribution to new or improved professional practice


Description	Monitoring online abuse towards female journalists
Amount	£120,000 (GBP)
Organisation	Foreign, Commonwealth and Development Office (FCDO)
Sector	Public
Country	United Kingdom
Start	03/2022
End	12/2022


Description	Toolkit for Analysing and Visualising Online Violence Against Female Journalists
Amount	£48,000 (GBP)
Organisation	Higher Education Funding Council for England
Sector	Public
Country	United Kingdom
Start	03/2024
End	03/2025


Title	GATE Hate for politics
Description	A service that tags abusive utterances in any text. It includes a feature, "type", indicating the type of abuse if any, such as sexist, racist etc, and a "target" feature that indicates if the abuse was aimed at the addressee or some other party. This can be run on any English language text. It will also tag UK members of parliament for the 2015, 2017 and 2019 general elections, and candidates for the 2017 and 2019 elections. Where an individual has run for election or been elected multiple times, multiple "Politician" annotations will appear with different "minorType" features. In this way, a person's recent political career can be tracked. The current parliament is the 58th parliament, with previous parliaments counting down, so that MPs with a minorType feature of "mp55" are those that were MPs before the general election in 2015. The service will also tag a range of politically relevant topics, as well as entities such as persons, locations and organizations and Twitter entities such as hashtags and user mentions. It is designed to run on tweets in the original Twitter JSON input format, on which it will also produce metadata such as whether the tweet is a reply or a retweet. Upload your own or harvest some with our Twitter Collector. However it can be run on any text.
Type Of Material	Improvements to research infrastructure
Year Produced	2020
Provided To Others?	Yes
Impact	Researchers from KCL have been using this service to identify abusive posts on Twitter, as have researchers from the Digital Democracies Institute at SFU in Canada.
URL	https://cloud.gate.ac.uk/shopfront/displayItem/gate-hate
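
The "minorType" numbering described above can be read off programmatically. The sketch below is illustrative only: the dictionary layout of the annotations is hypothetical (the service's actual output format is not shown here), and only the "mpNN" convention and the 58th-parliament baseline are taken from the description.

```python
import re

# Stated in the description: the current parliament is the 58th.
CURRENT_PARLIAMENT = 58


def parliament_number(minor_type):
    """Return the parliament number encoded in an 'mpNN' value, else None."""
    match = re.fullmatch(r"mp(\d+)", minor_type)
    return int(match.group(1)) if match else None


def parliaments_before_current(minor_type):
    """Return how many parliaments back an 'mpNN' value refers to, else None."""
    number = parliament_number(minor_type)
    return None if number is None else CURRENT_PARLIAMENT - number


# Hypothetical "Politician" annotations for one person (layout is assumed).
annotations = [
    {"type": "Politician", "name": "Example MP", "minorType": "mp58"},
    {"type": "Politician", "name": "Example MP", "minorType": "mp55"},
]

# Parliaments in which this person sat, oldest first.
career = sorted(parliament_number(a["minorType"]) for a in annotations)
print(career)
```

Values that do not follow the "mpNN" pattern, such as whatever feature values the service uses for candidates, return None rather than being guessed at.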


Title	Offensive Language Classifier
Description	This classifier is a fine-tuned Roberta-base model using the simpletransformers toolkit. We use the OLIDv1 dataset from OffensEval 2019 as training data. This dataset contains tweets classified as offensive or non-offensive.
Type Of Material	Improvements to research infrastructure
Year Produced	2021
Provided To Others?	Yes
Impact	We have only just made this available to other researchers, so impact information will be provided in the next round of Research Fish reporting.
URL	https://cloud.gate.ac.uk/shopfront/displayItem/offensive-classifier


Title	Toxic Language Classifier
Description	This classifier is a fine-tuned Roberta-base model using the simpletransformers toolkit. We use the Kaggle Toxic Comments Challenge dataset as training data. This dataset contains Wikipedia comments classified as toxic or non-toxic.
Type Of Material	Improvements to research infrastructure
Year Produced	2021
Provided To Others?	Yes
Impact	This tool has just been released to the research community. Usage and impact will be reported in the next Research Fish round.
URL	https://cloud.gate.ac.uk/shopfront/displayItem/toxic-classifier


Title	BA Brexit Geomedia Shared Data
Description	This archive contains shared materials pertaining to the forthcoming paper "Local media and geo-situated responses to Brexit: A quantitative analysis of Twitter, news and survey data" by Genevieve Gorrell, Mehmet E. Bakir, Luke Temple, Diana Maynard, Jackie Harrison, J. Miguel Kanai and Kalina Bontcheva. It contains a folder with a separate document for each of the topic-model-derived topics explored in the paper. The first two columns are topic scores for material from each separate Twitter account in the corpus, along with their Brexit vote intention. After a blank column comes the national newspaper article topic scores. After a further blank column come the local newspaper article scores, along with the NUTS1 region in which they are published. Additionally there is a spreadsheet with entity-based topic scores for each newspaper. Ethics approval was obtained for the Twitter data collection from the University of Sheffield (application number 011934).
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
URL	https://figshare.shef.ac.uk/articles/BA_Brexit_Geomedia_Shared_Data/12287498


Title	Online Hostility towards UK MPs
Description	This is a dataset with tweets from X. Each tweet mentions one or more UK MPs from a subset selected for our study to give a diverse representation of political leanings. Each tweet is labelled for hostility and the identity characteristic it targets (religion, race, gender). Each annotator also provides a confidence score for each label. Three annotators annotate each tweet. Annotators are UK-based students from Computer Science and Politics.
Type Of Material	Database/Collection of data
Year Produced	2024
Provided To Others?	Yes
Impact	The dataset has just been completed at the end of the project and thus it is too early to report on its impact and take up.


Title	Which Politicians Receive Abuse?
Description	The spreadsheets contain aggregate statistics for abusive language found in tweets to UK politicians in 2019. An overview spreadsheet is provided for each of the months of January to November ("per-mp-xxx-2019.csv" where xxx is the abbreviation for the month), with one row per MP, and a spreadsheet with data per day is provided for the campaign period of the UK 2019 general election, with one row per candidate, starting at the beginning of November and finishing on December 15th, a few days after the election ("campaign-period-per-cand-per-day.csv"). These spreadsheets list, for each individual, gender, party, the start and end times of the counts, tweets authored, retweets by the individual, replies by the individual, the number of times the individual was retweeted, replies received by the individual ("replyTo"), abusive tweets received in total and abusive tweets received in each of the categories sexist, racist and political. Two additional spreadsheets focus on topics; "topics-of-cands.csv" and "topics-of-replies.csv". In the first, counts of tweets mentioning each of a set of topics are given, alongside counts of abusive tweets mentioning each topic, in tweets by each candidate. In the second, the counts are of replies received when a candidate mentions a topic, alongside abusive replies received when they mentioned that topic. The data complement the forthcoming paper "Which Politicians Receive Abuse? Four Factors Illuminated in the UK General Election 2019", by Genevieve Gorrell, Mehmet E Bakir, Ian Roberts, Mark A Greenwood and Kalina Bontcheva. The way the data were acquired is described more fully in the paper. Ethics approval was granted to collect the data through application 25371 at the University of Sheffield.
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
Impact	Dataset used by other researchers to replicate the work.
URL	https://figshare.shef.ac.uk/articles/dataset/Which_Politicians_Receive_Abuse_/12340994/1
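
As a rough illustration of how the per-MP spreadsheets described above might be summarised, the sketch below computes the share of replies that are abusive, grouped by gender and by party. The rows are invented and every column name except "replyTo" is an assumption; the real header names should be checked against the published CSV files.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows (not project data). Only "replyTo" is a column name
# mentioned in the dataset description; the others are assumed.
SAMPLE = """name,gender,party,replyTo,abusive
MP A,female,Party X,1000,40
MP B,male,Party X,2000,60
MP C,female,Party Y,500,10
MP D,male,Party Y,1500,90
"""


def abuse_rate_by(rows, key):
    """Return abusive replies divided by all replies received, per group."""
    replies = defaultdict(int)
    abusive = defaultdict(int)
    for row in rows:
        replies[row[key]] += int(row["replyTo"])
        abusive[row[key]] += int(row["abusive"])
    # Skip groups with no replies to avoid dividing by zero.
    return {g: abusive[g] / replies[g] for g in replies if replies[g] > 0}


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
by_gender = abuse_rate_by(rows, "gender")
by_party = abuse_rate_by(rows, "party")

for group, rate in sorted(by_gender.items()):
    print(f"{group}: {rate:.1%} of replies abusive")
```

To run this against a downloaded file, replace `io.StringIO(SAMPLE)` with an opened file handle for one of the monthly spreadsheets and adjust the column names.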


Description	Collaboration with ICFJ
Organisation	International Center for Journalists
Country	United States
Sector	Charity/Non Profit
PI Contribution	Computational analysis of online abuse towards female journalists worldwide
Collaborator Contribution	Qualitative research, journalistic expertise, paper writing, joint discussions and research
Impact	All joint publications already listed - see those co-authored with Julie Posetti
Start Year	2021


Description	Digital Democracies Institute, Simon Fraser University
Organisation	Simon Fraser University
Department	Digital Democracies Institute
Country	Canada
Sector	Academic/University
PI Contribution	We have trained the DDI researchers in using NLP tools for analysing online abuse. We have assisted them by applying ML models to some of their data as well as providing some manual annotation for their data. As a result of this, we have collaborated on a paper (see publications).
Collaborator Contribution	They have assisted us with manual annotation of our data and by providing social science expertise to produce joint collaborative research.
Impact	This is a multi-disciplinary collaboration involving social and computer scientists. We have jointly published a paper: Canute M, Jin M, Holtzclaw H, Lusoli A, Adams P, Pandya M, Taboada M... Chun W. (2023). Dimensions of Online Conflict: Towards Modeling Agonism.
Start Year	2020


Title	Shiny app - Which Politicians Receive Abuse During 2019 Election Campaign?
Description	This repository contains source code and processed data for the shiny app "Which Politicians Receive Abuse During 2019 Election Campaign?". The shiny app is built on the dataset made available by Gorrell, G., Bakir, M., Roberts, I., Greenwood, M., et al. (2020) on Online Research Data.
Type Of Technology	Software
Year Produced	2021
Impact	Demonstration of the research outputs to policy makers, citizens, and other users.
URL	https://figshare.shef.ac.uk/articles/software/Shiny_app_-_Which_Politicians_Receive_Abuse_During_201...


Description	Invited panel member at the International Journalism Festival
Form Of Engagement Activity	A formal working group, expert panel or dialogue
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	I was part of an expert panel on online abuse against women journalists, alongside internationally acclaimed women journalists who had suffered abuse and researchers from the media and media-related organisations such as the International Center for Journalists. I talked about our research in hate speech detection and the issues around bias and NLP. The International Journalism Festival is a huge annual event with several thousand participants, mostly from media organisations, and the panel was also live streamed to a wider audience. As a result of the panel, I received a number of questions and expressions of interest in our research, including follow-up collaboration invitations and requests to be involved in the ongoing research.
Year(s) Of Engagement Activity	2023
URL	https://www.journalismfestival.com/


Description	Keynote talk at the 2023 BCSWomen Lovelace Colloquium in Sheffield
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Undergraduate students
Results and Impact	I gave a keynote speech about our research in bias in hate speech detection and the platform we've developed for online abuse analysis. I was invited as a senior female academic to give this talk to inspire younger female computer scientists. The talk was held at the University of Sheffield as part of a one-day conference - the Ada Lovelace conference organised by the British Computer Society. As a result of the talk, I received a number of questions and expressions of interest in our research, including follow-up talk invitations. Many students were inspired to learn about the kinds of research that could be done, and came to understand better a number of issues around online abuse that they hadn't previously considered.
Year(s) Of Engagement Activity	2023
URL	https://bcswomenlovelace.bcs.org/?page_id=478#:~:text=The%202023%20BCSWomen%20Lovelace%20Colloquium,...


Description	Keynote talk at the Language Data and Knowledge conference
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Other audiences
Results and Impact	I gave a keynote speech about our research in bias in hate speech detection and the platform we've developed for online abuse analysis. I was invited as a senior female academic to give this talk to inspire younger female computer scientists. The talk was held at the Language, Data and Knowledge International Conference in Vienna, aimed primarily at academics (postgrads, researchers, and professors) as well as industrial researchers. As a result of the talk, I received a number of questions and expressions of interest in our research, including follow-up talk invitations. Participants were inspired to learn about the kinds of research that could be done, and came to understand better a number of issues around online abuse that they hadn't previously considered.
Year(s) Of Engagement Activity	2023
URL	http://2023.ldk-conf.org/invited-speakers/


Description	Roundtable on media freedom
Form Of Engagement Activity	A formal working group, expert panel or dialogue
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Policymakers/politicians
Results and Impact	I was invited to be a member of an expert group organised by UNESCO's International Programme for the Development of Communication (IPDC) as part of activities around the global World Press Freedom Day conference held in New York, USA. Along with more than 25 experts from NGOs, academia, media and the tech sector, I brainstormed the issues in a meeting hosted and co-convened with UNESCO and the Danish Mission to the UN.
Year(s) Of Engagement Activity	2023
URL	https://www.unesco.org/en/articles/data-makes-difference-world-press-freedom-day-roundtable


Description	Workshop around our development of an Online Violence Early Warning System as a response to targeted attacks on women journalists.
Form Of Engagement Activity	A formal working group, expert panel or dialogue
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	This was a workshop with 20 participants on May 4, organised on the sidelines of the World Press Freedom Day conference in New York and based around our pioneering work to develop an Online Violence Early Warning System as a response to targeted attacks on women journalists. The workshop demonstrated and discussed our new prototype interactive tools, developed in concert with a series of big data case studies and a set of research-derived indicators for online violence escalation published by the OSCE.
Year(s) Of Engagement Activity	2023