Responsible AI for Inclusive, Democratic Societies: A cross-disciplinary approach to detecting and countering abusive language online

Lead Research Organisation: University of Sheffield
Department Name: Computer Science

Abstract

Toxic and abusive language threatens the integrity of public dialogue and democracy. Abusive language, which includes taunts, slurs, racism, extremism, crudeness, provocation and disguise, is generally considered offensive and insulting, and has been linked to political polarisation and citizen apathy; the rise of terrorism and radicalisation; and cyberbullying. In response, governments worldwide have enacted strong laws against abusive language that leads to hatred, violence and criminal offences against a particular group. This includes legal obligations to moderate (i.e., detect, evaluate, and potentially remove or delete) online material containing hateful or illegal language in a timely manner; and social media companies have adopted even more stringent regulations in their terms of use. The last few years, however, have seen a significant surge in such abusive online behaviour, leaving governments, social media platforms, and individuals struggling to deal with the consequences.

The responsible (i.e. effective, fair and unbiased) moderation of abusive language carries significant practical, cultural, and legal challenges. While current legislation and public outrage demand a swift response, we do not yet have effective human or technical processes that can address this need. The widespread deployment of human content moderators is costly and inadequate on many levels: the nature of the work is psychologically challenging, and even significant efforts lag behind the deluge of data posted every second. At the same time, Artificial Intelligence (AI) solutions implemented to address abusive language have raised concerns about automated processes that affect fundamental human rights, such as freedom of expression and privacy, and about a lack of corporate transparency. Tellingly, the first moves to censor Internet content focused on terms used by the LGBTQ community and AIDS activism. It is no surprise then that content moderation has been dubbed by industry and media as a "billion dollar problem." Thus, this project addresses the overarching question: how can AI be better deployed to foster democracy by integrating freedom of expression, commitments to human rights and multicultural participation in the protection against abuse?

Our project takes on the difficult and urgent issue of detecting and countering abusive language through a novel approach to AI-enhanced moderation that combines computer science with social science and humanities expertise and methods. We focus on two constituencies infamous for toxicity: politicians and gamers. Politicians, because of their public role, are regularly subjected to abusive language. Online gaming and gaming spaces have been identified as private "recruitment sites" for extreme political views and linked to offline violent attacks. Specifically, our team will quantify the bias embedded within current content moderation systems that use rigid definitions or determinations of abusive language, which may paradoxically create new forms of discrimination or bias based on identity, including sex, gender, ethnicity, culture, religion, political affiliation or other characteristics. We will offset these effects by producing more context-aware, dynamic systems of detection. Further, we will empower users by embedding these open source tools within strategies of democratic counter-speech and community-based care and response. Project results will be shared broadly through open access white papers, publications and other online materials with policy, academic, industry, community and public stakeholders. This project will engage and train the next generation of interdisciplinary scholars, crucial to the development of responsible AI.

With its focus on robust AI methods for tackling online abuse in an effective and legally compliant manner, vital to the vigour of democratic societies, this research has wide-ranging implications and relevance for Canada and the UK.

Planned Impact

Main Beneficiaries:

1) The public: The prevalence of cyber abuse has led to many government and industry attempts to curb its occurrence through prevention and policy; however, these attempts are hindered by the massive, dynamic volume of online content, as well as impeded by the largely ineffective and time-consuming nature of current abuse moderation methods. The project seeks to address these challenges while also considering issues of content moderation biases that tend to disproportionately tag certain individuals' and communities' language as toxic. These biases affect public dialogue, democratic participation and certain legal rights, such as freedom of expression, equality and privacy rights.

2) Policy makers and NGOs: The results generated by this project will help policymakers (e.g., economic diversification and innovation, justice, privacy, gender and equality) and NGO/community stakeholders (e.g., Amnesty, Reporters without Borders) establish guidelines for addressing online abusive language and understand its impacts. It will also provide alternative responsible (effective, unbiased and fair) methods for countering abusive language. Research results will contribute to a more balanced and democratic moderation of political dialogue and engagement while protecting politicians and users against abuse.

3) Technology companies: Companies such as Intel are seeking to work with academics and NGOs to address abuse prevention, especially as policies and regulatory frameworks are being developed. Gaming is also an important site for the tech industry, growing at over 4% per year globally. The community of gamers is growing more diverse (~50% women in Canada in 2018). However, gaming can be a very toxic environment in terms of sexism, racism and other discriminatory forms of abuse, which ultimately limits the size of the gaming market.

4) Law enforcement agencies and social media companies: The responsible NLP methods
arising from this project could be incorporated in existing tools, helping law enforcement agencies and
social media companies detect and counter online abuse in real time.

5) Media companies and stakeholder engagement: Through previous projects, we have already established and will leverage collaborations with Buzzfeed, BBC News, ITV, the Reuters Institute for the Study of Journalism and Google; and we will promote research results through the Centre for Freedom of the Media/UNESCO Journalism Safety Research Network.

6) Early career researchers (ECR)/students: the project will help advance emerging scholars' research trajectories by offering training in interdisciplinary research skills, widening collaborations in the UK, Canada, and the USA, and engaging them in cutting-edge research methods with major social impacts and benefits.

Impact and Outreach Activities:

To achieve maximum impact, project results will be released as open source. Project results will contribute to more responsible AI methods for detecting online abusive language. This, in turn, contributes to increased user confidence through platforms' greater compliance with relevant policies, human rights and legal frameworks, and reinforces key socio-economic and Digital Economy areas, namely online gaming, social platform companies, digital journalism, and content moderation technologies and services.

Policy impact will result from knowledge shared in Canada, the UK, and the US (through AI NOW). We will draw on the experience of the UK PI, who has just submitted written evidence on the online abuse of UK MPs to the UK Parliamentary inquiry on democracy, free speech and freedom of association, and will harness the Industry and Parliament Trust. The Canada PI will share new findings with a network of over 35 collaborating scholars and policy/community/industry partners through the Canada 150 Research Chair/SFU Digital Democracy Group.

Publications

 
Title GATE Hate for politics 
Description A service that tags abusive utterances in any text. It includes a feature, "type", indicating the type of abuse if any, such as sexist, racist etc, and a "target" feature that indicates if the abuse was aimed at the addressee or some other party. This can be run on any English language text. It will also tag UK members of parliament for the 2015, 2017 and 2019 general elections, and candidates for the 2017 and 2019 elections. Where an individual has run for election or been elected multiple times, multiple "Politician" annotations will appear with different "minorType" features. In this way, a person's recent political career can be tracked. The current parliament is the 58th parliament, with previous parliaments counting down, so that MPs with a minorType feature of "mp55" are those that were MPs before the general election in 2015. The service will also tag a range of politically relevant topics, as well as entities such as persons, locations and organizations and Twitter entities such as hashtags and user mentions. It is designed to run on tweets in the original Twitter JSON input format, on which it will also produce metadata such as whether the tweet is a reply or a retweet. Upload your own or harvest some with our Twitter Collector. However it can be run on any text. 
Type Of Material Improvements to research infrastructure 
Year Produced 2020 
Provided To Others? Yes  
Impact Researchers from KCL have been using this service to identify abusive posts on Twitter, as have researchers from the Digital Democracies Institute at SFU in Canada. 
URL https://cloud.gate.ac.uk/shopfront/displayItem/gate-hate
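As an illustration of consuming this service's output, the sketch below parses a GATE-style JSON response and pulls out the "type" and "target" features of each abuse annotation described above. The SAMPLE_RESPONSE payload, its field names, and the "Abuse" annotation label are hypothetical, constructed for illustration only; the real service's response format may differ.

```python
import json

# Hypothetical example of a JSON response from the service, showing one
# abusive utterance annotated with the "type" and "target" features, plus
# a "Politician" annotation with a "minorType" feature. Field names are
# assumptions for illustration, not the documented output schema.
SAMPLE_RESPONSE = """
{
  "text": "@some_mp you are an idiot",
  "entities": {
    "Abuse": [
      {"indices": [9, 25], "type": "generic", "target": "addressee"}
    ],
    "Politician": [
      {"indices": [0, 8], "minorType": "mp57"}
    ]
  }
}
"""

def extract_abuse(response_json):
    """Return (span_text, type, target) triples for each Abuse annotation."""
    doc = json.loads(response_json)
    text = doc["text"]
    triples = []
    for ann in doc.get("entities", {}).get("Abuse", []):
        start, end = ann["indices"]
        triples.append((text[start:end], ann.get("type"), ann.get("target")))
    return triples

print(extract_abuse(SAMPLE_RESPONSE))
# → [('you are an idiot', 'generic', 'addressee')]
```

A downstream study could aggregate these triples per politician to measure how much abuse is aimed directly at the addressee versus third parties.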
 
Title Offensive Language Classifier 
Description This classifier is a RoBERTa-base model fine-tuned using the simpletransformers toolkit. We use the OLIDv1 dataset from OffensEval 2019 as training data. This dataset contains tweets classified as offensive or non-offensive. 
Type Of Material Improvements to research infrastructure 
Year Produced 2021 
Provided To Others? Yes  
Impact We have only just made this available to other researchers, so impact information will be provided in the next round of Researchfish reporting. 
URL https://cloud.gate.ac.uk/shopfront/displayItem/offensive-classifier
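The description above names the ingredients (RoBERTa-base, simpletransformers, OLIDv1). A minimal sketch of the data preparation, assuming the standard OLIDv1 subtask A labels "OFF"/"NOT", follows; the commented training call shows the usual simpletransformers pattern, but the project's actual hyperparameters are not published here.

```python
# Map OLIDv1 subtask A labels to the integer labels a binary
# classifier expects ("NOT" = not offensive, "OFF" = offensive).
LABEL_MAP = {"NOT": 0, "OFF": 1}

def to_training_pairs(rows):
    """Convert (tweet, subtask_a_label) rows to (text, integer_label) pairs."""
    return [(text, LABEL_MAP[label]) for text, label in rows]

pairs = to_training_pairs([
    ("@USER have a nice day", "NOT"),
    ("@USER you are pathetic", "OFF"),
])
print(pairs)
# → [('@USER have a nice day', 0), ('@USER you are pathetic', 1)]

# Fine-tuning would then look roughly like this (requires the
# simpletransformers and pandas packages, and downloads roberta-base):
# import pandas as pd
# from simpletransformers.classification import ClassificationModel
# train_df = pd.DataFrame(pairs, columns=["text", "labels"])
# model = ClassificationModel("roberta", "roberta-base")
# model.train_model(train_df)
```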
 
Title Toxic Language Classifier 
Description This classifier is a RoBERTa-base model fine-tuned using the simpletransformers toolkit. We use the Kaggle Toxic Comments Challenge dataset as training data. This dataset contains Wikipedia comments classified as toxic or non-toxic. 
Type Of Material Improvements to research infrastructure 
Year Produced 2021 
Provided To Others? Yes  
Impact This tool has just been released to the research community. Usage and impact will be reported in the next Researchfish round. 
URL https://cloud.gate.ac.uk/shopfront/displayItem/toxic-classifier
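The Kaggle Toxic Comment Classification Challenge data carries six binary flags per Wikipedia comment rather than a single label. Since the description above speaks of comments "classified as toxic or non-toxic", a plausible preprocessing step (an assumption, not a documented detail of this project) is to collapse the six flags into one binary label:

```python
# The six per-comment flags in the Kaggle Toxic Comments data.
FLAGS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

def binarise(row):
    """Return 1 if any of the six Kaggle flags is set, else 0.

    This any-flag collapse is an assumed preprocessing choice for
    illustration; the project may have used the "toxic" flag alone.
    """
    return int(any(row[f] for f in FLAGS))

clean = {f: 0 for f in FLAGS}
nasty = dict(clean, insult=1)
print(binarise(clean), binarise(nasty))
# → 0 1
```

The resulting (comment, 0/1) pairs would then feed the same simpletransformers fine-tuning pattern as the offensive language classifier above.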
 
Description Digital Democracies Institute, Simon Fraser University 
Organisation Simon Fraser University
Department Digital Democracies Institute
Country Canada 
Sector Academic/University 
PI Contribution We are training the DDI researchers in using NLP tools for analysing online abuse.
Collaborator Contribution They are providing social science expertise to produce joint collaborative research.
Impact Work is still in progress, as the project started recently. This is a multi-disciplinary collaboration involving social and computer scientists.
Start Year 2020