Expanding and Assembling Approaches to Improve Decisions on Identification and Classification of Online Terrorist Content

Lead Research Organisation: Swansea University

Department Name: College of Science

Abstract

Improving knowledge of online terrorist ecosystems is urgently needed to develop countermeasures effectively and responsibly. The aim of this project is to develop new methods for predicting terrorist and extremist behaviour on the Internet. Accurate models of behavioural patterns will allow communication channels to be predicted and will make content discovery (and removal) strategies more effective. This will be achieved by expanding the technological approaches deployed (including managing discovery and open-source intelligence, and interacting with platforms to improve decisions via machine learning) and by scoping opportunities to combine approaches into ensemble models, for example by combining predictive algorithms.
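As a rough illustration of combining predictive algorithms into an ensemble, the sketch below uses soft voting: each model outputs a probability per class, and the ensemble takes a (optionally weighted) average. This is one common ensembling technique, chosen here for illustration only; the project does not specify which ensemble method it will use. The `Scorer` interface, the function names, the labels and the three placeholder models are all hypothetical stand-ins for trained classifiers.

```python
from typing import Callable, Dict, Optional, Sequence

# Hypothetical interface: a "model" maps a piece of text to a probability per
# class label. In practice these would be trained classifiers (e.g. an
# LSTM+CNN model); here they are placeholders.
Scorer = Callable[[str], Dict[str, float]]


def soft_vote(scorers: Sequence[Scorer], text: str,
              weights: Optional[Sequence[float]] = None) -> Dict[str, float]:
    """Combine probabilistic classifiers by a weighted average of their outputs."""
    if not scorers:
        raise ValueError("at least one scorer is required")
    if weights is None:
        weights = [1.0] * len(scorers)
    if len(weights) != len(scorers):
        raise ValueError("exactly one weight per scorer is required")
    total = float(sum(weights))
    combined: Dict[str, float] = {}
    for scorer, weight in zip(scorers, weights):
        for label, prob in scorer(text).items():
            combined[label] = combined.get(label, 0.0) + weight * prob / total
    return combined


def predict(scorers: Sequence[Scorer], text: str,
            weights: Optional[Sequence[float]] = None) -> str:
    """Return the label with the highest combined probability."""
    probs = soft_vote(scorers, text, weights)
    return max(probs, key=probs.get)


# Placeholder models that ignore their input and return fixed distributions,
# purely to show how the votes combine.
def model_a(text: str) -> Dict[str, float]:
    return {"extremist": 0.7, "non_extremist": 0.3}


def model_b(text: str) -> Dict[str, float]:
    return {"extremist": 0.4, "non_extremist": 0.6}


def model_c(text: str) -> Dict[str, float]:
    return {"extremist": 0.6, "non_extremist": 0.4}


models = [model_a, model_b, model_c]
equal = predict(models, "example post")                # about 0.567 vs 0.433
weighted = predict(models, "example post", [1, 4, 1])  # model_b now dominates
```

With equal weights the ensemble labels the item `extremist`; giving `model_b` four times the weight flips the decision to `non_extremist`. Soft voting keeps each model's confidence rather than just its top label (hard voting); stacking, where a second model learns how to combine the base models, is a common alternative.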

This PhD project aims to apply state-of-the-art and novel algorithmic techniques to classify data and behaviours related to extremism, and to develop human-centred processes that remove bias and noise from data collected from a range of social media platforms. To achieve these aims, we will first use the TCAP platform, which will support us in collecting data from a range of social media platforms in different media forms (PDF, URL, HTML, audio and video). The raw data will be converted into a workable dataset, such as an SQLite database or CSV files. Second, to remove noise and bias from the data, we will follow a user-centred design process to develop an interactive tool that enables extremism domain experts to perform complex text extraction tasks at scale, as described in [5,7]. The tool will let users remove noise in quantifiable ways, which in turn will allow us to tighten the feedback loop between the cleaning process and the user. Third, we will apply novel algorithmic techniques, including ensemble methods and/or deep learning, to classify data and behaviours related to extremism. One possible approach is to use sentiment-based deep learning models (LSTM+CNN) to classify extremist and non-extremist content [8].

We will apply this process to a closed set of data, focusing on past occasions on which terrorist organisations have used different platforms to spread propaganda. The project will select a recent terrorist attack and analyse how terrorists used social media across four stages of that attack, drawing on existing TCAP data covering the four corresponding time frames. In the first step, the student will generate a dataset of user interactions during the four stages. In the second step, an interactive, user-centred process involving domain experts will be used to clean the data; care will be taken to avoid overfitting during this process. In the third and final step, we will apply state-of-the-art machine learning models, such as sentiment-based deep learning and/or ensemble models, to develop classifiers that identify the type of extremist content.

User studies will be conducted at major milestones to rate the decision support provided by the machine learning models and to ensure that the decisions are justifiable and do not violate the principle of freedom of speech.
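The first two steps of this closed-set process (building a dataset across the four stages of an attack, then removing expert-flagged noise in a way that can be counted) might be sketched as follows. This is a minimal illustration under stated assumptions: each collected item is taken to be already reduced to a tuple of identifier, ISO-8601 timestamp, platform name and text; the three stage boundaries, the `expert_flags` set and all function names are hypothetical; and nothing here reflects TCAP's actual export format or interface. Only the Python standard library (`sqlite3`, `csv`, `bisect`) is used.

```python
import bisect
import csv
import io
import sqlite3
from datetime import datetime
from typing import Dict, Iterable, Sequence, Set, TextIO, Tuple

# Hypothetical record layout: (item_id, ISO-8601 timestamp, platform, text).
Record = Tuple[str, str, str, str]


def assign_stage(posted: datetime, boundaries: Sequence[datetime]) -> int:
    """Map a timestamp to stage 1..len(boundaries) + 1.

    `boundaries` must be sorted; three boundaries give four stages. An item
    posted exactly on a boundary is placed in the later stage.
    """
    return bisect.bisect_right(boundaries, posted) + 1


def build_dataset(records: Iterable[Record], boundaries: Sequence[datetime],
                  expert_flags: Set[str], db_path: str = ":memory:"
                  ) -> Tuple[sqlite3.Connection, Dict[str, int]]:
    """Drop items flagged as noise by experts, label the rest with a stage and
    store them in SQLite. Returns the connection and kept/removed counts."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        "item_id TEXT PRIMARY KEY, posted_at TEXT, platform TEXT, "
        "text TEXT, stage INTEGER)"
    )
    report = {"kept": 0, "removed": 0}
    for item_id, posted_at, platform, text in records:
        if item_id in expert_flags:  # an expert marked this item as noise
            report["removed"] += 1
            continue
        stage = assign_stage(datetime.fromisoformat(posted_at), boundaries)
        conn.execute("INSERT INTO items VALUES (?, ?, ?, ?, ?)",
                     (item_id, posted_at, platform, text, stage))
        report["kept"] += 1
    conn.commit()
    return conn, report


def export_csv(conn: sqlite3.Connection, out: TextIO) -> None:
    """Write the cleaned table, with a header row, as CSV to a text stream."""
    cur = conn.execute("SELECT item_id, posted_at, platform, text, stage "
                       "FROM items ORDER BY posted_at")
    writer = csv.writer(out)
    writer.writerow([column[0] for column in cur.description])
    writer.writerows(cur)


# Usage with made-up stage boundaries and placeholder records.
boundaries = [datetime(2020, 1, 10), datetime(2020, 1, 15), datetime(2020, 1, 20)]
records = [
    ("a1", "2020-01-05T12:00:00", "platform_x", "placeholder text"),
    ("a2", "2020-01-12T08:30:00", "platform_y", "placeholder text"),
    ("a3", "2020-01-15T00:00:00", "platform_x", "placeholder text"),
    ("a4", "2020-01-25T09:00:00", "platform_z", "placeholder text"),
    ("a5", "2020-01-26T10:00:00", "platform_y", "spam flagged by an expert"),
]
conn, report = build_dataset(records, boundaries, expert_flags={"a5"})
stages = dict(conn.execute("SELECT item_id, stage FROM items"))
buf = io.StringIO()
export_csv(conn, buf)
```

Here `report` records four items kept and one removed, and item `a3`, posted exactly on the second boundary, falls into stage 3. Returning the counts alongside the data is one simple way to make expert cleaning quantifiable; passing a file path as `db_path`, or an open file to `export_csv`, would persist the dataset instead of keeping it in memory.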

In summary, the following contributions are anticipated: an interactive, user-centred tool that enables extremism domain experts to assist with cleaning data and removing bias; and a range of classifiers that predict the class of different types of extremist content, built on a dataset that will be made publicly available to inform future research. The project will investigate the effectiveness of classification for restricting content related to terrorist activity and propaganda. The methodology and technology developed will be scrutinised for possible unintended uses, such as attacks on democracy, undue influence on political decisions, other fraudulent behaviour, or the deliberate introduction of biases into the social media landscape.

Planned Impact

The Centre will nurture 55 new PhD researchers who will be highly sought after in technology companies and in application sectors where data- and intelligence-based systems are being developed and deployed. We expect our graduates to be in demand nationally for two reasons: firstly, because their training takes place in a vibrant and unique environment that exposes them to challenging domains and contexts (which bring stretch, ambition and adventure to their projects and capabilities); and, secondly, because of the particular emphasis the Centre will place on people-first approaches. As one of the Google AI leads, Fei-Fei Li, recently put it, "We also want to make technology that makes humans' lives better, our world safer, our lives more productive and better. All this requires a layer of human-level communication and collaboration" [1]. We also expect substantial and attractive opportunities for the CDT's graduates to establish their careers in the Internet Coast region (Swansea Bay City Deal) and Wales. This demand will dovetail well with the lifetime of the Centre and provide momentum for its continuation after the initial EPSRC investment.

With the skills being honed in the Centre, the UK will gain an important competitive advantage: a strong talent-based pull that draws industrial investment to the UK as recognition of, and demand for, human-centred interactions and collaborations with data and intelligence grow. Further, graduates who wish to develop their careers in academia will be a distinct and much-needed complement to the likely increased UK community of researchers in AI and big data, bringing both the ability to lead insight and innovation in core computer science (e.g., in HCI or formal methods) and the talent to shape and challenge their research agenda through a lens that is human-centred and that involves cross-disciplinarity and co-creation.

The PhD training will be the responsibility of a team that includes research leaders in the application of big data and AI in important UK growth sectors - from health and well-being to smart manufacturing - that will help the nation achieve a positive and productive economy. Our graduates will tackle impactful challenges during their training and be ready to contribute to nationally important areas from the moment they begin the next steps of their careers. Impact will be further embedded in the training programme with cohorts involved in projects that directly involve communities and stakeholders within our rich innovation ecology in Swansea and the Bay region, who will co-create research and participate in deployments, trials and evaluations.

The Centre will also have impact by providing evidence of, and methods for, integrating human-centred approaches within areas of computational science and engineering that have yet to fully exploit their value. For example, while process modelling and verification might seem far removed from the human interface, we will adapt and apply methods from human-computer interaction, one of the Centre's strengths, to develop research questions, prototyping apparatus and evaluations for such specialisms. These valuable new methodologies, embodied in our graduates, will influence the processes adopted by the wide range of organisations that we engage with and that our graduates join.

Finally, as our work is fully focused on putting the human first in big data and intelligent systems contexts, we expect to make a positive contribution to society's understandings of and involvement with these keystone technologies. We hope to reassure, encourage and empower our fellow citizens, and those globally, that in a world of "smart" technology, the most important ingredient is the human experience in all its smartness, glory, despair, joy and even mundanity.

[1] https://www.technologyreview.com/s/609060/put-humans-at-the-center-of-ai/

Student:

Adam Cook

Period of Study:

Oct 2020 - Sep 2024

Funder:

EPSRC

Project Status:

Active

Project Category:

Studentship

Project Reference:

2440634

Research Topic:

Unclassified

People
Berndt Muller (Primary Supervisor)
Matt Jones (Primary Supervisor)
Adam Cook (Student)

Studentship Projects

Project Reference	Relationship	Related To	Start	End	Student Name
EP/S021892/1			01/04/2019	30/09/2027
2440634	Studentship	EP/S021892/1	01/10/2020	30/09/2024	Adam Cook