HateSpotting - Hybrid Sentiment Analysis to capture Hate Speech. Using Applied Intelligent Algorithms to spot localization of Hate-Speech bias across

Lead Research Organisation: Brunel University London
Department Name: Computer Science

Abstract

Social media interactivity generates a staggering amount of data every minute. Under the pretext of freedom of speech, some users have gained access to a whole new arsenal with which to harm other people from behind the shield of a computer screen. This project aims to develop novel machine learning algorithms to spot, track and quantify slurs and hate-based, negative and bigoted speech (based on biased information).
"Intelligent" algorithms will be developed and applied to captured content data, and then used to catch trends in, or the localization of, hate-speech. Research to date has highlighted that part of the challenge is that, at present, the data, tools, processes and systems needed to monitor online abuse effectively and accurately are not fully available, and that the field is beset with terminological, methodological, legal and theoretical challenges.
To realise this aim, the following objectives are defined:
1) Conduct a detailed literature review on the current social and algorithmic advances in the identification and tracking of online hate-speech. Results from this part of the project will be presented as a systematic literature and mapping study, if appropriate.
2) Identify potential existing data sources for the research. Numerous Twitter and other social media datasets exist; these need locating, collating and evaluating to see whether they are appropriate for the research. In parallel, web scraping technologies will be investigated as a means of automatically creating a novel dataset, as a contingency plan in case no suitable existing datasets are found.
3) Pre-process the data collected (existing and/or scraped) into features that represent the terminology and ontology of domain-specific hate-speech.
4) Apply classic machine learning techniques, such as classification and data clustering, to the identified datasets to form a gold standard for the project evaluation. Deep learning neural networks will be examined in more detail, as this type of technology is deemed highly appropriate for the project.
5) Extend work on trajectory-based Bayesian Networks and Sequential Pattern Mining techniques for predicting hate-speech paths within forum-based social media discussions; this will form the bulk of the research. Current research in these fields has not been extended to the longitudinal nature of the data that will be used in this project.
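As an illustration of objectives 3 and 4 only, a classic baseline of this kind might look like the following minimal Python sketch. It is not the project's method: the tokenizer, the `NaiveBayes` class, the placeholder tokens `slur_a` and `slur_b`, and the four toy posts are all hypothetical, chosen only to show bag-of-words features feeding a multinomial Naive Bayes classifier with Laplace smoothing.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a post and split it into word tokens (bag-of-words features)."""
    return re.findall(r"[a-z_']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()             # posts seen per label
        self.token_counts = defaultdict(Counter)  # token frequencies per label
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for tok in tokenize(text):
                self.token_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        # Tokens never seen in training carry no evidence and are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            denom = (sum(self.token_counts[label].values())
                     + self.alpha * len(self.vocab))
            for tok in tokens:
                score += math.log(
                    (self.token_counts[label][tok] + self.alpha) / denom
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data; slur_a and slur_b stand in for real abusive terms.
train_texts = [
    "go away you slur_a",
    "all of you are slur_b",
    "lovely weather for the match today",
    "thanks for the helpful answer",
]
train_labels = ["abusive", "abusive", "neutral", "neutral"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("you are a slur_a"))      # abusive
print(model.predict("thanks for the match"))  # neutral
```

A real study would replace the toy posts with the datasets from objective 2 and evaluate on held-out data; the sketch only shows how pre-processed features and a classic classifier fit together.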
Each distinct section of research will initially be written up as a paper for publication at key conferences and in journals, such as the Intelligent Data Analysis symposium. These papers will then form the core chapters of the final thesis.

Publications
