Detox: Human-led AI to automate and radically improve online content moderation

Lead Participant: REWIRE ONLINE LIMITED

Abstract

In this project, Rewire Online Limited will develop and commercially trial a new Artificial Intelligence (AI)-powered product, _Detox_, which massively improves how platforms moderate online content. Detox will automatically detect and rate toxic content, giving accurate, reliable and informative labels to large volumes of content in a short space of time. This technology has the potential to create a step-change in how toxic online content is tackled, dramatically improving people's safety online and reducing the amount of risky work undertaken by human moderators. It will also alleviate commercial, legal, reputational, ethical and cost pressures on platforms, nearly all of which are struggling to deal with malicious and unwanted content.

Most platforms still rely heavily on human analysts to moderate toxic content because no provider has developed a system which can automatically give accurate ratings. Most existing commercial solutions are not trusted: they lack precision and coverage, and they quickly go out of date. Our AI technology is different from current offerings because it is human-led. It dynamically integrates data collection with model training, which enables substantial performance improvements whilst decreasing the time and cost of product development. With a traditional approach to AI, analysts simply label toxic content to create a labelled training dataset. In contrast, our analysts are tasked with creating adversarial content which they think will 'trick' the AI. They do this by creating content which the AI thinks is toxic but actually is not, and vice versa. In this way the annotators identify and exploit the AI's weaknesses. We then update the AI on this content and repeat the process iteratively. This technology has been developed in a research context, and is now ready to be trialled commercially. Detox addresses the growing need for high-quality and cost-effective automated content moderation tools that _actually work_. It presents a genuine breakthrough in the automatic moderation of online content, delivering on the huge but unfulfilled promise of AI in this field. This project will create numerous social, economic and innovation benefits, placing the UK at the forefront of innovative solutions to ensure online safety.
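The human-led loop described above can be illustrated with a toy sketch. This is not Rewire's implementation: the bag-of-words `ToyToxicityModel`, the `adversarial_round` helper and the example sentences are all hypothetical stand-ins, chosen only to show how content that fools the current model is collected and fed back into training.

```python
from collections import Counter


class ToyToxicityModel:
    """Hypothetical bag-of-words classifier standing in for a real model."""

    def __init__(self):
        self.weights = Counter()

    def train(self, examples):
        # Retrain from scratch: +1 per token seen in toxic text, -1 otherwise.
        self.weights = Counter()
        for text, is_toxic in examples:
            for token in text.lower().split():
                self.weights[token] += 1 if is_toxic else -1

    def predict(self, text):
        # Toxic if the summed token weights are positive.
        score = sum(self.weights[t] for t in text.lower().split())
        return score > 0


def adversarial_round(model, training_data, annotator_attempts):
    """Keep only the attempts that fool the model, then retrain on them."""
    fooled = [(text, label) for text, label in annotator_attempts
              if model.predict(text) != label]
    training_data.extend(fooled)
    model.train(training_data)
    return fooled


# Seed data, as a traditional labelling approach would produce (made up).
data = [("you are an idiot", True), ("have a nice day", False)]
model = ToyToxicityModel()
model.train(data)

# Annotators write content they expect the model to get wrong (made up).
attempts = [
    ("you are an inspiration", False),  # looks toxic to the model, is not
    ("have a nice day idiot", True),    # looks benign to the model, is toxic
]

fooled_first = adversarial_round(model, data, attempts)
fooled_second = adversarial_round(model, data, attempts)
print(len(fooled_first), len(fooled_second))
```

In this toy run both attempts fool the seed model in the first round, and neither does after the update, so annotators would need to find new weaknesses in the next round. A real system would use a far stronger model and fresh annotator content each round.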

Lead Participant: REWIRE ONLINE LIMITED

Project Cost: £297,598

Grant Offer: £208,319
