Shear: The next generation of video understanding technology to automate content moderation across the internet.

Lead Participant: Unitary Ltd

Abstract

In this project, Unitary Ltd and Oxford University will develop novel algorithms to address the core challenges of video moderation. This technology will form Unitary's new product, _Shear_, to automatically detect harmful video content online.

Automated moderation is desperately needed to ensure both speed and accuracy, and to protect moderators' mental health. Current solutions treat each video as a series of frames and apply image analysis to each frame; the audio is analysed separately to detect keywords. Any understanding of time (the order of frames) or awareness of context is therefore lost. **Videos carry fundamentally more information than images, and consequently there is an enormous volume of harmful videos for which this approach fails completely.**
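The limitation can be sketched in a few lines. The snippet below is purely illustrative (the function names and scores are hypothetical, not part of any real moderation system): because frame-wise pipelines pool per-frame scores, any reordering of the frames yields the same verdict, so temporal structure carries no signal.

```python
def frame_score(frame):
    # Hypothetical per-frame harm score from an image classifier.
    return frame["harm_score"]

def framewise_video_score(frames):
    # Max-pooling over per-frame scores, a common aggregation in
    # frame-based pipelines. The result is permutation-invariant:
    # the order of frames cannot affect the verdict.
    return max(frame_score(f) for f in frames)

video = [{"harm_score": s} for s in (0.1, 0.2, 0.9, 0.1)]
shuffled = list(reversed(video))

# Same score regardless of frame order -- e.g. a bat swing followed by
# a dark screen scores no differently than the reverse sequence.
assert framewise_video_score(video) == framewise_video_score(shuffled)
```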

Below are some types and examples of videos that are currently impossible to detect by automated means:

1. Videos in which understanding **interactions** is essential

E.g., an individual frame containing a gun does not reveal whether the video depicts a real-life massacre, a computer game or a movie scene.

2. Videos which require an understanding of **motion** or awareness of time

Videos depicting animal cruelty are unfortunately common. In one example, a dog is seen next to a man holding a baseball bat. The bat swings, the screen goes dark and a horrible crunch is heard. This is an extremely disturbing video, yet no individual frame raises the alarm.

3. Videos in which **multiple signals** must be interpreted **together**

Videos designed to influence and harm children often include popular cartoons which have been manipulated so that the characters ask the audience (i.e. children) to do dangerous things, such as "Turn the oven on" or to play with electric wires/sockets. The images alone show nothing but familiar cartoons, and the audio alone is not cause for concern: there is no profanity, and in fact it might be mistaken for an adult's DIY video! It is the combination of this audio inside a cartoon that makes it unacceptable.

4. Videos in which **context** is key

Visually similar content can be harmful or benign depending on other factors: e.g. a nude portrait could be posted alongside a feminist message or narration by a sexist troll.

This project will result in breakthrough technology that can interpret a variety of signals to enhance understanding of time and context, enabling improved detection of videos such as those described above. We aim to disrupt the moderation industry, which is currently highly manual and ripe for innovation.

| Participant | Project Cost | Grant Offer |
| --- | --- | --- |
| Unitary Ltd (lead) | | |
| University of Oxford | £120,000 | £120,000 |
