RAnDMS (Real time Analysis of Digital Media Streams)

Lead Research Organisation: University of Sheffield
Department Name: Computer Science

Abstract

RAnDMS will study, implement and evaluate Real-time Data and Visual Analytic techniques to enable intelligence agencies, the MoD, the police and emergency responders to monitor and make sense of local, regional and global events using web-scale data from social and traditional media streams. The intelligence gathering task will be defined as identifying, correlating, integrating and presenting data and information in order to understand situations as they arise. Current technology does not provide efficient and effective solutions, as it mainly focuses on detecting trends in the use of keywords and tags. While this can spot overall patterns in the data, it only enables the retrieval of relevant documents, without any correlation or integration of the information they contain. Moreover, information concerning local situations and events, which may be discussed in only a handful of documents, is ignored.
Within RAnDMS, data analytics will focus on capturing information from media streams, illuminating situations at all levels, from global to local. This information will support decision making for the intelligence community, which is expected to increase its ability to monitor events and situations relevant to homeland security and to peace-keeping efforts. The scientific challenge is that data and information in these streams are: (i) high in volume and constantly increasing; (ii) often duplicated, incomplete, imprecise and incorrect; (iii) written in an informal style (i.e. short, unedited and conversational); and (iv) generally concerned with the short-term zeitgeist. These characteristics make analysis very hard, especially considering that major requirements of the intelligence community are that (i) documents must be processed in real time and (ii) the relevant information may be in the long tail of the distribution, i.e. it may be mentioned very infrequently.
We will provide highly efficient and effective technologies able to associate each document with its context. A document's context is provided by four dimensions: (who) the author of the document, (when) the time it was sent, (where) the location referred to in the document and (what) other documents with similar content. This information is either provided by the media stream or extracted from the document's content using efficient statistical text-mining techniques. By interpreting documents in terms of these four dimensions we enable: (i) the detection of events, i.e. documents and their content (what) are clustered around a time and place; (ii) the profiling of authors from the content (what and where) of the documents they have created; and (iii) the determination of information that is missing or ambiguous in a document, using information present in the other documents within its context.
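As a rough illustration of the event-detection idea in (i), the following Python sketch groups documents that share a time window and a location, and keeps the groups that reach a minimum size. It is a minimal sketch under stated assumptions, not the project's method: the `Document` fields, the `detect_events` function, the one-hour window, the exact-match location key and the `min_docs` threshold are all hypothetical, and the "what" dimension (content similarity) is left out for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Document:
    author: str        # who
    sent_at: datetime  # when (timezone-aware, so windows do not depend on local time)
    location: str      # where (assumed to be already extracted from the text)
    text: str          # what (not used in this simplified sketch)


def detect_events(docs, window_seconds=3600, min_docs=3):
    """Group documents by (time window, location) and keep groups with at least min_docs."""
    buckets = defaultdict(list)
    for doc in docs:
        # Integer index of the fixed-width time window the document falls into.
        window = int(doc.sent_at.timestamp()) // window_seconds
        buckets[(window, doc.location.lower())].append(doc)
    return {key: group for key, group in buckets.items() if len(group) >= min_docs}


# Hypothetical sample data: three messages about one place within the same hour,
# plus two unrelated messages that should not form an event.
sample = [
    Document("a", datetime(2012, 8, 1, 12, 5, tzinfo=timezone.utc), "Sheffield", "road closed"),
    Document("b", datetime(2012, 8, 1, 12, 20, tzinfo=timezone.utc), "Sheffield", "crowd gathering"),
    Document("c", datetime(2012, 8, 1, 12, 40, tzinfo=timezone.utc), "sheffield", "police on site"),
    Document("d", datetime(2012, 8, 1, 12, 10, tzinfo=timezone.utc), "Bristol", "nice weather"),
    Document("a", datetime(2012, 8, 1, 15, 0, tzinfo=timezone.utc), "Sheffield", "all quiet now"),
]
events = detect_events(sample)
```

Fixed windows and exact location matching keep the sketch short, but they would miss events that straddle a window boundary or are described with different place names. Any realistic system would also need the content-similarity dimension to find the long-tail events the abstract describes.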
Visual analytics will facilitate the exploration of the information by providing multiple views, enabling focused investigation and trend visualisations across the four dimensions. We will devise methods to (i) suggest the right level of detail (granularity) for the user's focus in rapidly changing environments; (ii) alert users to any significant development outside of their current viewpoint; and (iii) enable users to understand how the current state of affairs came into being by browsing all the information along the time dimension. These methods will enable users to see through the irrelevant banter (noise) that often surrounds events in social media and go directly to the relevant information that can be hidden in the long tail of the distribution.
RAnDMS will be tested on the task of supporting intelligence operators during relevant events happening during 2012/13. We will publish the research and its findings in international journals and conferences. Subject to MoD agreement, we will also create public research resources by generating one publicly available task (inclusive of corpora, resources, etc.) to enable comparison of research results by other researchers.

Planned Impact

The project will have an impact on industry, the government and society.

The main target of dissemination and exploitation will be companies working in the fields of intelligence/anti-terrorism, such as Ultra Electronics (one of the major providers of the UK government) both in the UK and abroad. These companies tend to have a traditional focus and are not currently working on solutions involving the use of the social Web to help make sense of events. This is despite the large interest created at an international level by events such as the Arab Spring and Hurricane Katrina and the large interest in the intelligence and defence sector, as demonstrated by some recent NATO reports, especially of US origin.

We will build on technology and know-how developed as part of the WeKnowIt and TRIDS projects, moving towards a level of complexity that is not currently possible to reach with that technology, as some fundamental research problems must first be addressed. We will develop the science and transfer some intermediate results on top of the baseline technology. Moreover, we will create demonstrators of capabilities that are made possible by the research but that will need further investment (e.g. via TSB or MoD funded projects) to create actual applications.

It is expected that the technologies will be industrialised and applied in subsequent projects through collaboration with companies. Exemplar datasets will be made available for benchmarking and evolving technology in this field and for future experiments. We have long and successful experience of collaborating with industry and we expect this to continue with this project.

As for the government, the science and technology developed in the project will provide a feasibility study in the use of social Web monitoring for intelligence. The technology developed in the project will enable a deep insight into crises and emergencies nationally and abroad. Recent events, both local (e.g. the Aug 2011 UK riots) and especially abroad (e.g. during the Arab Spring), have shown that Western governments struggled to make sense of the events as they were missing direct information from the ground. The technology proposed in this project will enable harvesting of information directly from the ground, via the monitoring of social media and news feeds. We plan to test the developed technology on corpora collected during emergencies and crises which develop during 2012. This will show how future emergencies and crises could be managed, creating interest within the government in the exploitation of the results. DSTL will be key to this plan: we plan to spend several weeks at DSTL to derive requirements, to co-create the science and the technology and finally to test the results in real world scenarios with real users. The time frame for exploitation of results will be 1-5 years after the project end, when products based on the developed technology will be available.

Finally, citizens and society will benefit from the results of the project in an indirect way through improved security nationally and internationally. More directly, the ability to gain intelligence through the use and analysis of social media may be reused by ordinary citizens in, for example, education and to encourage civic engagement.

Publications

 
Description We developed technologies for large scale analysis of events in social media. These algorithms allow an emergency control room to monitor any event through the eyes of the citizens via social media. The algorithms allow both the automated or semi-automated recognition of events in the text of the messages and the effective visualisation of results, so that a user can identify relevant events and track them over time.
Exploitation Route The technology has been used by the control rooms of events involving over a million people, including the Glastonbury Festival for two consecutive years. A licence has been bought by Knowledge Now Ltd. They are now making a business of the results of this project.
Sectors Environment; Leisure Activities, including Sports, Recreation and Tourism; Security and Diplomacy

 
Description The developed technology enabled social media to be monitored at large scale. It was used by the emergency control rooms of a number of large scale events during 2013-2014. During that period, we supported monitoring of events involving over a million people. These events include (among others): the Glastonbury Festival (2013 and 2014 editions, 200,000 participants per event), the Bristol Harbour Festival (2013 and 2014 editions, 250,000 participants per event) and the St. Paul Carnival in Bristol (70,000 participants). Moreover, the technology was used by the Italian Civil Protection to support the evacuation of 30,000 people from the city of Vicenza to allow a World War II bomb to be defused. The same civil protection agency has asked us to support several exercises in preparation for floods; we expect that they will use the technology for any big future emergency event. A licence for the technology to support the work of a European police force was bought by Knowledge Now Ltd.
First Year Of Impact 2013
Sector Environment; Security and Diplomacy; Other
Impact Types Societal, Economic

 
Description European Union Framework 7
Amount € 1,200,000 (EUR)
Organisation European Commission 
Sector Public
Country European Union (EU)
Start 09/2012 
End 09/2016
 
Description Football Whispers - Mining the Web for Football Player Transfer News
Amount £100,000 (GBP)
Organisation Klood 
Sector Private
Country United Kingdom
Start 11/2015 
End 12/2017
 
Description Collaboration with Knowledge Now Ltd 
Organisation Royal Netherlands Meteorological Institute
Country Netherlands 
Sector Academic/University 
PI Contribution We developed the technology that was used in events involving over a million people over 2 years.
Collaborator Contribution Knowledge Now participated initially in the definition of the requirements and then assisted during many of these events providing personnel for the control room. Finally they have bought a licence that they used in an application with a European police. Since then Knowledge Now have built their own system that they are now commercialising worldwide.
Impact See above about their commercialisation aims.
Start Year 2013
 
Description Football Whispers 
Organisation Football Whispers Ltd
Country United Kingdom 
Sector Private 
PI Contribution Football Whispers is a company providing information on rumours about football to both enthusiasts and professionals (e.g. television networks). It is a new venture that has adopted part of the Lodie technologies (and part of the technologies developed in the RAnDMS and Redites EPSRC projects) to analyse millions of messages from social media (e.g. Twitter). They are now online and have thousands of daily visitors to their web site.
Collaborator Contribution They have provided strict requirements and pre-existing knowledge about football, as well as access to large volumes of paid-for data.
Impact The output is their own product, which is largely powered by our social media analysis technology. We are now also discussing the release of IP for fields other than football. The IP rights are likely to be worth hundreds of thousands of pounds, plus shares in their company.
Start Year 2015
 
Title TRIDS Infrastructure for Social Media Analysis 
Description The system enables the large scale analysis of social media for event management by emergency control rooms. 
Type Of Technology Webtool/Application 
Year Produced 2013 
Impact We supported emergency control rooms in events involving over a million people, including the Glastonbury Festival in 2013 and 2014.
 
Description Talks in Schools as part of the STEM Ambassador Scheme and as part of Speakers for Schools Scheme, an initiative that provides "free talks in state schools by distinguished and eminent people, including leaders in business, the arts, sciences, sport, politics and media". 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact I am giving two talks centred around the topics developed in this project: "Are you preparing for a Digital World?" and "Jets, Music Festivals, and Flying Drones. And you thought computing was boring?"
Year(s) Of Engagement Activity 2013,2014,2015,2016
URL http://staffwww.dcs.shef.ac.uk/people/F.Ciravegna/Fabio_Ciravegna/Outreach.html