IDEAS Factory - Detecting Terrorist Activities: Making Sense

Lead Research Organisation: Imperial College London

Department Name: Computing

Abstract

The key challenge that this proposal addresses is the analysis and visualization of multiple sources of multi-modal data that may be partial, unreliable and contradictory. In addressing this challenge we expect to create an interactive, visualization-based decision support assistant which collects data, fuses it, analyses it and visualizes the results in a way which can be shared by analysts. Each aspect of the assistant poses significant new scientific challenges. We envisage a systems architecture which has four main components:

Collection of data encompasses automated approaches to gisting multimedia content, together with data management and resource allocation issues. Gisting is the extraction of the gist of the information: a process of abstraction which might, for example, summarise a phone message by the caller's phone number, the recipient's phone number, its duration, the key words and phrases which stand out as most significant, and potentially suspicious words and phrases.

Fusion and inference involves the integration of different modalities of data of variable reliability, the estimation of missing data for use in scenario development, methods for the resolution of contradictions, and psychological studies of how analysts relate information in this setting. We will investigate whether work on GIS which has developed meta-languages such as UnCertML can be used in this process.

Analysis involves further summarizing of the fused data; we will build on existing machine learning techniques but anticipate that the characteristics of the data (which include temporal and spatial information as well as the uncertainty aspects discussed above) will pose significant new challenges. A particular challenge which will require psychological input is the drawing of relevant connections that have arisen in the fused data.

Visualization must be informed by the operational model(s) of the data analyst(s), by risk assessment and by legal considerations.
The key challenge is to find a flexible, interactive way of visualising the data that allows the analyst to query the data and focus attention in a natural way. A key aspect of the visualization system is that it forms both the input and output of the system, allowing the complexity of the data and underlying system to be hidden beneath an intuitive interface.

Visual analytics is the emerging science of making sense of large data sets through interactive visualization and query, in support of the analytic reasoning process. The interactive visualization interface will support sense-making, query formulation and information search by showing, in visual representations, associations and relationships between large, mixed-format and loosely-coupled data sets, such as unstructured reports, news feeds, photos and structured databases. There is also a need for an intermediate layer that may draw on a variety of computing technologies (e.g. Latent Semantic Analysis, ontologies) to enable the extraction of semantically meaningful relationships between data sets. In addition, by enabling changes in viewing perspective (e.g. rotation, re-ordering, re-collating), the interface will facilitate the chance discovery of unanticipated associations and resources.

One of the original features of this proposal is that the system developed will be based on an analysis of user requirements undertaken by psychologists in the team. Psychological research into the behaviour of analysts has been undertaken elsewhere, but we believe this is the first research project that directly integrates psychological findings from research with analysts into the development of a decision support system.
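As an illustration of the gisting idea described in the abstract (summarising a phone message by caller, recipient, duration, prominent words and potentially suspicious phrases), the following is a minimal sketch only. It is not the project's method: the `CallGist` record, the `gist_call` function, the stop-word list and the watchlist are all hypothetical names introduced here, and simple word frequency stands in for whatever salience measure a real system would use.

```python
import re
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "and", "to", "of", "at", "in",
             "is", "it", "on", "we", "will", "be"}


@dataclass
class CallGist:
    """Abstracted summary ("gist") of one phone message."""
    caller: str
    recipient: str
    duration_s: int
    keywords: list = field(default_factory=list)
    flagged_phrases: list = field(default_factory=list)


def gist_call(caller, recipient, duration_s, transcript, watchlist, top_n=3):
    """Build a gist from call metadata and a transcript.

    Keywords are the most frequent non-stop-words (a stand-in for a real
    salience measure); flagged phrases are watchlist entries that occur
    verbatim in the transcript.
    """
    text = transcript.lower()
    words = [w for w in re.findall(r"[a-z']+", text) if w not in STOPWORDS]
    keywords = [w for w, _ in Counter(words).most_common(top_n)]
    flagged = [p for p in watchlist if p.lower() in text]
    return CallGist(caller, recipient, duration_s, keywords, flagged)


# Usage with made-up data; the phone numbers are fictitious.
example = gist_call(
    "+44 7700 900001", "+44 7700 900002", 42,
    "Meet at the station. The package will be at the station at noon.",
    watchlist=["package", "delivery"],
)
print(example)
```

A record of this kind keeps only the abstracted fields, which is what makes it cheap to store and to pass on to the later fusion and analysis stages.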

Planned Impact

This proposal arose from the DTAct sandpit (11-15 May 2009). The objectives and system architecture were directly influenced by discussions with stakeholders at that meeting and by subsequent interactions coordinated through the CPNI. The primary communities (or users) who would benefit from this research include law enforcement agencies such as the Police and Customs and Excise, and the intelligence operations of the military and other security services. The characteristics of the problem that we are addressing (large, multi-format, multi-sourced data of varying uncertainty and completeness) are such that the science and technology we develop can also benefit users such as epidemiologists, health protection agencies, pharmaceutical companies and businesses. However, the needs of this latter group of users will not be the focus of our research.

The report on the 7/7 bombings by the Intelligence and Security Committee highlights the critical features of intelligence data: it is of mixed reliability; the sources are of variable credibility; it is fragmented and may only give a partial picture; and there are massive amounts of data that might be pertinent. Although heavily censored, the report gives a good idea of the scale of the problems faced by the agencies involved: the footnote on page 13 of the report quotes the head of MI5 describing the "needle in the haystack" nature of the problem.

Our proposal is to produce a system that will assist agencies in analysing this deluge of data, and in making decisions based on this analysis. Key features of our approach will be new approaches to the automated gisting of data, dealing with the uncertainty and inconsistency of different sources, extraction of relationships within the fused data ("joining the relevant dots") and interactive visualisation of the results. The major benefit will be that analysts will be better supported in how they organise and reason with information.
Consequently, the agencies will be able to make much better-informed decisions, which should ultimately lead to enhanced counter-terrorism (CT) effectiveness. We expect to produce prototypes of parts of the system throughout the project lifetime. These will be available for immediate testing in conjunction with existing systems. An additional benefit may be that these prototypes encourage greater interoperability between existing systems. The scientific results of the project will be published in the normal way and will thus be available to the broader community of potential beneficiaries through this route.

Staff employed on the project will develop application-specific skills but will also have generic skills within the area in which they are employed (for example Forensic Computing, Visual Analytics, Computational Linguistics and Systems Engineering).

A steering group including representatives of key stakeholders will be set up on project initiation. In addition, we are planning regular meetings with the DTAct stakeholders throughout the lifetime of the project. These will include more intensive collaboration on particular work packages as well as plenary meetings. We will communicate with the broader user community via participation in scientific workshops and meetings, via publications (both academic and appropriate end-user publications) and, if it is considered appropriate, through a project website.

Funded Value:

£2,185,136

Funded Period:

Feb 10 - Jun 13

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/H023135/1

Principal Investigator:

Chris Hankin

Research Subject:

Info. & commun. Technol. (75%)

Linguistics (25%)

Research Topic:

Comput./Corpus Linguistics (25%)

Computer Graphics & Visual. (25%)

Information & Knowledge Mgmt (50%)

Organisations

People	ORCID iD
Chris Hankin (Principal Investigator)
Margaret Ann Wilson (Co-Investigator)
Chris Rowland (Co-Investigator)
Katerina Papadaki (Co-Investigator)
Eric Atwell (Co-Investigator)	http://orcid.org/0000-0001-9395-3764
Ken McNaught (Co-Investigator)
Richard Bowden (Co-Investigator)
Peter Eachus (Co-Investigator)
Philip Palmer (Co-Investigator)
Christopher Hargreaves (Co-Investigator)
B.L. William Wong (Co-Investigator)

Publications

Gilbert A (2011) Action recognition using mined hierarchical compound features. in IEEE transactions on pattern analysis and machine intelligence

Vigliotti M (2015) Discovery of anomalous behaviour in temporal networks in Social Networks

Le Martelot E (2014) Fast multi-scale detection of overlapping communities using local criteria in Computing

Le Martelot E (2013) Fast Multi-Scale Detection of Relevant Communities in Large-Scale Networks in The Computer Journal

Gilbert A (2017) Image and video mining through online learning in Computer Vision and Image Understanding

Le Martelot E (2013) Multi-scale community detection using stability optimisation in International Journal of Web Based Communities

Simmie D (2013) Ranking Twitter Influence by Combining Network Centrality and Influence Observables in an Evolutionary Model

Simmie D (2014) Ranking twitter influence by combining network centrality and influence observables in an evolutionary model in Journal of Complex Networks

Key Findings
Further Funding


Description	We developed a series of algorithms for analysing data represented by graphs. These included fast community detection: this involves partitioning a graph into collections of nodes which are tightly connected relative to nodes which are outside the community; anomaly detection techniques; and detecting influence and emerging influence.
Exploitation Route	Our work has led to two contracts with Dstl. These have focussed on testing the hypothesis that social media could be used as an early warning mechanism for disease outbreaks. The tools that we have developed are being deployed in a Bio-surveillance Eco-system. We have demonstrated that similar techniques can also be used for sentiment analysis and for building narratives about why a portion of the population might be expressing particular sentiments.
Sectors	Digital/Communication/Information Technologies (including Software), Security and Diplomacy
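
The notion of a community given in the description above (nodes tightly connected to each other relative to nodes outside the community) can be made concrete with a quality score for a partition. The sketch below is an illustration only and is not the project's algorithm: it computes Newman modularity, a standard measure swapped in here for simplicity, whereas the publications listed above refer to stability optimisation and local criteria. The function name `modularity` and the example graph are assumptions made for this sketch.

```python
from collections import defaultdict


def modularity(edges, communities):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges:       list of (u, v) pairs; no self-loops or repeated edges assumed.
    communities: dict mapping every node to a community label.

    Q = sum over communities c of  L_c / m  -  (d_c / (2m))**2
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    degree = defaultdict(int)
    internal = defaultdict(int)  # edges with both endpoints in the same community
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if communities[u] == communities[v]:
            internal[communities[u]] += 1

    total_degree = defaultdict(int)  # summed degree per community
    for node, d in degree.items():
        total_degree[communities[node]] += d

    return sum(internal[c] / m - (total_degree[c] / (2 * m)) ** 2
               for c in total_degree)


# Two triangles joined by a single bridge edge (c, d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {n: 0 for n in "abcdef"}

print(modularity(edges, two_groups))  # higher: matches the two tight clusters
print(modularity(edges, one_group))   # lower: everything lumped together
```

A community detection method can be viewed as a search for a partition that scores well on a measure of this kind; the "fast" and "multi-scale" aspects named in the publications concern how that search is carried out and at what resolution.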


Description	Bio-Surveillance Environment
Amount	£500,000 (GBP)
Organisation	Defence Science & Technology Laboratory (DSTL)
Sector	Public
Country	United Kingdom
Start	03/2016
End	02/2018


Description	Bio-Surveillance Environment
Amount	£250,000 (GBP)
Organisation	Defence Science & Technology Laboratory (DSTL)
Sector	Public
Country	United Kingdom
Start	10/2013
End	09/2014


Description	UK Visual Analytics Consortium
Amount	$160,000 (USD)
Organisation	Government of the United States of America
Department	Department of Homeland Security
Sector	Public
Country	United States
Start	03/2012
End	08/2013
