IDEAS Factory - Detecting Terrorist Activities: Making Sense

Lead Research Organisation: Imperial College London
Department Name: Dept of Computing

Abstract

The key challenge that this proposal addresses is the analysis and visualization of multiple sources of multi-modal data that may be partial, unreliable and contradictory. In addressing this challenge we expect to create an interactive visualization-based decision support assistant which collects data, fuses it, analyses it and visualizes the results in a way which can be shared by analysts. Each aspect of the assistant poses significant new scientific challenges. We envisage a systems architecture which has four main components:

Collection of data encompasses automated approaches to gisting multimedia content, together with data management and resource allocation issues. Gisting is the extraction of the gist of the information: a process of abstraction which might, for example, summarise a phone message by the caller's phone number, the recipient's phone number, the call duration, the key words and phrases which stand out as most significant, and any potentially suspicious words and phrases.

Fusion and inference involves the integration of different modalities of data of variable reliability, the estimation of missing data for use in scenario development, methods for the resolution of contradictions, and psychological studies of how analysts relate information in this setting. We will investigate whether work on GIS which has developed meta-languages such as UncertML can be used in this process.

Analysis involves further summarizing of the fused data; we will build on existing machine learning techniques but anticipate that the characteristics of the data (which include temporal and spatial information as well as the uncertainty aspects discussed above) will pose significant new challenges. A particular challenge which will require psychological input is the drawing of relevant connections that have arisen in the fused data.

Visualization must be informed by the operational model(s) of the data analyst(s), by risk assessment and by legal considerations. The key challenge is to find a flexible, interactive way of visualising the data that allows the analyst to query the data and focus attention in a natural way. A key aspect of the visualization system is that it forms both the input and the output of the system, allowing the complexity of the data and the underlying system to be hidden beneath an intuitive interface.

Visual analytics is the emerging science of making sense of large data sets; through interactive visualization and query, it supports the analytic reasoning process. The interactive visualization interface will support sense-making, query formulation and information search by showing, in visual representations, the associations and relationships between large, mixed-format and loosely coupled data sets, such as unstructured reports, news feeds, photos and structured databases. Equally important is an intermediate layer that may draw on a variety of computing technologies (e.g. Latent Semantic Analysis, ontologies) to enable the extraction of semantically meaningful relationships between data sets. In addition, by enabling changes in viewing perspective (e.g. rotation, re-ordering, re-collating), the interface will facilitate the chance discovery of unanticipated associations and resources.

One of the original features of this proposal is that the system developed will be based on an analysis of user requirements undertaken by psychologists in the team. Psychological research into the behaviour of analysts has been undertaken elsewhere, but we believe this is the first research project that directly integrates psychological findings from research with analysts into the development of a decision support system.

Planned Impact

This proposal arose from the DTAct sandpit (11-15 May 2009). The objectives and system architecture were directly influenced by discussions with stakeholders at that meeting and by subsequent interactions coordinated through the CPNI. The primary communities (or users) who would benefit from this research include law enforcement agencies such as the police and Customs and Excise, and the intelligence operations of the military and other security services. The characteristics of the problem that we are addressing (large, multi-format, multi-sourced data of varying uncertainty and completeness) are such that the science and technology we develop can also benefit users such as epidemiologists, health protection agencies, pharmaceutical companies and businesses. However, the needs of this latter group of users will not be the focus of our research.

The report on the 7/7 bombings by the Intelligence and Security Committee highlights the critical features of intelligence data: it is of mixed reliability; the sources are of variable credibility; it is fragmented and may give only a partial picture; and there are massive amounts of data that might be pertinent. Although heavily censored, the report gives a good idea of the scale of the problems faced by the agencies involved: the footnote on page 13 of the report quotes the head of MI5 describing the needle-in-the-haystack nature of the problem. Our proposal is to produce a system that will assist agencies in analysing this deluge of data and in making decisions based on this analysis. Key features of our approach will be new approaches to the automated gisting of data, dealing with the uncertainty and inconsistency of different sources, extraction of relationships within the fused data ("joining the relevant dots") and interactive visualisation of the results. The major benefit will be that analysts will be better supported in how they organise and reason with information. Consequently, the agencies will be able to make much better-informed decisions, which should ultimately lead to enhanced counter-terrorism (CT) effectiveness.

We expect to produce prototypes of parts of the system throughout the project lifetime. These will be available for immediate testing in conjunction with existing systems. An additional benefit may be that these prototypes encourage greater interoperability between existing systems. The scientific results of the project will be published in the normal way and will thus be available to the broader community of potential beneficiaries. Staff employed on the project will develop application-specific skills and will also gain generic skills within the area in which they are employed (for example Forensic Computing, Visual Analytics, Computational Linguistics and Systems Engineering).

A steering group including representatives of key stakeholders will be set up on project initiation. In addition, we are planning regular meetings with the DTAct stakeholders throughout the lifetime of the project. These will include more intensive collaboration on particular work packages as well as plenary meetings. We will communicate with the broader user community through participation in scientific workshops and meetings, through publications (both academic and appropriate end-user publications) and, if it is considered appropriate, through a project website.

Publications

Gilbert A (2011) Action recognition using mined hierarchical compound features. in IEEE Transactions on Pattern Analysis and Machine Intelligence

Gilbert A (2017) Image and video mining through online learning in Computer Vision and Image Understanding

Martelot E (2013) Multi-scale community detection using stability optimisation in International Journal of Web Based Communities

Vigliotti M (2015) Discovery of anomalous behaviour in temporal networks in Social Networks

 
Description: We developed a series of algorithms for analysing data represented as graphs. These included fast community detection, which partitions a graph into groups of nodes that are more tightly connected to one another than to nodes outside the group; anomaly detection techniques; and methods for detecting influence and emerging influence.
Exploitation Route: Our work has led to two contracts with Dstl. These have focussed on testing the hypothesis that social media could be used as an early warning mechanism for disease outbreaks. The tools that we have developed are being deployed in a Bio-surveillance Eco-system. We have demonstrated that similar techniques can also be used for sentiment analysis and for building narratives about why a portion of the population might be expressing particular sentiments.
Sectors: Digital/Communication/Information Technologies (including Software), Security and Diplomacy
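
As an illustration of the community detection idea in the description above, the following minimal Python sketch scores how well a given partition separates a graph into tightly connected groups. It uses the standard modularity measure as a simple stand-in; it is not the project's own algorithm (the listed publication refers to stability optimisation), and the function name `modularity` and the toy graph are assumptions made purely for this example.

```python
from collections import defaultdict


def modularity(edges, communities):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, assumed to contain no self-loops or duplicates.
    communities: list of sets of nodes that together cover every node once.
    """
    m = len(edges)
    node_to_comm = {n: i for i, comm in enumerate(communities) for n in comm}

    degree = defaultdict(int)    # number of edge ends attached to each node
    internal = defaultdict(int)  # number of edges lying inside each community
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if node_to_comm[u] == node_to_comm[v]:
            internal[node_to_comm[u]] += 1

    comm_degree = defaultdict(int)  # total degree of each community
    for node, d in degree.items():
        comm_degree[node_to_comm[node]] += d

    # Q = sum over communities of (fraction of edges inside the community)
    #     minus (expected fraction if edges were placed at random).
    return sum(
        internal[c] / m - (comm_degree[c] / (2 * m)) ** 2
        for c in range(len(communities))
    )


# Toy graph (hypothetical): two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

natural_split = [{0, 1, 2}, {3, 4, 5}]    # one community per triangle
awkward_split = [{0, 1}, {2, 3}, {4, 5}]  # cuts through both triangles
single_group = [{0, 1, 2, 3, 4, 5}]       # no partitioning at all

q_natural = modularity(edges, natural_split)
q_awkward = modularity(edges, awkward_split)
q_single = modularity(edges, single_group)
```

On this toy graph the split into the two triangles scores higher than the other two partitions, which matches the intuition that a good community is tightly connected relative to the nodes outside it. A community detection method would search over partitions for a high score rather than evaluate hand-picked ones.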

 
Description: Bio-Surveillance Environment
Amount: £250,000 (GBP)
Organisation: Defence Science & Technology Laboratory (DSTL)
Sector: Public
Country: United Kingdom of Great Britain & Northern Ireland (UK)
Start: 10/2013
End: 09/2014

Description: Bio-Surveillance Environment
Amount: £500,000 (GBP)
Organisation: Defence Science & Technology Laboratory (DSTL)
Sector: Public
Country: United Kingdom of Great Britain & Northern Ireland (UK)
Start: 03/2016
End: 02/2018