Overseas travel grant - Melbourne

Lead Research Organisation: UNIVERSITY OF EXETER
Department Name: Computer Science

Abstract

The proposed research forms part of a wider research programme, the goal of which is to bring together previous work on (1) modelling and understanding collective phenomena and (2) the ontology of causal relations, in order to develop a general computational theory of collective causality which can be applied to the analysis of large spatio-temporal data-sets, such as are becoming increasingly available in biology, medicine, and the social sciences.

Recent joint work with Dr Matt Duckham and others on mining candidate causal relationships from a data-set relating to fish movement made use of parts of the theory of causal relations developed by the PI and presented at the international conference FOIS 2012. A notable feature of this theory is that it takes seriously the distinction between events, processes, and states, recognising that the nature and functioning of a causal or causal-like relation can depend critically on which of these types of item it applies to. For example, there is an important difference between the case in which one discrete event causes another such event, and the case in which one on-going process perpetuates another such process. The work with Duckham et al. did not make use of the full range of causal relations identified in the FOIS paper, and in particular neglected process-process interactions and the effect of granularity on the descriptions of causes. One of the goals of the proposed research is to make good on this and develop the data-mining methods further with the use of additional data-sets and more detailed theoretical analysis. We are particularly interested in developing techniques to identify the signatures of genuine causal effects in the data and thereby discriminate them from chance correlations and other kinds of relationships. Such techniques, if available, would have an enormous impact on our ability to interpret data-sets collected from large, dynamic populations, whether of humans or other animals.

This work will apply to data-sets which typically cover large groups of individuals forming collectives which may exhibit varying degrees of coherence. According to the Three-Level Analysis of collective motion previously developed by the PI together with Dr Zena Wood (Exeter), the motion of such collectives can be described at three different levels of granularity: the movement of the collective as a whole, considered as a unified point-like entity; the changes in configuration of the collective, as determined by the relative positions of its members; and the movements of the individuals constituting the collective. Distinguishing these levels helps to focus attention on different kinds of causal interaction that may be manifested by the collective. In particular, the causal influences on the motion of an individual within the collective might arise from the internal dynamics of the individual itself, from other individuals in the collective, from the collective as a whole (which might, for example, exert some kind of pressure to conform to a group norm), and from outside the collective. Thus individual causal relations may act within one granularity level or between levels, and another part of the proposed research is to integrate the three-level analysis into the data-mining techniques to enable us to discern causal relations operating at different levels of granularity, corresponding to different types of qualitative description of the collective behaviour.
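As an illustration only, the three levels can be computed from tracked positions in a few lines. This is a minimal sketch, not the PI's formulation: it assumes every frame lists the same individuals in the same order, treats the centroid as the "point-like" collective, and summarises configuration by the mean distance of members from the centroid (a hypothetical choice; many other configuration measures are possible). The function names are illustrative.

```python
from math import hypot
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def centroid(points: List[Point]) -> Point:
    """Mean position of the members: the collective treated as a single point."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def three_level_analysis(frames: List[List[Point]]) -> Dict[str, list]:
    """Split collective motion into three levels of granularity.

    frames[t][i] is the position of individual i at time step t (assumed:
    the same individuals, in the same order, in every frame).
    """
    cents = [centroid(f) for f in frames]
    # Level 1: movement of the collective as a unified point-like entity.
    collective = [(b[0] - a[0], b[1] - a[1]) for a, b in zip(cents, cents[1:])]
    # Level 2: configuration, summarised here (hypothetically) as the mean
    # distance of members from the centroid at each time step.
    spread = [
        sum(hypot(x - cx, y - cy) for x, y in f) / len(f)
        for f, (cx, cy) in zip(frames, cents)
    ]
    # Level 3: displacement of each individual between consecutive frames.
    individual = [
        [(q[0] - p[0], q[1] - p[1]) for p, q in zip(f0, f1)]
        for f0, f1 in zip(frames, frames[1:])
    ]
    return {"collective": collective, "configuration": spread, "individual": individual}


# Two individuals: they first move together, then spread apart vertically.
frames = [[(0, 0), (2, 0)], [(1, 0), (3, 0)], [(1, 1), (3, -1)]]
result = three_level_analysis(frames)
```

In the second step the centroid does not move at all, yet both individuals move and the spread grows from 1.0 to about 1.41. That change is invisible at the collective level and appears only at the configuration and individual levels, which is why separating the levels matters when looking for causal influences.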

Planned Impact

The development of a robust theory of collective causality will have implications for the ongoing debate within the "Big Data" community concerning the "correlation vs causation" question, and thereby for the kinds of analytic tools that should be developed within the community. A key problem concerns the extent to which analysis of data can proceed autonomously, with minimal input from humans who have preconceived ideas about what might be interesting or significant or what effects to look for. Any substantial progress towards resolving these issues would be likely to find immediate application in the business world.

Health and safety is another important area where improved understanding of how causal relations manifest themselves in computationally discoverable patterns in data could prove to be of enormous benefit. Currently we have a good deal of data relating to spatial distribution of people and specific health and safety issues, on which consequential decisions might be based, but there remains considerable uncertainty as to how best to use it. Better models of the causal relations underlying observed patterns of data could lead to better decision making, with implications for population health, the potential impact of epidemics, as well as handling crowd behaviour in emergency situations.

Publications

Description We devised algorithms to search for causal regularities in data. Instead of just looking for correlations in the data, we hypothesised the likely forms of any causal rules, paying particular attention to the different roles played by states, processes and events in causality and to likely temporal delay factors, and targeted the search at rules of those forms. The algorithm performed very well when run with synthetic data generated using rules of the appropriate forms. We also tried it out on real data that had been collected for another purpose (outside this project) to investigate the environmental triggers for fish movement in the Murray River system in SE Australia. We found that, for each environmental variable selected, the algorithm was able to identify rules that could explain approximately 20% of the detected long-range movements of fish over a period of six years in terms of changes in that variable.
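The record does not give the rule forms, delay model or scoring that the project actually used, so the following is only a hedged sketch of the general idea of a targeted search. It assumes one hypothetical rule form: "a rise of at least θ in an environmental variable between consecutive observations (an event) is followed by a movement event within d time steps". Every candidate (θ, d) is scored. "Coverage" (the share of movements preceded by a trigger) loosely mirrors the "proportion of movements explained" reported above. Precision (the share of triggers that are followed by a movement) and the F1 combination are our own assumptions, added so that very low thresholds and long delays cannot trivially "explain" everything. All names are illustrative.

```python
from typing import List, Sequence, Tuple


def change_events(series: Sequence[float], threshold: float) -> List[int]:
    """Time steps at which the variable rises by at least `threshold` (a discrete event)."""
    return [t for t in range(1, len(series)) if series[t] - series[t - 1] >= threshold]


def follows(cause: int, effect: int, max_delay: int) -> bool:
    """True if `effect` occurs between 0 and `max_delay` steps after `cause`."""
    return 0 <= effect - cause <= max_delay


def score_rule(triggers: List[int], movements: List[int], max_delay: int) -> Tuple[float, float, float]:
    """Return (F1, coverage, precision) for the rule 'trigger => movement within max_delay'."""
    covered = sum(1 for m in movements if any(follows(t, m, max_delay) for t in triggers))
    useful = sum(1 for t in triggers if any(follows(t, m, max_delay) for m in movements))
    coverage = covered / len(movements) if movements else 0.0
    precision = useful / len(triggers) if triggers else 0.0
    f1 = 2 * coverage * precision / (coverage + precision) if coverage + precision else 0.0
    return f1, coverage, precision


def search_rules(series, movements, thresholds, delays):
    """Score every candidate rule of the fixed form and return them best first."""
    results = []
    for th in thresholds:
        triggers = change_events(series, th)
        for d in delays:
            f1, cov, prec = score_rule(triggers, movements, d)
            results.append((f1, cov, prec, th, d))
    # Prefer a higher score, then a shorter delay, then a more selective threshold.
    results.sort(key=lambda r: (-r[0], r[4], -r[3]))
    return results


# Toy data: movements at steps 4 and 8 follow sharp rises at steps 2 and 6;
# a small rise at step 9 is not followed by any movement.
series = [0, 0, 5, 5, 5, 0, 6, 6, 6, 7]
movements = [4, 8]
best = search_rules(series, movements, thresholds=[1, 3], delays=[1, 2, 3])[0]
```

On this toy series the best rule is "a rise of at least 3, then movement within 2 steps" (threshold 3, delay 2), with full coverage and precision. The lower threshold also covers both movements, but it is penalised for the spurious trigger at step 9. Restricting the search to a hypothesised rule form is what makes such a small parameter sweep possible, in contrast to testing arbitrary correlations.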

More recently, a longer paper has been written expanding ideas from Galton's paper at FOIS 2012, which was one of the original inspirations for this project, and including some insights arising from the project itself. This paper is currently (March 2017) under review.
Exploitation Route The idea of targeting causal rules of a specified form offers a promising and attractive alternative to simply measuring correlations in the data, and could result in a more focused approach to mining big data.
Sectors Agriculture, Food and Drink, Digital/Communication/Information Technologies (including Software), Environment, Leisure Activities, including Sports, Recreation and Tourism, Culture, Heritage, Museums and Collections