Modelling and inference for massive populations of heterogeneous point processes

Lead Research Organisation: University College London
Department Name: Statistical Science

Abstract

Increasingly, handling large volumes of very heterogeneous data sets is necessary in most application domains. As a consequence, the past decade has seen considerable computational and theoretical developments that enable us to understand these new types of large data sets. This field of mathematics is known as "high dimensional data analysis", where typically the models we need to understand are, superficially, as complex as the observations they represent. Theory has been developed for many types of models in this setting. An outstanding challenge is understanding observations that come in the form of the spatial locations of a number of points, or events, which may belong to a number of distinct groups. Such data are referred to as "point processes", where the locations of objects of interest are exactly the points. Point processes are ubiquitous in applications, for example in ecology, seismology, and astronomy, and so new methods to understand such forms of data have a clear pathway to impact.

The challenge in the high dimensional setting for point processes is developing simple and flexible models that can be understood, and characterised, within realistic sampling scenarios. To enable the characterisation of observed data, the project will build new models through considering new forms of structure that the data can possess. To incorporate realistic features, we will build models with forms of scale-based heterogeneity, but also including more complex spatial structure. For many realistic processes this includes strong spatial forms of anisotropy, namely patterns associated with given spatial directions. This project will develop such models, and the methods necessary to characterise the structure from data. Computational feasibility will be a strong constraint, as the number of spatial patterns that we will analyse simultaneously will place a clear computational burden on the analysis.

The project will construct new methods to understand data collected in forest ecology. Here the data are locations of different tree species across time, and we consider a particularly high-dimensional, rich source of data that consists of over 275,000 individual trees, belonging to 312 different species. These data exhibit patterns of spatial aggregation and segregation associated with different spatial scales, but these patterns also show anisotropy associated with explanatory variables, which may be broadly classed as having a biotic or abiotic influence. Biotic factors, such as competition for the same nutrients, typically act independently of direction, whereas abiotic, or environmental factors, can have a rotationally asymmetric influence on plant dispersal. Abiotic features, such as features of the landscape like rivers, elevation and soil type, are normally found at large spatial scales relative to that of direct interaction between individuals. This means that whilst competition may lead to segregation of individuals at small scales, they may occur in the same areas of the landscape, giving an appearance of aggregation at larger scales.

The project will thus determine how to best model strongly heterogeneous multiscale structure in forest ecology and develop the mathematics necessary to quantify their form, which is not possible with current methodology. More broadly, this project will provide a flexible set of tools, and a mathematical framework to understand highly heterogeneous and anisotropic classes of point processes.

Planned Impact

The work proposed in this project is fundamental research in statistics, and also directly impacts ecology. Statistics has an additional impact on society via collaborations and users of developed technologies. Statistical methodology underpins all study and interpretation of data, and point processes are ubiquitous in applications. Impact will follow via other statisticians who engage with other application areas.

The work in this project will be particularly relevant to research in ecology. It will help to solve problems associated with understanding changing ecosystems and so could benefit policy makers and government agencies responsible for environmentally sensitive ecosystems, such as Defra. In particular, we plan to define new spatio-temporal summaries or indicators for multivariate data sets that quantify joint spatial structure in species abundances and potential changes in that structure. This is important in helping to form a picture of climate change impacts. There is good evidence that species-rich communities are more productive, meaning more carbon is sequestered from the atmosphere; understanding how numerous species are maintained, and whether they are vulnerable to the environment, is thus a key problem in ecology. We therefore need to understand the richness of species communities, and any potential evolution of such richness.

To ensure impact, we will take the following steps:
1) We will publish in both statistical and ecological journals, as well as on the open-access platforms arXiv and UCL Discovery. In addition, any code developed for the project will be formatted and published on our website.

2) Dissemination of our results will also be achieved through presentation at national and international conferences. The investigators will present results at both statistical and ecological meetings such as the RSS International Conference and meetings of the British Ecological Society (see Justification of Resources). At larger meetings, such as the Joint Statistical Meetings and the Ecological Society of America Annual Meeting, we will apply to host sessions on statistical ecology and methods for point process data. Attendance and presentation of our research at larger international meetings will ensure delivery of our methodology to a wide range of principal academic beneficiaries, as well as to end users in ecology, such as forest management bodies.

3) We will use the UCL Public Policy unit to forge new collaborations in the application of these methods, and the new UCL-Defra pilot partnership scheme.

4) The impact of our results will be further boosted by our engagement with the public communications office at UCL; through this office, we will be able to coordinate a high-level 'Lunch Hour Lecture' as well as attendance at the Bloomsbury Scientific Salon and University of the Third Age events to communicate our research to the public.

Publications

Murrell D (2018) A global envelope test to detect non-random bursts of trait evolution in Methods in Ecology and Evolution

Rajala T (2018) Detecting multivariate interactions in spatial point patterns with Gibbs models and variable selection in Journal of the Royal Statistical Society: Series C (Applied Statistics)

Olhede SC (2018) The growing ubiquity of algorithms in society: implications, impacts and innovations. in Philosophical transactions. Series A, Mathematical, physical, and engineering sciences

 
Description We have developed a new understanding of the significance of discovered interactions. We have found that previously reported non-interactions may be less significant.
Exploitation Route This will have an impact on monitoring ecological diversity, as it permits the study of a much larger population of species than was possible before.
Sectors Environment

URL https://arxiv.org/abs/1705.00689
 
Description The findings of this grant have established that many previously discovered interactions might have been found to be non-significant by chance. This will have implications for species preservation and for understanding of the rich ecosystem in the rain forest.
First Year Of Impact 2018
Sector Environment
 
Description Creative reactions project 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact This event brought science together with art to further increase the public's understanding of science. I worked with a poet as well as visual artists.
Year(s) Of Engagement Activity 2017
 
Description UCL Natural Science Club 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Public/other audiences
Results and Impact This talk described general ways of analysing patterns in nature.
Year(s) Of Engagement Activity 2017
 
Description UCL Science Centre for Schools: mathematical patterns in nature
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Schools
Results and Impact This is a UCL institution; it reaches hundreds of A-level students who want to learn about science.
Year(s) Of Engagement Activity 2017
URL http://www.ucl.ac.uk/physics-astronomy/outreach/science-lectures
 
Description Spoke in the data session of the Royal Society meeting on harnessing the potential of AI in the North West
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Industry/Business
Results and Impact This meeting brought together practitioners of data science in the north west of England with current research-level activities.
Year(s) Of Engagement Activity 2017
 
Description Talk at the Office for National Statistics
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Policymakers/politicians
Results and Impact I gave a talk to the Office for National Statistics on data ethics.
Year(s) Of Engagement Activity 2017