Modelling and inference for massive populations of heterogeneous point processes
Lead Research Organisation:
University College London
Department Name: Statistical Science
Abstract
Handling large volumes of very heterogeneous data is increasingly necessary in most application domains. As a consequence, the past decade has seen considerable computational and theoretical developments that enable us to understand these new types of data. This field of mathematics is known as "high dimensional data analysis", where typically the models we need to understand are, superficially, as complex as the observations they represent. Theory has been developed for many types of models in this setting. An outstanding challenge is understanding observations that come in the form of the spatial locations of a number of points, or events, which may belong to a number of distinct groups. Such data are referred to as "point processes", where the locations of objects of interest are exactly the points. Point processes are ubiquitous in applications, for example in ecology, seismology, and astronomy, and so new methods to understand such forms of data have a clear pathway to impact.
The challenge in the high dimensional setting for point processes is developing simple and flexible models that can be understood, and characterised, within realistic sampling scenarios. To enable the characterisation of observed data, the project will build new models by considering new forms of structure that the data can possess. To incorporate realistic features, we will build models with forms of scale-based heterogeneity, and also with more complex spatial structure. For many realistic processes this includes strong forms of spatial anisotropy, namely patterns associated with given spatial directions. This project will develop such models, and the methods necessary to characterise the structure from data. Computational feasibility will be a strong constraint, as the number of spatial patterns that we will analyse simultaneously will place a clear computational burden on the analysis.
The project will construct new methods to understand data collected in forest ecology. Here the data are locations of different tree species across time, and we consider a particularly high-dimensional, rich source of data that consists of over 275,000 individual trees, belonging to 312 different species. These data exhibit patterns of spatial aggregation and segregation associated with different spatial scales, but these patterns also show anisotropy associated with explanatory variables, which may be broadly classed as having a biotic or abiotic influence. Biotic factors, such as competition for the same nutrients, typically act independently of direction, whereas abiotic, or environmental, factors can have a rotationally asymmetric influence on plant dispersal. Abiotic features of the landscape, such as rivers, elevation and soil type, are normally found at large spatial scales relative to that of direct interaction between individuals. This means that whilst competition may lead to segregation of individuals at small scales, those individuals may still occur in the same areas of the landscape, giving an appearance of aggregation at larger scales.
The project will thus determine how best to model strongly heterogeneous multiscale structure in forest ecology, and will develop the mathematics necessary to quantify its form, which is not possible with current methodology. More broadly, this project will provide a flexible set of tools and a mathematical framework to understand highly heterogeneous and anisotropic classes of point processes.
Planned Impact
The work proposed in this project is fundamental research in statistics, and also directly impacts ecology. Statistics has an additional impact on society via collaborations and users of developed technologies. Statistical methodology underpins all study and interpretation of data, and point processes are ubiquitous in applications. Impact will follow via other statisticians who engage with other application areas.
The work in this project will be particularly relevant to research in ecology. It is useful for solving problems associated with understanding changing ecosystems, and so could benefit policy makers and government agencies responsible for environmentally sensitive ecosystems, such as Defra. In particular, we plan to define new spatio-temporal summaries or indicators for multivariate data sets that quantify joint spatial structure in species abundances and potential changes to that structure. This is important in helping to form a notion of climate change impacts. There is good evidence that species-rich communities are more productive, meaning more carbon is sequestered from the atmosphere. Understanding how numerous species are maintained, and whether they are vulnerable to the environment, is therefore a key problem in ecology. We therefore need to understand the richness of species communities, and any potential evolution of such richness.
To ensure impact, we will take the following steps:
1) We will publish in both statistical and ecological journals, as well as on the open-access platforms arXiv and UCL Discovery. In addition, any code developed for the project will be formatted and published on our website.
2) Dissemination of our results will also be achieved through presentation at national and international conferences. The investigators will present results at both statistical and ecological meetings such as the RSS International Conference and meetings of the British Ecological Society (see Justification of Resources). At larger meetings, such as the Joint Statistical Meetings and the Ecological Society of America Annual Meeting, we will apply to host sessions on statistical ecology and methods for point process data. Attendance and presentation of our research at larger international meetings will ensure delivery of our methodology to a wide range of principal academic beneficiaries, as well as to end users in ecology, such as forest management bodies.
3) We will use the UCL Public Policy unit, and the new UCL-Defra pilot partnership scheme, to forge new collaborations in the application of these methods.
4) The impact of our results will be further boosted by our engagement with the public communications office at UCL; through this office, we will be able to coordinate a high-level 'Lunch Hour Lecture', as well as attendance at the Bloomsbury Scientific Salon and University of the Third Age events, to communicate our research to the public.
Publications
Rajala T
(2019)
When do we have the power to detect biological interactions in spatial point patterns?
in The Journal of ecology
Rajala T
(2023)
What is the Fourier Transform of a Spatial Point Process?
in IEEE Transactions on Information Theory
Olhede SC
(2018)
The growing ubiquity of algorithms in society: implications, impacts and innovations.
in Philosophical transactions. Series A, Mathematical, physical, and engineering sciences
Maugis P
(2020)
Testing for Equivalence of Network Distribution Using Subgraph Counts
in Journal of Computational and Graphical Statistics
Tuomas Rajala
(2020)
Spectral estimation for spatial point patterns
Martin J
(2023)
Multivariate geometric anisotropic Cox processes
in Scandinavian Journal of Statistics
Martin J
(2018)
Multivariate Geometric Anisotropic Cox Processes
Lunagómez S
(2020)
Modeling Network Populations via Graph Distances
in Journal of the American Statistical Association
Rajala T
(2018)
Detecting Multivariate Interactions in Spatial Point Patterns with Gibbs Models and Variable Selection
in Journal of the Royal Statistical Society Series C: Applied Statistics
Description | We have developed a new understanding of the significance of discovered interactions, and a high-dimensional method for estimating point process interactions. We have developed new techniques that can determine the significance of interactions, and found that previously reported non-interactions may be less significant. |
Exploitation Route | This will have an impact on monitoring ecological diversity, as it permits the study of a much larger population of species than was previously possible. |
Sectors | Environment |
URL | https://arxiv.org/abs/2009.01474 |
Description | The findings of this grant have established that many previously discovered interactions might have been found to be non-significant by chance. This will have implications for species preservation and understanding of the rich ecosystem in the rainforest. |
First Year Of Impact | 2018 |
Sector | Environment |
Description | Creative reactions project |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Public/other audiences |
Results and Impact | This event brought science together with art to further increase the public's understanding of science. I worked with a poet as well as visual artists. |
Year(s) Of Engagement Activity | 2017 |
Description | UCL Natural Science Club |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Public/other audiences |
Results and Impact | This described general ways of analysing patterns in nature. |
Year(s) Of Engagement Activity | 2017 |
Description | UCL Science Centre for Schools: mathematical patterns in nature |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Schools |
Results and Impact | This is a UCL institution; it reaches hundreds of A-level students who want to learn about science. |
Year(s) Of Engagement Activity | 2017 |
URL | http://www.ucl.ac.uk/physics-astronomy/outreach/science-lectures |
Description | Spoke in the data session of the Royal Society meeting on harnessing the potential of AI in the North West |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Industry/Business |
Results and Impact | This meeting brought together practitioners of data science in the north west of England with current research-level activities. |
Year(s) Of Engagement Activity | 2017 |
Description | Talk at the Office for National Statistics |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Policymakers/politicians |
Results and Impact | I gave a talk to the Office for National Statistics on data ethics. |
Year(s) Of Engagement Activity | 2017 |