Flexible estimation in temporal point processes and graphs

Lead Research Organisation: University of Oxford
Department Name: Statistics

Abstract

Multivariate point processes with history dependence, such as Hawkes processes, naturally admit a graph representation. Nodes are the dimensions of the process, and an edge appears between two dimensions whenever the interaction function between them is non-null. In the non-negative case, which accounts for excitation phenomena, nonparametric estimation of those functions has been studied in the Bayesian framework by Donnet et al. [1]. In this context, estimation of the graph and the non-positive case, which accounts for inhibition phenomena, are open topics. These problems attract considerable interest in neuroscience. Neurons interact through action potentials, which are generally treated as identical events (spike trains). Multivariate Hawkes processes can thus model functional connectivity between neurons and identify excitation/inhibition relationships.
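As a minimal illustration of the graph representation described above, the following Python sketch evaluates the conditional intensity of a linear multivariate Hawkes process and reads off the edges with a non-null interaction function. The exponential kernel form, the parameter values and the helper names (exp_kernel, intensity, graph_edges) are assumptions made for illustration only; they are not taken from Donnet et al. [1], whose approach is nonparametric.

```python
import math


def exp_kernel(alpha, beta):
    """Hypothetical interaction function h(s) = alpha * beta * exp(-beta * s) for s > 0."""
    def h(s):
        return alpha * beta * math.exp(-beta * s) if s > 0 else 0.0
    return h


def intensity(k, t, events, baseline, kernels):
    """Linear Hawkes intensity of dimension k at time t.

    events[l]       : event times observed in dimension l
    baseline[k]     : background rate of dimension k
    kernels[(l, k)] : interaction function for the effect of l on k
    """
    total = baseline[k]
    for (l, target), h in kernels.items():
        if target != k:
            continue
        # Only events strictly before t contribute to the intensity at t.
        total += sum(h(t - s) for s in events[l] if s < t)
    return total


def graph_edges(kernels, grid):
    """Edges l -> k whose interaction function is non-null somewhere on the grid."""
    return sorted(
        (l, k) for (l, k), h in kernels.items()
        if any(h(s) != 0.0 for s in grid)
    )


# Illustrative two-dimensional example (all numbers are made up).
baseline = [0.5, 0.2]
kernels = {
    (0, 1): exp_kernel(0.8, 1.0),  # dimension 0 excites dimension 1
    (1, 1): exp_kernel(0.3, 2.0),  # dimension 1 excites itself
    (1, 0): exp_kernel(0.0, 1.0),  # null function: no edge from 1 to 0
}
events = [[1.0, 2.0], [1.5]]
grid = [0.1 * i for i in range(1, 51)]

edges = graph_edges(kernels, grid)
lam_0 = intensity(0, 3.0, events, baseline, kernels)
lam_1 = intensity(1, 3.0, events, baseline, kernels)
```

Here dimension 0 receives no excitation, so its intensity stays at the baseline, while dimension 1 is driven above its baseline by past events in both dimensions.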

To analyse the graph obtained from multivariate Hawkes processes, weighted edges can be obtained by associating each edge with a norm of its interaction function. In the case of non-linear Hawkes processes, those functions are allowed to be non-positive and lead to negatively weighted edges. The resulting graph is said to be signed, and requires special methods of graph analysis. In particular, clustering is a popular task which aims at identifying communities of nodes having similar features within a network. For the signed case, spectral algorithms have been adapted but are still considered suboptimal. Cucuringu et al. [2] developed a new method based on the combination of regularized graph Laplacian matrices. However, theoretical results on random graph models have only been obtained in the case of two disjoint communities. Extension to a larger number of communities and to the context of sparse networks are still unanswered questions.
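The construction of a signed weighted graph from interaction functions can be sketched as follows. This is only one possible convention and is an assumption of this example: the weight is the L1 norm of the interaction function, approximated numerically, with the sign of its integral attached. The function names (signed_l1_weight, split_signed_adjacency) and the two example functions are hypothetical. The split into a positive and a negative adjacency matrix is the usual starting point for signed Laplacian methods; the sketch does not implement the method of Cucuringu et al. [2].

```python
import math


def signed_l1_weight(h, upper, n_steps=10_000):
    """Approximate the L1 norm of h on [0, upper] with the midpoint rule
    and attach the sign of the integral of h (an assumed convention)."""
    step = upper / n_steps
    values = [h((i + 0.5) * step) for i in range(n_steps)]
    l1_norm = sum(abs(v) for v in values) * step
    integral = sum(values) * step
    sign = (integral > 0) - (integral < 0)
    return sign * l1_norm


def split_signed_adjacency(weights, n):
    """Split signed weights into non-negative matrices A+ and A-."""
    a_pos = [[0.0] * n for _ in range(n)]
    a_neg = [[0.0] * n for _ in range(n)]
    for (l, k), w in weights.items():
        if w > 0:
            a_pos[l][k] = w
        elif w < 0:
            a_neg[l][k] = -w  # store the magnitude of the negative weight
    return a_pos, a_neg


# Illustrative interaction functions (made-up shapes and values).
def h_excite(s):
    return 0.6 * math.exp(-s)          # non-negative: excitation


def h_inhibit(s):
    return -0.4 * math.exp(-2.0 * s)   # non-positive: inhibition


weights = {
    (0, 1): signed_l1_weight(h_excite, upper=20.0),
    (1, 0): signed_l1_weight(h_inhibit, upper=20.0),
}
a_pos, a_neg = split_signed_adjacency(weights, n=2)
```

With these example functions the edge from 0 to 1 receives a weight close to 0.6 and the edge from 1 to 0 a weight close to -0.2, so the first lands in the positive matrix and the second in the negative one.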

[1] Sophie Donnet, Vincent Rivoirard, and Judith Rousseau. Nonparametric Bayesian Estimation of Multivariate Hawkes Processes. arXiv:1802.05975v2, 2018.

[2] Mihai Cucuringu, Peter Davies, and Hemant Tyagi. SPONGE: A generalized eigenproblem for clustering signed networks. AISTATS, 2018.

Planned Impact

Our primary impact will be over 50 trained graduates. The Oxford-Warwick Centre will provide future industrial and academic research leaders in statistics for modern-day science, engineering, and commerce, all exemplified by "big data". The strategic vision is to train the next generation of statisticians who will enable the new data-intensive sciences and industries. Products which use sophisticated statistical ideas to add value to data are being taken up by the public, and there is widespread opportunity for wealth creation.

Our partners give some idea of the sectors we impact: Xerox, Amazon and Google produce products with 'Statistics inside' for data analysis on massive data sets. These companies are producing data-mining tools. These tools are applied by the companies themselves and by the public to analyse data about society: image data, payment transactions, Twitter feeds, all massive streaming data sets. DeepMind and Optimor have similar interests. Illumina, Unilever, Novartis and GSK produce pharmaceuticals and biotech products, whilst Lubrizol is a lubricant manufacturer. All carry out substantial statistical work to develop products, and in some cases sell statistical tools as part of the product. Man Investments and Millward Brown are respectively investment and marketing companies. They use statistical tools to quantify risk, and look for predictable structures in data masked by the noise of human decision making.

By making an impact, through improved evidence-based statistical science, at companies such as Amazon, Google and GlaxoSmithKline, the CDT will ultimately deliver benefits to the public users of their services.

Statistics has been called 'the science of doing science'. Statisticians support research across science. We aim to produce graduates with the ability to transfer statistical methods across discipline boundaries, with the skills to analyse large and complex data sets wherever they arise.
