High Dimensional Models for Multivariate Time Series Analysis

Lead Research Organisation: University College London
Department Name: Statistical Science

Abstract

This fellowship will focus on developing methods for high dimensional time series analysis. Methodology for high dimensional data is one of the most important current research topics in statistics and signal processing, where massive data sets have inspired the development of a new statistical paradigm based on sparsity. Such developments have mainly concerned deterministic structure immersed in noise, while this program will model the signal of interest as stochastic. The advantage of modeling an observed signal stochastically as a time series is that one can deduce properties of a population of series, which is important for the correct understanding of uncertainty or variability in structure.

Traditional time series methods are restricted to stationary processes, whose structure is homogeneous in time. The project will instead develop theory and methodology for classes of nonstationary processes, which can experience changes in their generating mechanism over the course of observation. Such processes are important as they allow us to model the evolution of an observable quantity, and also enable us to quantify this evolution explicitly. Nonstationary processes are observed in a number of applications, such as geoscience (remote sensing and satellite observations), oceanography (drifter and float measurements), neuroscience (functional MRI and EEG) and ecology (species abundance), to mention but a few areas. In such applications single processes are rarely of interest, and so we shall develop methods for the analysis of multiple (or equivalently multivariate) signals, to quantify the evolving interdependencies of observed processes.

The difficulty in analyzing nonstationary signals is their high degree of overparameterization, which is much exacerbated if inferences are to be made about multiple series. At first glance reliable estimation in such problems seems impossible, as a consequence of the extreme overparameterization.
Assumptions on sparsity have recently been used to enable estimation in related overparameterized problems. Such methods need careful extension and substantial innovation to cover the case of multivariate and stochastic signals, which we propose to address in this project. Key to developing such methods is introducing new sparse classes of nonstationary processes, building on recent developments in statistics for high dimensional data. Sparse models, despite a nominally high degree of complexity, are described by some unknown but simpler structure of smaller complexity. Sparse models will be constructed to contain previously incompatible nonstationary processes, thus enabling us to treat series that lacked a natural analysis framework.

This proposal therefore aims to a) introduce new classes of nonstationary processes for single signals using sparsity, b) extend these classes to rich families of multivariate processes for scenarios where the group structure of the processes either is known or has to be learned, c) develop a theoretical understanding of the estimability of such classes of processes and d) develop general estimation methods as well as application specific methodology.

We expect this work to impact statistics well beyond time series. New forms of sparsity and methods will also be relevant to related problems in mathematics, machine learning and signal processing, especially in terms of defining new forms of signal group sparsity. The work will also have more than a methodological impact, as the development of these methods will allow us to analyze multiple series that previously could not be analyzed, and we intend to develop application specific methods with our collaborators.

Planned Impact

The work proposed in this project is fundamental research in statistics. Statistics has an impact on society via collaborations and users of developed technologies. Statistical methodology underpins all study and interpretation of data, such as scientific observations of Earth, space, and the planets; laboratory and engineering tests; and surveys and experiments in fields as diverse as politics, psychology or sociology. In short, statistics is how we make sense of information, and the world around us. The range of application in this project is considerable; this is due to the unifying nature and general applicability of the mathematical sciences. The same mathematical structures appear in very different application projects, and this increases the potential for impact. A number of the proposed collaborations in this project will ensure that the methodological developments of the project connect to end-users and that the results will be of direct practical utility and of importance to society. In particular I note the following beneficiaries:

Prematurely born infants. Hospitalized neonates are frequently exposed to noxious, stressful and tissue damaging procedures as part of their treatment. How these procedures are perceived, and their long term consequences for the nervous system, are poorly understood. Present methods of pain assessment are based on behavioral and physiological body reactions, but are limited in utility and precision. I am collaborating with Prof Fitzgerald's group in devising an objective method to examine central pain processing in infants by directly measuring brain activity. This is based on the analysis of multiple EEG time series recordings, collected at the neonatal unit of UCL Hospital, and our work has the potential to greatly facilitate a correct pain management strategy for the neonates. The end beneficiaries are in this instance the neonates and their families.

Epilepsy sufferers. The fMRI work envisioned jointly with Hubert Fonteijn is to develop new methods of understanding dynamic connectivity in resting state fMRI. Resting state fMRI has become a popular tool to study neurological diseases, and the development of nonstationary analysis methods will significantly contribute to understanding already observed effects in such observations. Researchers are currently developing methods to detect, for example, epileptic activity using EEG-fMRI data. However, the acquisition of this data remains challenging. Nonstationary time series analyses of fMRI data alone would allow us to bypass the simultaneous acquisition of EEG-fMRI data to detect regions of epileptic activity. The detection of the source of this activity is of crucial importance because in 30% of epilepsy patients, surgical removal of this area is the only serious treatment option. The route to impact is via my collaborators, to eventually influence clinical practice.

General Society. My work has relevance to problems associated with understanding changing ecosystems and so could benefit policy makers and government agencies responsible for environmentally sensitive ecosystems, such as DEFRA. In particular we plan to define new spatio-temporal summaries or indicators for multivariate data sets that quantify joint spatial structure in species abundances and potential changes to it. This is important in helping to form a notion of climate change impacts. There is good evidence that species rich communities are more productive, meaning more carbon is sequestered from the atmosphere; therefore understanding how numerous species are maintained, and whether they are vulnerable to changes in their environment, is a key problem in ecology. We therefore need to understand the richness of species communities, and any potential evolution of such richness.

A number of my other projects have the potential for societal impact. This underscores the role of the mathematical sciences: developments can have impact across a range of societal questions.

Publications

 
Description This project has developed new methods of understanding multiple observations in time, especially those which exhibit the same oscillation. Sometimes multiple oscillations can be hard to relate, as the different processes can be in different parts of the common cycle. This lack of synchrony can cause us to neglect common patterns. Furthermore, methods to approximate the likelihood in a computationally efficient manner were developed, and extended to physical oceanography.
Exploitation Route The publications are being cited in both statistics and signal processing.
Sectors Agriculture, Food and Drink; Energy; Environment; Healthcare

 
Description The findings of this work have been used in oceanography. Kathleen Dohan (NASA/ESR) is inputting revised damping estimates in OSCAR. Matlab has a revised Continuous Wavelet Transform based on our work. GDP programme developments via Rick Lumpkin & Renellys Perez (AOML, NOAA), and Pascal Lelong (UW). Various public communication activities at the Natural History Museum, UCL Lunch Hour Lecture, Winton Capital etc.
First Year Of Impact 2018
Sector Environment; Healthcare
Impact Types Policy & public services

 
Title Model for analysis of network data 
Description This permits us to understand large collections of interactions. 
Type Of Material Data analysis technique 
Provided To Others? No  
Impact Work was published in PNAS 
 
Description Oceanographic Data Analysis 
Organisation National Oceanography Centre
Country United Kingdom 
Sector Academic/University 
PI Contribution We have developed new methods to analyse oceanographic time series.
Collaborator Contribution Analysis of obtained results.
Impact Multi-disciplinary collaboration resulting in publications.
Start Year 2009
 
Description Sir Alister Hardy Foundation for Ocean Science 
Organisation Sir Alister Hardy Foundation for Ocean Science
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution We developed new methods for the study of the change in plankton abundance in the North Sea: this led to multiple publications.
Collaborator Contribution The collaborator gave advice on the interpretation of the results.
Impact The collaboration was multidisciplinary, between ecology and statistics.
Start Year 2007
 
Description AI & the LAW 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact This event was a set of presentations and Q&A sessions at the Inner Temple. It counts as 1 h CPD.
Year(s) Of Engagement Activity 2017
 
Description Bloomsbury scientific salon 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Public/other audiences
Results and Impact The talk generated interest in statistical analysis.

I had a number of PhD supervision requests.
Year(s) Of Engagement Activity 2013
 
Description Creative reactions project 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact This event brought science together with art to further increase the public's understanding of science. I worked with a poet as well as visual artists.
Year(s) Of Engagement Activity 2017
 
Description Discussion panelist at the Royal Society summer exhibition "social media, filter bubbles and bias" 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact This was a general debate on social media, filter bubbles and bias. Given the general public interest in this area the meeting was incredibly lively.
Year(s) Of Engagement Activity 2017
 
Description Jersey meeting on Cognitive AI 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Industry/Business
Results and Impact This meeting brought cutting edge data science and data ethics into contact with Jersey businesses.
Year(s) Of Engagement Activity 2017
 
Description Klein days at the Mittag-Leffler Institute 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Schools
Results and Impact This meeting brings A-level maths teachers in Sweden into direct contact with cutting-edge mathematics. The reach is wide because each of the 30 teachers goes back to teach in an A-level school.
Year(s) Of Engagement Activity 2017
 
Description Presentation on data ethics at http://www.dpforum.org.uk 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Industry/Business
Results and Impact I gave a talk at the Industrial Data Protection Forum with attendance from http://www.dpforum.org.uk
Year(s) Of Engagement Activity 2017
 
Description SET for Britain parliament visit 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? Yes
Geographic Reach Local
Primary Audience Policymakers/politicians
Results and Impact A poster was presented at a Parliamentary event.

My post-doc was asked questions.
Year(s) Of Engagement Activity 2013
 
Description Talk to Royal Institution of Great Britain 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact Gave a talk about the Alan Turing Institute and data science.
Year(s) Of Engagement Activity 2015
 
Description UCL Public Lunch Hour Lecture with Podcast 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Public/other audiences
Results and Impact The talk sparked questions regarding data analysis and the use of statistics in the public sector.

It generated public interest in statistics.
Year(s) Of Engagement Activity 2013
 
Description UCL Science Centre for Schools: mathematical patterns in nature 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Schools
Results and Impact This is a UCL institution; it reaches hundreds of A-level students who want to learn about science.
Year(s) Of Engagement Activity 2017
URL http://www.ucl.ac.uk/physics-astronomy/outreach/science-lectures
 
Description Spoke in the data session of the Royal Society meeting on harnessing the potential of AI in the North West 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Industry/Business
Results and Impact This meeting brought together practitioners of data science in the northwest of England with current research level activities.
Year(s) Of Engagement Activity 2017
 
Description Talk at the Office for National Statistics 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Policymakers/politicians
Results and Impact I gave a talk to the Office for National Statistics on data ethics.
Year(s) Of Engagement Activity 2017