StatScale: Statistical Scalability for Streaming Data

Lead Research Organisation: Lancaster University
Department Name: Mathematics and Statistics

Abstract

We live in the age of data. Technology is transforming our ability to collect and store data on unprecedented scales. From the use of Oyster card data to improve London's transport network, to the Square Kilometre Array astrophysics project that has the potential to transform our understanding of the universe, Big Data can inform and enrich many aspects of our lives. Due to the widespread use of sensor-based systems in everyday life, with even smartphones having sensors that can monitor location and activity level, much of the explosion of data is in the form of data streams: data from one or more related sources that arrive over time. It has even been estimated that there will be over 30 billion devices collecting data streams by 2020.

The important role of Statistics within "Big Data" and data streams has been clear for some time. However, the current tendency has been to focus purely on algorithmic scalability: for example, developing versions of existing statistical algorithms that scale better with the amount of data. Such an approach ignores the fact that fundamentally new issues often arise when dealing with data sets of this magnitude, and that highly innovative solutions are required.

Model error is one such issue. Many statistical approaches are based on the use of mathematical models for data. These models are only approximations of the real data-generating mechanisms. In traditional applications, this model error is usually small compared with the inherent sampling variability of the data, and can be overlooked. However, there is an increasing realisation that model error can dominate in Big Data applications. Understanding the impact of model error, and developing robust methods that have excellent statistical properties even in the presence of model error, are major challenges.

A second issue is that many current statistical approaches are not computationally feasible for Big Data. In practice we will often need to use less efficient statistical methods that are computationally faster, or require less computer memory. This introduces a statistical-computational trade-off that is unique to Big Data, leading to many open theoretical questions, and important practical problems.

The strategic vision for this programme grant is to investigate and develop an integrated approach to tackling these and other fundamental statistical challenges. In order to do this we will focus in particular on analysing data streams. An important issue with this type of data is detecting changes in the structure of the data over time. This will be an early area of focus for the programme, as it has been identified as one of seven key problem areas for Big Data. Moreover it is an area in which our research will lead to practically important breakthroughs. Our philosophy is to tackle methodological, theoretical and computational aspects of these statistical problems together, an approach that is only possible through the programme grant scheme. Such a broad perspective is essential to achieve the substantive fundamental advances in statistics envisaged, and to ensure our new methods are sufficiently robust and efficient to be widely adopted by academics, industry and society more generally.

Planned Impact

Who will benefit?
This proposal will benefit a variety of different stakeholders including:
(a) A wide range of industries, including collaborating industrial partners and those organisations that handle large volumes of data [e.g. the NHS, Transport Agencies, Energy companies etc.];
(b) Society more generally through the application of this research;
(c) The academic research community, particularly in disciplines that underpin and relate to the data sciences;
(d) Project personnel: PDRAs and PhD students.

How will they benefit?
New techniques: (a, b, c)
The research undertaken will develop a number of exciting new statistical techniques that will be disseminated to our partners and user communities. Our methods will result in more efficient and cost-effective ways of marshalling precious resources, by making principled analysis of very large datasets (i) feasible and/or (ii) faster and more accurate. These benefits will flow through the economy and society via a number of different mechanisms. These might include more efficient use of resources (e.g. better management of oil fields via improved processing of well operation data); improved productivity (e.g. via more timely management of, and intervention on, faults on telecommunications networks); and broader societal benefit (e.g. via the development of statistical methods capable of analysing eHealth data streams, to support monitoring of vulnerable elderly people living independently). To enable this, we specifically include resource for a Research Software Engineer to make available high-quality, documented open source code for others to use.

Targeted Knowledge Exchange: (a)
Significant further benefit will accrue to beneficiary group (a) through their partnership on this project. At this stage, we benefit from the support of several leading organisations in the energy, health and telecommunications sectors. They have expressed enthusiastic support for this programme's vision and have provided valuable insight and advice as we have developed this proposal.
For example, through dialogue with this community the idea of short-term secondment visits via a partnership programme has developed. PDRAs will spend periods of time at partner locations developing case studies that demonstrate the utility of developed methods on data rich products and systems. They are also keen to work with us to develop successful knowledge exchange mechanisms. Representatives from this community will also sit on the Advisory and Impact Board.

Generic Knowledge Exchange: (c)
We will develop methods that are of considerable interest to the academic community both in Statistics and other fields. As well as the traditional routes of journal publication, workshops and conferences, the programme will develop open source R software that embodies our techniques: these will benefit the academic community and beyond. Further, we will work with our advisory group to share our techniques with a wider audience where appropriate, through an academic partnership programme that will facilitate research retreats, academic exchanges etc.

Developing good people: (b,d)
The programme will develop highly skilled researchers in a statistical field of high strategic importance. Project personnel will benefit from a supportive training, research and development environment, and will be given the opportunity to create new techniques and see them employed in a productive and worthwhile setting. They will therefore be ideally positioned to seek future employment in a field/industry that enables them to make a strong contribution to society.

Contributing to the future supply of people: (all)
This proposal will secure an increase in the number and quality of researchers in statistics in an area of historic shortage. In particular, with the advent of sensor-based industrial systems, the need to develop future research leaders capable of underpinning the UK's competitive advantage in this area is crucial.

Publications

Agarwal G (2023) Semiparametric detection of changepoints in location, scale, and copula in Statistical Analysis and Data Mining: The ASA Data Science Journal

Aston J (2018) High dimensional efficiency with applications to change point tests in Electronic Journal of Statistics

Bardwell L (2018) Most Recent Changepoint Detection in Panel Data in Technometrics

Berrett TB (2021) USP: an independence test that improves on Pearson's chi-squared and the G-test. in Proceedings. Mathematical, physical, and engineering sciences

Berrett, T. B. (2021) Optimal rates for independence testing via U-statistic permutation tests in Annals of Statistics

 
Description The StatScale Programme was conceived to help catalyse research in the broad area of scalable statistical methods for streaming data. To achieve this, StatScale's team brought together diverse research strengths from across the range of statistical research activity to develop the next-generation tools required to realise this ambitious vision.

As a consequence, StatScale has made major contributions in three main areas:
· Changepoint methods
· Conditional Independence Testing
· Model Misspecification
At StatScale's outset each of these areas represented a new challenge where the development of novel methods and tools was meaningful and useful. Major progress has been made in all three, providing a suite of important publications describing new statistical tools and their implementation in software form for researchers and practitioners alike. Moreover, both the research and the associated community-building events supported by the programme have helped to stimulate activity within each of these areas internationally. This is particularly clear in the area of changepoint methods, a statistical topic that is of growing importance to a number of other research and application areas.

Finally, the StatScale programme also developed a number of postdocs who have gone on to academic positions at a range of leading universities.
Exploitation Route The research from this programme may be taken forward in a number of ways. For example, within the statistical research community, we hope that StatScale's legacy will be to have catalysed and sustained activity in each of the three main areas for several years to come. Our understanding is that the methods, and associated software, developed are also being explored in a number of other disciplines, from computer science and digital networking to astrophysics. As the tools and methods become increasingly shared, we envisage the breadth of areas benefiting from this research growing.
Sectors Aerospace, Defence and Marine; Agriculture, Food and Drink; Chemicals; Construction; Digital/Communication/Information Technologies (including Software); Electronics; Energy; Environment; Financial Services and Management Consultancy; Healthcare; Government, Democracy and Justice; Manufacturing, including Industrial Biotechnology; Culture, Heritage, Museums and Collections; Pharmaceuticals and Medical Biotechnology; Retail; Security and Diplomacy

 
Description To date, the methods developed have predominantly attracted interest from our industrial partners. At the time of writing, not all results have been published, so the full extent of the impact of this grant will not be known for some time. However, we highlight two notable examples of impact arising from the programme thus far. The first relates to the anomaly detection work reported by Fisch et al. (2022), which is already being used by BT to provide data-driven insights that help operate and maintain the UK's digital infrastructure. Other methods, such as those reported by Jewell et al. (2020), have been used by the Allen Institute for Brain Science as they develop understanding of how the human brain works.
First Year Of Impact 2019
Sector Aerospace, Defence and Marine; Agriculture, Food and Drink; Construction; Digital/Communication/Information Technologies (including Software); Energy; Environment; Financial Services and Management Consultancy; Healthcare; Government, Democracy and Justice; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology; Transport
Impact Types Societal

Economic

 
Description Isaac Newton Programme on Statistical Scalability
Amount £180,000 (GBP)
Funding ID Statistical Scalability 
Organisation Isaac Newton Institute for Mathematical Sciences 
Sector Academic/University
Country United Kingdom
Start 01/2018 
End 06/2018
 
Description Methodologically Enhanced Virtual Labs for Early Warning of Significant or Catastrophic Change in Ecosystems: Changepoints for a Changing Planet
Amount £203,419 (GBP)
Funding ID NE/T006102/1 
Organisation Natural Environment Research Council 
Sector Public
Country United Kingdom
Start 11/2019 
End 11/2021
 
Description Next Generation Converged Digital infrastructure (NG-CDI)
Amount £5,000,000 (GBP)
Funding ID EP/R004935/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 11/2017 
End 10/2022
 
Title Detecting Changes in Slope With an L0 Penalty 
Description While there are many approaches to detecting changes in mean for a univariate time series, the problem of detecting multiple changes in slope has comparatively been ignored. Part of the reason for this is that detecting changes in slope is much more challenging: simple binary segmentation procedures do not work for this problem, while existing dynamic programming methods that work for the change in mean problem cannot be used for detecting changes in slope. We present a novel dynamic programming approach, CPOP, for finding the "best" continuous piecewise linear fit to data under a criterion that measures fit to data using the residual sum of squares, but penalizes complexity based on an L0 penalty on changes in slope. We prove that detecting changes in this manner can lead to consistent estimation of the number of changepoints, and show empirically that using an L0 penalty is more reliable at estimating changepoint locations than using an L1 penalty. Empirically CPOP has good computational properties, and can analyze a time series with 10,000 observations and 100 changes in a few minutes. Our method is used to analyze data on the motion of bacteria, and provides better and more parsimonious fits than two competing approaches. Supplementary material for this article is available online. 
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
URL https://tandf.figshare.com/articles/Detecting_changes_in_slope_with_an_i_L_i_sub_0_sub_penalty/69870...
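The L0-penalised criterion described above can be illustrated with a small brute-force search. This is only a sketch of the objective, not the CPOP dynamic program (which solves the minimisation efficiently over continuous changepoint locations); the hinge-basis fit, candidate grid, noise level and penalty choice below are all illustrative assumptions.

```python
import itertools
import numpy as np

def penalised_cost(t, y, taus, beta):
    """RSS of the best continuous piecewise-linear fit with slope
    changes at `taus`, plus an L0 penalty of beta per change."""
    # Hinge basis: intercept, global slope, and one hinge per changepoint,
    # which keeps the fitted function continuous by construction.
    X = np.column_stack([np.ones_like(t), t] +
                        [np.maximum(0.0, t - tau) for tau in taus])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ coef) ** 2) + beta * len(taus)

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 200)
signal = np.where(t < 0.5, 2.0 * t, 2.0 - 2.0 * t)  # one change in slope at 0.5
y = signal + 0.05 * rng.standard_normal(t.size)

grid = t[10:-10:10]                    # candidate changepoint locations
beta = 2 * 0.05 ** 2 * np.log(t.size)  # BIC-style penalty (noise sd assumed known)
best = min((penalised_cost(t, y, taus, beta), taus)
           for k in range(3)
           for taus in itertools.combinations(grid, k))
print(best[1])  # estimated slope-change locations
```

Larger penalties admit fewer changes; the exhaustive search over subsets above is exponential in general, which is exactly the cost that CPOP's functional-pruning dynamic program avoids.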
 
Title Inference in High-Dimensional Online Changepoint Detection 
Description We introduce and study two new inferential challenges associated with the sequential detection of change in a high-dimensional mean vector. First, we seek a confidence interval for the changepoint, and second, we estimate the set of indices of coordinates in which the mean changes. We propose an online algorithm that produces an interval with guaranteed nominal coverage, and whose length is, with high probability, of the same order as the average detection delay, up to a logarithmic factor. The corresponding support estimate enjoys control of both false negatives and false positives. Simulations confirm the effectiveness of our methodology, and we also illustrate its applicability on the U.S. excess deaths data from 2017 to 2020. The supplementary material, which contains the proofs of our theoretical results, is available online. 
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
URL https://tandf.figshare.com/articles/dataset/Inference_in_High-dimensional_Online_Changepoint_Detecti...
 
Title Subset Multivariate Collective and Point Anomaly Detection 
Description In recent years, there has been growing interest in identifying anomalous structure within multivariate data sequences. We consider the problem of detecting collective anomalies, corresponding to intervals where one, or more, of the data sequences behaves anomalously. We first develop a test for a single collective anomaly that has power to simultaneously detect anomalies that are either rare, that is affecting few data sequences, or common. We then show how to detect multiple anomalies in a way that is computationally efficient but avoids the approximations inherent in binary segmentation-like approaches. This approach is shown to consistently estimate the number and location of the collective anomalies, a property that has not previously been shown for competing methods. Our approach can be made robust to point anomalies and can allow for the anomalies to be imperfectly aligned. We show the practical usefulness of allowing for imperfect alignments through a resulting increase in power to detect regions of copy number variation. Supplemental files for this article are available online.
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
URL https://tandf.figshare.com/articles/dataset/Subset_Multivariate_Collective_and_Point_Anomaly_Detecti...
 
Title Anomaly 
Description An implementation of CAPA (Collective And Point Anomaly) for the detection of anomalies in time series data. 
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact An implementation of CAPA (Collective And Point Anomaly) for the detection of anomalies in time series data. 
URL https://cran.r-project.org/web/packages/anomaly/index.html
 
Title BayesProject: Fast Projection Direction for Multivariate Changepoint Detection 
Description A C++ ('cpp') implementation of the BayesProject algorithm of G. Hahn, P. Fearnhead and I. A. Eckley (2020), a fast approach for computing a projection direction for multivariate changepoint detection. The package also implements the sum-cusum and max-cusum methods, and provides a wild binary segmentation wrapper for all algorithms.
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact This is open source software, and we are unaware of any notable impacts. 
URL https://doi.org/10.1007%2Fs11222-020-09966-2
 
Title CatReg: Solution Paths for Linear and Logistic Regression Models with SCOPE Penalty 
Description Computes solutions for regularised linear and logistic regression models with high-dimensional categorical covariates. 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact Too early to say 
URL https://CRAN.R-project.org/package=CatReg
 
Title ChangepointInference 
Description Software to implement the post-selection inference method for change points from Jewell, S., Fearnhead, P., & Witten, D. (Accepted/In press), "Testing for a Change in Mean After Changepoint Detection", Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact None 
URL https://arxiv.org/abs/1910.04291
 
Title DeCAFS: Detecting Changes in Autocorrelated and Fluctuating Signals 
Description Detects abrupt changes in time series with local fluctuations, modelling the underlying signal as a random walk and the autocorrelated noise as an AR(1) process. See Romano, G., Rigaill, G., Runge, V. and Fearnhead, P. (2020).
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact This is open-source software, we are currently unaware of any notable impacts. 
URL https://arxiv.org/abs/2005.01379
 
Title Functional Online CUSUM 
Description Implements the Functional Online CUSUM method of "Fast Online Changepoint Detection via Functional Pruning CUSUM Statistics" by Gaetano Romano, Idris Eckley, Paul Fearnhead and Guillem Rigaill (arXiv:2110.08205).
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact None as yet, though British Telecom has shown interest in the method.
URL https://arxiv.org/abs/2110.08205
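As background to the method above, the classical CUSUM recursion on which it builds can be sketched in a few lines. This is a textbook Page-style CUSUM with a fixed reference drift and threshold, not the functional-pruning algorithm itself (which in effect optimises over the unknown post-change mean); all parameter values below are illustrative assumptions.

```python
import numpy as np

def cusum_detect(stream, mu0=0.0, drift=0.5, threshold=8.0):
    """Two-sided CUSUM for a shift in mean away from mu0.
    Returns the 1-based index of the first alarm, or None."""
    s_hi = s_lo = 0.0
    for n, x in enumerate(stream, start=1):
        s_hi = max(0.0, s_hi + (x - mu0) - drift)  # tracks upward shifts
        s_lo = max(0.0, s_lo - (x - mu0) - drift)  # tracks downward shifts
        if max(s_hi, s_lo) > threshold:
            return n
    return None

rng = np.random.default_rng(7)
pre = rng.standard_normal(300)           # in-control segment: mean 0
post = rng.standard_normal(200) + 1.5    # mean shifts at observation 301
alarm = cusum_detect(np.concatenate([pre, post]))
print(alarm)
```

The fixed `drift` makes this detector well tuned only for shifts of a known size, which is precisely the limitation the functional-pruning approach removes while keeping per-observation cost low.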
 
Title GRPtests 
Description Methodology for testing nonlinearity in the conditional mean function in low- or high-dimensional generalized linear models, and the significance of (potentially large) groups of predictors. Details of the algorithms can be found in the paper by Jankova, Shah, Buehlmann and Samworth (2019).
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact Too early to say 
URL https://CRAN.R-project.org/package=GRPtests
 
Title GeneralisedCovarianceMeasure: Test for Conditional Independence Based on the Generalized Covariance Measure (GCM) 
Description A statistical hypothesis test for conditional independence. It performs nonlinear regressions on the conditioning variable and then tests for a vanishing covariance between the resulting residuals. It can be applied to both univariate random variables and multivariate random vectors. Details of the method can be found in Rajen D. Shah and Jonas Peters (2018).
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact Used by A.P. Moller Maersk in testing whether structural causal models relating to pricing can be falsified. 
URL https://CRAN.R-project.org/package=GeneralisedCovarianceMeasure
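The core idea of regressing out the conditioning variable and testing the residual products can be sketched in a few lines. The version below substitutes plain OLS for the package's flexible nonlinear regressions, and the simulation and variable names are illustrative assumptions; the normalisation is the simple form sqrt(n) * mean(R) / sd(R) for the residual products R.

```python
import math
import numpy as np

def gcm_test(x, y, z):
    """Generalised-Covariance-Measure-style sketch with OLS regressions:
    regress x on z and y on z, then test whether the products of the two
    residual series average to zero. Returns (statistic, p-value)."""
    Z = np.column_stack([np.ones(len(z)), z])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    r = rx * ry                                       # residual products
    stat = math.sqrt(len(r)) * r.mean() / r.std()
    return stat, math.erfc(abs(stat) / math.sqrt(2))  # two-sided normal p-value

rng = np.random.default_rng(0)
z = rng.standard_normal(2000)
x = z + rng.standard_normal(2000)
y_indep = z + rng.standard_normal(2000)                      # CI holds by construction
y_dep = z + 0.8 * (x - z) + 0.6 * rng.standard_normal(2000)  # shares x's noise: CI fails
print(gcm_test(x, y_indep, z))
print(gcm_test(x, y_dep, z))
```

Under the null the statistic is approximately standard normal, so the test inherits its validity from the quality of the two regressions rather than from any particular parametric model.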
 
Title IndepTest 
Description R package for independence testing 
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact None as yet. 
URL https://cran.r-project.org/web/packages/IndepTest/index.html
 
Title InspectChangepoint 
Description R package for high-dimensional changepoint estimation. 
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact None as yet. 
URL https://cran.r-project.org/web/packages/InspectChangepoint/index.html
 
Title LogConcComp 
Description Github python code for computing the log-concave maximum likelihood estimator 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact None as yet. 
URL https://github.com/wenyuC94/LogConcComp
 
Title MCARtest: Optimal Nonparametric Testing of Missing Completely at Random 
Description R package 
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact None as yet. 
URL https://cran.r-project.org/web/packages/MCARtest/index.html
 
Title MissInspect 
Description Github R functions for changepoint estimation with heterogeneous missingness 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact None as yet. 
URL https://github.com/wangtengyao/MissInspect
 
Title R package:CROPS 
Description Implementation of the CROPS wrapper for changepoint methods. The CROPS algorithm is described in Haynes, Kaylea, Idris A. Eckley, and Paul Fearnhead. "Computationally efficient changepoint detection for a range of penalties." Journal of Computational and Graphical Statistics 26.1 (2017): 134-143. 
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact None 
URL https://cran.r-project.org/web/packages/crops/index.html
 
Title RobKF: Innovative and/or Additive Outlier Robust Kalman Filtering 
Description Implements a series of robust Kalman filtering approaches: the additive outlier robust filters of Ruckdeschel et al. (2014) and Agamennoni et al. (2018), the innovative outlier robust filter of Ruckdeschel et al. (2014), and the innovative and additive outlier robust filter of Fisch et al. (2020).
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact This is open source software, and we are unaware of any notable impacts. 
URL https://arxiv.org/abs/2007.03238
 
Title SPCAvRP 
Description R package for sparse PCA 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact None as yet. 
URL https://cran.r-project.org/web/packages/SPCAvRP/index.html
 
Title Sshaped 
Description R package for fitting S-shaped functions 
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact None as yet. 
URL https://cran.r-project.org/web/packages/Sshaped/index.html
 
Title USP 
Description R package for independence testing 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact None as yet. 
URL https://cran.r-project.org/web/packages/USP/index.html
 
Title gfpop: Graph-Constrained Functional Pruning Optimal Partitioning 
Description Penalized parametric change-point detection by functional pruning dynamic programming algorithm. The successive means are constrained using a graph structure with edges of types null, up, down, std or abs. To each edge we can associate some additional properties: a minimal gap size, a penalty, some robust parameters (K,a). The user can also constrain the inferred means to lie between some minimal and maximal values. Data is modeled by a quadratic cost with possible use of a robust loss, biweight and Huber (see edge parameters K and a). Other losses are also available with log-linear representation or a log-log representation. 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact This is open source software and we are unaware of any notable impacts. 
URL https://arxiv.org/abs/2002.03646
 
Title ghcm: Functional Conditional Independence Testing with the GHCM 
Description A statistical hypothesis test for conditional independence. Given residuals from a sufficiently powerful regression, it tests whether the covariance of the residuals is vanishing. It can be applied to both discretely-observed functional data and multivariate data. Details of the method can be found in Anton Rask Lundborg, Rajen D. Shah and Jonas Peters (2020).
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact Too early to say. 
URL https://CRAN.R-project.org/package=ghcm
 
Title ocd 
Description R package for online changepoint detection 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact None as yet. 
URL https://cran.r-project.org/web/packages/ocd/index.html
 
Title ocd_CI 
Description R functions on github for online changepoint detection. 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact None as yet. 
URL https://github.com/yudongchen88/ocd_CI
 
Title primePCA 
Description R package on CRAN for high-dimensional PCA with heterogeneous missingness 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact None as yet. 
URL https://cran.r-project.org/web/packages/primePCA/index.html