StatScale: Statistical Scalability for Streaming Data

Lead Research Organisation: Lancaster University
Department Name: Mathematics and Statistics

Abstract

We live in the age of data. Technology is transforming our ability to collect and store data on unprecedented scales. From the use of Oyster card data to improve London's transport network, to the Square Kilometre Array astrophysics project that has the potential to transform our understanding of the universe, Big Data can inform and enrich many aspects of our lives. Due to the widespread use of sensor-based systems in everyday life, with even smartphones having sensors that can monitor location and activity level, much of the explosion of data is in the form of data streams: data from one or more related sources that arrive over time. It has even been estimated that there will be over 30 billion devices collecting data streams by 2020.

The important role of Statistics within "Big Data" and data streams has been clear for some time. However, the current tendency has been to focus purely on algorithmic scalability: for example, developing versions of existing statistical algorithms that scale better with the amount of data. Such an approach ignores the fact that fundamentally new issues often arise when dealing with data sets of this magnitude, and that highly innovative solutions are required.

Model error is one such issue. Many statistical approaches are based on the use of mathematical models for data. These models are only approximations of the real data-generating mechanisms. In traditional applications, this model error is usually small compared with the inherent sampling variability of the data, and can be overlooked. However, there is an increasing realisation that model error can dominate in Big Data applications. Understanding the impact of model error, and developing robust methods that have excellent statistical properties even in the presence of model error, are major challenges.

A second issue is that many current statistical approaches are not computationally feasible for Big Data. In practice we will often need to use less efficient statistical methods that are computationally faster, or require less computer memory. This introduces a statistical-computational trade-off that is unique to Big Data, leading to many open theoretical questions, and important practical problems.

The strategic vision for this programme grant is to investigate and develop an integrated approach to tackling these and other fundamental statistical challenges. In order to do this we will focus in particular on analysing data streams. An important issue with this type of data is detecting changes in the structure of the data over time. This will be an early area of focus for the programme, as it has been identified as one of seven key problem areas for Big Data. Moreover it is an area in which our research will lead to practically important breakthroughs. Our philosophy is to tackle methodological, theoretical and computational aspects of these statistical problems together, an approach that is only possible through the programme grant scheme. Such a broad perspective is essential to achieve the substantive fundamental advances in statistics envisaged, and to ensure our new methods are sufficiently robust and efficient to be widely adopted by academics, industry and society more generally.

Planned Impact

Who will benefit?
This proposal will benefit a variety of different stakeholders including:
(a) A wide range of industries, including collaborating industrial partners and those organisations that handle large volumes of data [e.g. the NHS, Transport Agencies, Energy companies etc.];
(b) Society more generally through the application of this research;
(c) The academic research community, particularly in disciplines that underpin and relate to the data sciences;
(d) Project personnel: PDRAs and PhD students.

How will they benefit?
New techniques: (a, b, c)
The research undertaken will develop a number of exciting new statistical techniques that will be disseminated to our partners and user communities. Our methods will result in more efficient and cost-effective ways of marshalling precious resources, by making principled analysis of very large datasets (i) feasible and/or (ii) faster and more accurate. These benefits will flow through the economy and society via a number of different mechanisms. These might include more efficient use of resources (e.g. better management of oil fields via improved processing of well operation data); improved productivity (e.g. via more timely management of, and intervention on, faults on telecommunications networks); and broader societal benefit (e.g. via the development of statistical methods capable of analysing eHealth data streams, to support monitoring of vulnerable elderly people living independently). To enable this, we specifically include resource for a Research Software Engineer to make available high-quality, documented open source code for others to use.

Targeted Knowledge Exchange: (a)
Significant further benefit will accrue to beneficiary group (a) through their partnership on this project. At this stage, we benefit from the support of several leading organisations in the energy, health and telecommunications sectors. They have expressed enthusiastic support for this programme's vision and have provided valuable insight and advice as we have developed this proposal.
For example, through dialogue with this community the idea of short-term secondment visits via a partnership programme has developed. PDRAs will spend periods of time at partner locations developing case studies that demonstrate the utility of developed methods on data rich products and systems. They are also keen to work with us to develop successful knowledge exchange mechanisms. Representatives from this community will also sit on the Advisory and Impact Board.

Generic Knowledge Exchange: (c)
We will develop methods that are of considerable interest to the academic community both in Statistics and other fields. As well as the traditional routes of journal publication, workshops and conferences, the programme will develop open source R software that embodies our techniques: these will benefit the academic community and beyond. Further, we will work with our advisory group to share our techniques with a wider audience where appropriate, through an academic partnership programme that will facilitate research retreats, academic exchanges etc.

Developing good people: (b,d)
The programme will develop highly skilled researchers in a statistical field of high strategic importance. Project personnel will benefit from a supportive training, research and development environment, and will be given the opportunity to create new techniques and see them employed in a productive and worthwhile setting. They will therefore be ideally positioned to seek future employment in a field/industry that enables them to make a strong contribution to society.

Contributing to the future supply of people: (all)
This proposal will secure an increase in the number and quality of researchers in statistics in an area of historic shortage. In particular, with the advent of sensor-based industrial systems, the need to develop future research leaders capable of underpinning the UK's competitive advantage in this area is crucial.

Publications

Agarwal G (2023) Semiparametric detection of changepoints in location, scale, and copula in Statistical Analysis and Data Mining: The ASA Data Science Journal

Aston J (2018) High dimensional efficiency with applications to change point tests in Electronic Journal of Statistics

Bardwell L (2018) Most Recent Changepoint Detection in Panel Data in Technometrics

Berrett TB (2021) USP: an independence test that improves on Pearson's chi-squared and the G-test. in Proceedings. Mathematical, physical, and engineering sciences

Berrett, T. B. (2021) Optimal rates for independence testing via U-statistic permutation tests in Annals of Statistics

 
Description The StatScale Programme was conceived to help catalyse research in the broad area of scalable statistical methods for streaming data. To achieve this, StatScale's team brought together diverse research strengths from across the range of statistical research activity to develop the next-generation tools required to realise this ambitious vision.

As a consequence, StatScale has made major contributions in three main areas:
· Changepoint methods
· Conditional Independence Testing
· Model Misspecification
At StatScale's outset each of these areas represented a new challenge where the development of novel methods and tools was meaningful and useful. Major progress has been made in all three, providing a suite of important publications describing new statistical tools and their implementation in software form for researchers and practitioners alike. Moreover, both the research and the associated community-building events supported by the programme have helped to stimulate activity within each of these areas internationally. This is particularly clear in the area of changepoint methods, a statistical topic that is of growing importance to a number of other research and application areas.

Finally, the StatScale programme also developed a number of postdocs who have gone on to academic positions at a range of leading universities.
Exploitation Route The research from this programme may be taken forward in a number of ways. For example, within the statistical research community, we hope that StatScale's legacy will be to have catalysed and sustained activity in each of the three main areas for several years to come. Our understanding is that the methods, and associated software, developed are also being explored in a number of other disciplines, from computer science and digital networking to astrophysics. As the tools and methods become increasingly shared, we envisage the breadth of areas benefiting from this research growing.
Sectors Aerospace, Defence and Marine; Agriculture, Food and Drink; Chemicals; Construction; Digital/Communication/Information Technologies (including Software); Electronics; Energy; Environment; Financial Services and Management Consultancy; Healthcare; Government, Democracy and Justice; Manufacturing, including Industrial Biotechnology; Culture, Heritage, Museums and Collections; Pharmaceuticals and Medical Biotechnology; Retail; Security and Diplomacy

 
Description To date, the methods developed have predominantly attracted interest from our industrial partners. At the time of writing, not all results have been published, so the full extent of the impact of this grant will not be known for some time. However, we highlight two notable examples of impact arising from the programme thus far. The first relates to the anomaly detection work reported by Fisch et al. (2022), which is already being used by BT to provide data-driven insights that help operate and maintain the UK's digital infrastructure. Other methods, such as those reported by Jewell et al. (2020), have been used by the Allen Institute for Brain Science as they develop understanding of how the human brain works.
First Year Of Impact 2019
Sector Aerospace, Defence and Marine; Agriculture, Food and Drink; Construction; Digital/Communication/Information Technologies (including Software); Energy; Environment; Financial Services and Management Consultancy; Healthcare; Government, Democracy and Justice; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology; Transport
Impact Types Societal

Economic

 
Description Isaac Newton Programme on Statistical Scalability
Amount £180,000 (GBP)
Funding ID Statistical Scalability 
Organisation Isaac Newton Institute for Mathematical Sciences 
Sector Academic/University
Country United Kingdom
Start 01/2018 
End 06/2018
 
Description Methodologically Enhanced Virtual Labs for Early Warning of Significant or Catastrophic Change in Ecosystems: Changepoints for a Changing Planet
Amount £203,419 (GBP)
Funding ID NE/T006102/1 
Organisation Natural Environment Research Council 
Sector Public
Country United Kingdom
Start 11/2019 
End 11/2021
 
Description Next Generation Converged Digital infrastructure (NG-CDI)
Amount £5,000,000 (GBP)
Funding ID EP/R004935/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 11/2017 
End 10/2022
 
Title Detecting Changes in Slope With an L0 Penalty 
Description While there are many approaches to detecting changes in mean for a univariate time series, the problem of detecting multiple changes in slope has comparatively been ignored. Part of the reason for this is that detecting changes in slope is much more challenging: simple binary segmentation procedures do not work for this problem, while existing dynamic programming methods that work for the change in mean problem cannot be used for detecting changes in slope. We present a novel dynamic programming approach, CPOP, for finding the "best" continuous piecewise linear fit to data under a criterion that measures fit to data using the residual sum of squares, but penalizes complexity based on an L0 penalty on changes in slope. We prove that detecting changes in this manner can lead to consistent estimation of the number of changepoints, and show empirically that using an L0 penalty is more reliable at estimating changepoint locations than using an L1 penalty. Empirically CPOP has good computational properties, and can analyze a time series with 10,000 observations and 100 changes in a few minutes. Our method is used to analyze data on the motion of bacteria, and provides better and more parsimonious fits than two competing approaches. Supplementary material for this article is available online. 
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
URL https://tandf.figshare.com/articles/Detecting_changes_in_slope_with_an_i_L_i_sub_0_sub_penalty/69870...
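The L0-penalised criterion described above can be illustrated with a small brute-force search. This is only a sketch of the objective, not the CPOP dynamic program (which solves the minimisation efficiently over continuous changepoint locations); the hinge-basis fit, candidate grid, noise level and penalty choice below are all illustrative assumptions.

```python
import itertools
import numpy as np

def penalised_cost(t, y, taus, beta):
    """RSS of the best continuous piecewise-linear fit with slope
    changes at `taus`, plus an L0 penalty of beta per change."""
    # Hinge basis: intercept, global slope, and one hinge per changepoint,
    # which keeps the fitted function continuous by construction.
    X = np.column_stack([np.ones_like(t), t] +
                        [np.maximum(0.0, t - tau) for tau in taus])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ coef) ** 2) + beta * len(taus)

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 200)
signal = np.where(t < 0.5, 2.0 * t, 2.0 - 2.0 * t)  # one change in slope at 0.5
y = signal + 0.05 * rng.standard_normal(t.size)

grid = t[10:-10:10]                    # candidate changepoint locations
beta = 2 * 0.05 ** 2 * np.log(t.size)  # BIC-style penalty (noise sd assumed known)
best = min((penalised_cost(t, y, taus, beta), taus)
           for k in range(3)
           for taus in itertools.combinations(grid, k))
print(best[1])  # estimated slope-change locations
```

Larger penalties admit fewer changes; the exhaustive search over subsets above is exponential in general, which is exactly the cost that CPOP's functional-pruning dynamic program avoids.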
 
Title Inference in High-Dimensional Online Changepoint Detection 
Description We introduce and study two new inferential challenges associated with the sequential detection of change in a high-dimensional mean vector. First, we seek a confidence interval for the changepoint, and second, we estimate the set of indices of coordinates in which the mean changes. We propose an online algorithm that produces an interval with guaranteed nominal coverage, and whose length is, with high probability, of the same order as the average detection delay, up to a logarithmic factor. The corresponding support estimate enjoys control of both false negatives and false positives. Simulations confirm the effectiveness of our methodology, and we also illustrate its applicability on the U.S. excess deaths data from 2017 to 2020. The supplementary material, which contains the proofs of our theoretical results, is available online. 
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
URL https://tandf.figshare.com/articles/dataset/Inference_in_High-dimensional_Online_Changepoint_Detecti...
 
Title Subset Multivariate Collective and Point Anomaly Detection 
Description In recent years, there has been growing interest in identifying anomalous structure within multivariate data sequences. We consider the problem of detecting collective anomalies, corresponding to intervals where one, or more, of the data sequences behaves anomalously. We first develop a test for a single collective anomaly that has power to simultaneously detect anomalies that are either rare, that is affecting few data sequences, or common. We then show how to detect multiple anomalies in a way that is computationally efficient but avoids the approximations inherent in binary segmentation-like approaches. This approach is shown to consistently estimate the number and location of the collective anomalies, a property that has not previously been shown for competing methods. Our approach can be made robust to point anomalies and can allow for the anomalies to be imperfectly aligned. We show the practical usefulness of allowing for imperfect alignments through a resulting increase in power to detect regions of copy number variation. Supplemental files for this article are available online.
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
URL https://tandf.figshare.com/articles/dataset/Subset_Multivariate_Collective_and_Point_Anomaly_Detecti...
 
Title Anomaly 
Description An implementation of CAPA (Collective And Point Anomaly) for the detection of anomalies in time series data. 
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact An implementation of CAPA (Collective And Point Anomaly) for the detection of anomalies in time series data. 
URL https://cran.r-project.org/web/packages/anomaly/index.html
 
Title BayesProject: Fast Projection Direction for Multivariate Changepoint Detection 
Description A C++ ('cpp') implementation of the BayesProject algorithm of G. Hahn, P. Fearnhead and I. A. Eckley (2020), a fast approach for computing a projection direction for multivariate changepoint detection. The package also implements the sum-cusum and max-cusum methods, and provides a wild binary segmentation wrapper for all algorithms.
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact This is open source software, and we are unaware of any notable impacts. 
URL https://doi.org/10.1007%2Fs11222-020-09966-2
 
Title CatReg: Solution Paths for Linear and Logistic Regression Models with SCOPE Penalty 
Description Computes solutions for regularised linear and logistic regression models with high-dimensional categorical covariates. 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact Too early to say 
URL https://CRAN.R-project.org/package=CatReg
 
Title ChangepointInference 
Description Software to implement the post-selection inference method for change points from Jewell, S., Fearnhead, P., & Witten, D. (Accepted/In press), "Testing for a Change in Mean After Changepoint Detection", Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact None 
URL https://arxiv.org/abs/1910.04291
 
Title DeCAFS: Detecting Changes in Autocorrelated and Fluctuating Signals 
Description Detects abrupt changes in time series with local fluctuations, modelling the underlying signal as a random walk and the autocorrelated noise as an AR(1) process. See Romano, G., Rigaill, G., Runge, V. and Fearnhead, P. (2020).
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact This is open-source software, we are currently unaware of any notable impacts. 
URL https://arxiv.org/abs/2005.01379
 
Title Functional Online CUSUM 
Description Implements the Functional Online CUSUM method of "Fast Online Changepoint Detection via Functional Pruning CUSUM Statistics" by Gaetano Romano, Idris Eckley, Paul Fearnhead and Guillem Rigaill (arXiv:2110.08205).
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact None as yet, though British Telecom has shown interest in the method.
URL https://arxiv.org/abs/2110.08205
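As background to the method above, the classical CUSUM recursion on which it builds can be sketched in a few lines. This is a textbook Page-style CUSUM with a fixed reference drift and threshold, not the functional-pruning algorithm itself (which in effect optimises over the unknown post-change mean); all parameter values below are illustrative assumptions.

```python
import numpy as np

def cusum_detect(stream, mu0=0.0, drift=0.5, threshold=8.0):
    """Two-sided CUSUM for a shift in mean away from mu0.
    Returns the 1-based index of the first alarm, or None."""
    s_hi = s_lo = 0.0
    for n, x in enumerate(stream, start=1):
        s_hi = max(0.0, s_hi + (x - mu0) - drift)  # tracks upward shifts
        s_lo = max(0.0, s_lo - (x - mu0) - drift)  # tracks downward shifts
        if max(s_hi, s_lo) > threshold:
            return n
    return None

rng = np.random.default_rng(7)
pre = rng.standard_normal(300)           # in-control segment: mean 0
post = rng.standard_normal(200) + 1.5    # mean shifts at observation 301
alarm = cusum_detect(np.concatenate([pre, post]))
print(alarm)
```

The fixed `drift` makes this detector well tuned only for shifts of a known size, which is precisely the limitation the functional-pruning approach removes while keeping per-observation cost low.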
 
Title GRPtests 
Description Methodology for testing nonlinearity in the conditional mean function in low- or high-dimensional generalized linear models, and the significance of (potentially large) groups of predictors. Details of the algorithms can be found in the paper by Jankova, Shah, Buehlmann and Samworth (2019).
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact Too early to say 
URL https://CRAN.R-project.org/package=GRPtests
 
Title GeneralisedCovarianceMeasure: Test for Conditional Independence Based on the Generalized Covariance Measure (GCM) 
Description A statistical hypothesis test for conditional independence. It performs nonlinear regressions on the conditioning variable and then tests for a vanishing covariance between the resulting residuals. It can be applied to both univariate random variables and multivariate random vectors. Details of the method can be found in Rajen D. Shah and Jonas Peters (2018).
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact Used by A.P. Moller Maersk in testing whether structural causal models relating to pricing can be falsified. 
URL https://CRAN.R-project.org/package=GeneralisedCovarianceMeasure
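The core idea of regressing out the conditioning variable and testing the residual products can be sketched in a few lines. The version below substitutes plain OLS for the package's flexible nonlinear regressions, and the simulation and variable names are illustrative assumptions; the normalisation is the simple form sqrt(n) * mean(R) / sd(R) for the residual products R.

```python
import math
import numpy as np

def gcm_test(x, y, z):
    """Generalised-Covariance-Measure-style sketch with OLS regressions:
    regress x on z and y on z, then test whether the products of the two
    residual series average to zero. Returns (statistic, p-value)."""
    Z = np.column_stack([np.ones(len(z)), z])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    r = rx * ry                                       # residual products
    stat = math.sqrt(len(r)) * r.mean() / r.std()
    return stat, math.erfc(abs(stat) / math.sqrt(2))  # two-sided normal p-value

rng = np.random.default_rng(0)
z = rng.standard_normal(2000)
x = z + rng.standard_normal(2000)
y_indep = z + rng.standard_normal(2000)                      # CI holds by construction
y_dep = z + 0.8 * (x - z) + 0.6 * rng.standard_normal(2000)  # shares x's noise: CI fails
print(gcm_test(x, y_indep, z))
print(gcm_test(x, y_dep, z))
```

Under the null the statistic is approximately standard normal, so the test inherits its validity from the quality of the two regressions rather than from any particular parametric model.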
 
Title IndepTest 
Description R package for independence testing 
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact None as yet. 
URL https://cran.r-project.org/web/packages/IndepTest/index.html
 
Title InspectChangepoint 
Description R package for high-dimensional changepoint estimation. 
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact None as yet. 
URL https://cran.r-project.org/web/packages/InspectChangepoint/index.html
 
Title LogConcComp 
Description Github python code for computing the log-concave maximum likelihood estimator 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact None as yet. 
URL https://github.com/wenyuC94/LogConcComp
 
Title MCARtest: Optimal Nonparametric Testing of Missing Completely at Random 
Description R package 
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact None as yet. 
URL https://cran.r-project.org/web/packages/MCARtest/index.html
 
Title MissInspect 
Description Github R functions for changepoint estimation with heterogeneous missingness 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact None as yet. 
URL https://github.com/wangtengyao/MissInspect
 
Title R package:CROPS 
Description Implementation of the CROPS wrapper for changepoint methods. The CROPS algorithm is described in Haynes, Kaylea, Idris A. Eckley, and Paul Fearnhead. "Computationally efficient changepoint detection for a range of penalties." Journal of Computational and Graphical Statistics 26.1 (2017): 134-143. 
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact None 
URL https://cran.r-project.org/web/packages/crops/index.html
 
Title RobKF: Innovative and/or Additive Outlier Robust Kalman Filtering 
Description Implements a series of robust Kalman filtering approaches: the additive outlier robust filters of Ruckdeschel et al. (2014) and Agamennoni et al. (2018), the innovative outlier robust filter of Ruckdeschel et al. (2014), and the innovative and additive outlier robust filter of Fisch et al. (2020).
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact This is open source software, and we are unaware of any notable impacts. 
URL https://arxiv.org/abs/2007.03238
 
Title SPCAvRP 
Description R package for sparse PCA 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact None as yet. 
URL https://cran.r-project.org/web/packages/SPCAvRP/index.html
 
Title Sshaped 
Description R package for fitting S-shaped functions 
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact None as yet. 
URL https://cran.r-project.org/web/packages/Sshaped/index.html
 
Title USP 
Description R package for independence testing 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact None as yet. 
URL https://cran.r-project.org/web/packages/USP/index.html
 
Title gfpop: Graph-Constrained Functional Pruning Optimal Partitioning 
Description Penalized parametric change-point detection by functional pruning dynamic programming algorithm. The successive means are constrained using a graph structure with edges of types null, up, down, std or abs. To each edge we can associate some additional properties: a minimal gap size, a penalty, some robust parameters (K,a). The user can also constrain the inferred means to lie between some minimal and maximal values. Data is modeled by a quadratic cost with possible use of a robust loss, biweight and Huber (see edge parameters K and a). Other losses are also available with log-linear representation or a log-log representation. 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact This is open source software and we are unaware of any notable impacts. 
URL https://arxiv.org/abs/2002.03646
 
Title ghcm: Functional Conditional Independence Testing with the GHCM 
Description A statistical hypothesis test for conditional independence. Given residuals from a sufficiently powerful regression, it tests whether the covariance of the residuals is vanishing. It can be applied to both discretely-observed functional data and multivariate data. Details of the method can be found in Anton Rask Lundborg, Rajen D. Shah and Jonas Peters (2020).
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact Too early to say. 
URL https://CRAN.R-project.org/package=ghcm
 
Title ocd 
Description R package for online changepoint detection 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact None as yet. 
URL https://cran.r-project.org/web/packages/ocd/index.html
 
Title ocd_CI 
Description R functions on github for online changepoint detection. 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact None as yet. 
URL https://github.com/yudongchen88/ocd_CI
 
Title primePCA 
Description R package on CRAN for high-dimensional PCA with heterogeneous missingness 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact None as yet. 
URL https://cran.r-project.org/web/packages/primePCA/index.html