Working Title: Novel Methods Toward Independent and Identically Distributed (IID) Testing

Lead Research Organisation: Imperial College London
Department Name: Mathematics

Abstract

Description: In probability theory, a collection of random variables is independent and identically distributed (IID) if all random variables follow the same probability distribution and are mutually independent. The assumption that observations are generated by IID random variables is called the IID assumption or randomness assumption [5, 7, 10-12]. It is ubiquitous and often serves as a foundation in statistics [5], machine learning [14], entropy source estimation [9] and elsewhere. Nevertheless, the IID assumption is commonly violated in many practical environments [3, 4, 13], and it is therefore valuable to test its correctness; such a procedure is called an IID test or randomness test.
The IID testing problem has been studied for at least 100 years, since 1919 [8], and is still actively researched [1, 5, 6, 9-12]. Our aim for the project is to investigate IID testing and to develop novel methods for it. This project falls within the EPSRC statistics and applied probability research area.

More specifically, our motivations and objectives in proposing novel IID test methods are at least twofold. First, we find that most existing IID test methods are useful only in certain cases and can be ineffective in many others; that is, they fail to reject the IID assumption in many non-IID cases where the assumption is in fact violated. It would therefore be valuable to develop a test method able to reject the IID assumption in a wider range of non-IID cases. Secondly, nearly all IID test methods are based on the hypothesis testing framework, which is often criticised because no conclusion can be drawn when the IID assumption is not rejected [2]. One direction of future work is thus to propose a test method that is more informative than a hypothesis test. For example, such a method could be a measure of the strength of the IID property of the observations, potentially taking values between 0 and 1, with values close to 0 indicating severe violation of the IID assumption and values close to 1 indicating high confidence that the observations are IID. Such a measure would not only indicate whether to reject the IID assumption, as a hypothesis test does, but would also quantify the strength of the IID property of the observations; it is therefore more informative and could be more useful in practice.
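To make this limitation concrete, the following sketch applies one classical hypothesis-test-style randomness check, the Wald-Wolfowitz runs test, to simulated data; the choice of test and of simulated sequences is purely illustrative and is not part of the proposed methodology. The test returns only a p-value, so a failure to reject provides no graded information about how strongly IID the observations appear.

# Illustrative sketch only: a classical hypothesis-test-style randomness
# check (Wald-Wolfowitz runs test), not part of the proposed methodology.
import numpy as np
from scipy.stats import norm

def runs_test(x):
    """Two-sided Wald-Wolfowitz runs test on the signs around the median."""
    x = np.asarray(x, dtype=float)
    s = x > np.median(x)                      # dichotomise the sample
    n1, n2 = s.sum(), (~s).sum()
    runs = 1 + np.count_nonzero(s[1:] != s[:-1])
    mean = 2.0 * n1 * n2 / (n1 + n2) + 1.0
    var = (2.0 * n1 * n2 * (2.0 * n1 * n2 - n1 - n2)
           / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    z = (runs - mean) / np.sqrt(var)
    return 2.0 * norm.sf(abs(z))              # a p-value is the only output

rng = np.random.default_rng(0)
print("IID noise p-value:       ", runs_test(rng.normal(size=200)))
print("Random-walk data p-value:", runs_test(np.cumsum(rng.normal(size=200))))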
To pursue these aims, at least the following two methodologies could be developed and deserve further research. (1) The marginal likelihood ratio framework: marginal likelihoods, reflecting the probability that different models (some assuming the observations are IID, others assuming they are not) generate the observations, are compared. The ratio of the marginal likelihood under the IID models to that under the non-IID models can then indicate the strength of the IID property of the observations.
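As a minimal sketch of the kind of comparison methodology (1) envisages, the example below scores binary observations under an IID Bernoulli model and a single first-order Markov alternative, using conjugate Beta(1, 1) priors so that both marginal likelihoods have closed forms; the model pair and priors are simplifying assumptions made for this illustration, not the models to be used in the project.

# Illustrative sketch of methodology (1): compare closed-form marginal
# likelihoods of an IID model and one simple non-IID alternative.
# The Bernoulli/Markov pair and Beta(1, 1) priors are simplifying
# assumptions for this example, not the project's proposed models.
import numpy as np
from scipy.special import betaln

def log_ml_iid(x, a=1.0, b=1.0):
    """Log marginal likelihood: IID Bernoulli with a Beta(a, b) prior."""
    x = np.asarray(x)
    k, n = x.sum(), x.size
    return betaln(a + k, b + n - k) - betaln(a, b)

def log_ml_markov(x, a=1.0, b=1.0):
    """Log marginal likelihood: first-order Markov chain with independent
    Beta(a, b) priors on its two transition probabilities (conditions on x[0])."""
    x = np.asarray(x)
    prev, curr = x[:-1], x[1:]
    total = 0.0
    for s in (0, 1):                              # transitions leaving state s
        nxt = curr[prev == s]
        total += betaln(a + nxt.sum(), b + nxt.size - nxt.sum()) - betaln(a, b)
    return total

rng = np.random.default_rng(0)
x_iid = rng.integers(0, 2, size=500)              # IID fair-coin flips
x_dep = np.zeros(500, dtype=int)                  # "sticky" Markov chain
for t in range(1, 500):
    x_dep[t] = x_dep[t - 1] if rng.random() < 0.9 else 1 - x_dep[t - 1]

for name, x in (("iid data", x_iid), ("markov data", x_dep)):
    # Both models condition on x[0] so the log ratio is comparable.
    log_ratio = log_ml_iid(x[1:]) - log_ml_markov(x)
    print(name, "log marginal likelihood ratio (IID vs Markov):", round(log_ratio, 2))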
(2) Permutation: the observations are compared with permuted versions of themselves. Roughly speaking, if the permuted observations are regarded as being generated by some (unknown) IID random variables, a suitable distance measure can quantify how far the observations are from their permuted counterparts. Intuitively, the smaller this distance, the more similar the observations are to the permuted observations and hence the more likely they are to be IID. Both methodologies have only been discussed briefly here and require further research.
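The sketch below illustrates methodology (2) under assumptions chosen purely for this example: it measures the sup-distance between the cumulative-sum path of the observations and the paths of randomly permuted copies, and compares this with the corresponding distances between pairs of permuted copies. The particular distance and the crude [0, 1] score are illustrative choices, not the project's proposed measure.

# Illustrative sketch of methodology (2): measure how far the observed
# sequence is from its own random permutations. The cumulative-sum
# sup-distance and the crude score below are assumptions made for this
# example, not the project's chosen distance or measure.
import numpy as np

def path_distance(x, y):
    """Sup-distance between the cumulative-sum paths of two sequences."""
    return np.max(np.abs(np.cumsum(x) - np.cumsum(y)))

def permutation_distances(x, n_perm=200, seed=0):
    """Distances from x to permuted copies, and between pairs of permuted
    copies (the latter acts as an IID-like reference scale)."""
    rng = np.random.default_rng(seed)
    perms = [rng.permutation(x) for _ in range(n_perm)]
    d_obs = np.array([path_distance(x, p) for p in perms])
    d_ref = np.array([path_distance(perms[i], perms[i + 1])
                      for i in range(n_perm - 1)])
    return d_obs, d_ref

rng = np.random.default_rng(1)
for name, x in (("iid", rng.normal(size=300)),
                ("autocorrelated", np.convolve(rng.normal(size=330),
                                               np.ones(30) / 30, mode="valid"))):
    d_obs, d_ref = permutation_distances(x)
    # Crude illustrative score in [0, 1]: typical permuted-vs-permuted
    # distance relative to the typical observed-vs-permuted distance,
    # clipped at 1. Values near 1 are consistent with IID; values near 0
    # suggest the ordering of the observations carries real structure.
    score = min(1.0, float(np.median(d_ref) / np.median(d_obs)))
    print(name, "crude IID score:", round(score, 2))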
Once completed, our methods could be valuable for the many statistical and machine learning models that make the IID assumption about their observations, since our methods can be used to test that assumption. Compared with most existing methods, which may fail to reject many non-IID cases and may require a large number of observations to do so, our IID tests aim to reject the IID assumption in more non-IID cases, using fewer observations, while producing more informative test results.

Planned Impact

The primary CDT impact will be training 75 PhD graduates as the next generation of leaders in statistics and statistical machine learning. These graduates will lead in industry, government, health care, and academic research. They will bridge the gap between academia and industry, resulting in significant knowledge transfer to both established and start-up companies. Because this cohort will also learn to mentor other researchers, the CDT will ultimately address a UK-wide skills gap. The students will also be crucial in keeping the UK at the forefront of methodological research in statistics and machine learning.
After graduating, students will act as multipliers, educating others in advanced methodology throughout their career. There is a range of further impacts:
- The CDT has a large number of high calibre external partners in government, health care, industry and science. These partnerships will catalyse immediate knowledge transfer, bringing cutting edge methodology to a large number of areas. Knowledge transfer will also be achieved through internships/placements of our students with users of statistics and machine learning.
- Our Women in Mathematics and Statistics summer programme is aimed at students who could go on to apply for a PhD. This programme will inspire the next generation of statisticians and also provide excellent leadership training for the CDT students.
- The students will develop new methodology and theory in the domains of statistics and statistical machine learning. It will be relevant research, addressing the key questions behind real world problems. The research will be published in the best possible statistics journals and machine learning conferences and will be made available online. To maximize reproducibility and replicability, source code and replication files will be made available as open source software or, when relevant to an industrial collaboration, held as a patent or software copyright.

People

ORCID iD

Yijin Zeng (Student)

Publications


Studentship Projects

Project Reference   Relationship   Related To     Start        End          Student Name
EP/S023151/1                                      01/04/2019   30/09/2027
2602749             Studentship    EP/S023151/1   02/10/2021   30/08/2025   Yijin Zeng