Chain Event Graphs and Applications to Longitudinal Studies

Lead Research Organisation: University of Warwick
Department Name: Statistics

Abstract

Chain Event Graphs (CEGs) are a class of statistical models and deductive reasoning tools based on staged trees, which are coloured probability trees. They are a rapidly growing research field with a wide range of applications. This project aims to further develop the existing theory and software associated with CEGs and their dynamic counterparts. These applications can also be explored, as CEGs are already used in fields such as public health, forensic science, tourism and criminal radicalisation.



I have already analysed a dataset on the treatment of early epilepsy and single seizures. This work focused on the probability of a tonic-clonic seizure occurring within 1 year, depending on the individual's baseline covariates and whether or not they received treatment with anti-epileptic drugs. This investigation opened up further areas of research on new approaches to analysis, and on potential improvements to existing methods.



One of these new approaches is to incorporate data and response variables that are continuous, or discrete with potentially infinitely many values. For example, the time between seizures is continuous and has been considered in Dynamic CEGs (DCEGs). The number of seizures in a period will be modelled by assuming that the seizures suffered in a year occur according to a Poisson process. Currently, there is very little work on incorporating such data, except for the inclusion of holding times in DCEGs, which is only one possibility. As such, I will extend CEGs to a new form called a Poisson CEG (PCEG), where the response variable is assumed to come from a Poisson process, and I will develop the theory, methods, and applications of this new model.
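As a minimal sketch of the Poisson assumption described above: if events occur at a constant rate per year and an individual is observed for a given number of years, the count is Poisson with mean equal to rate times observation time, and the maximum likelihood estimate of a shared rate is total events divided by total observation time. The function names and the data below are hypothetical illustrations, not part of the project's software.

```python
import math

def poisson_pmf(k, rate, exposure=1.0):
    """P(Y = k) when events arrive at `rate` per unit time over `exposure` units of time."""
    mean = rate * exposure
    return math.exp(-mean) * mean**k / math.factorial(k)

def rate_mle(counts, exposures):
    """Maximum likelihood estimate of a common rate: total events / total observation time."""
    return sum(counts) / sum(exposures)

# Hypothetical data (assumption): seizure counts and years observed
# for five individuals who share the same underlying rate.
counts = [0, 2, 1, 4, 3]
exposures = [1.0, 1.0, 0.5, 2.0, 1.5]

rate = rate_mle(counts, exposures)
print("estimated rate per year:", rate)
print("P(no seizures in one year):", poisson_pmf(0, rate))
```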



In many of the applications where the response variable could be assumed to come from a Poisson process, there are more zero counts than would be expected. This is an example of zero-inflation, and can be modelled with a zero-inflated Poisson (ZIP) distribution. This is centred on the idea that not all zeroes are created equal: some individuals will never have a nonzero count, and are thus considered "risk free", while others may have a zero count but still be "at risk" and would have a nonzero count if observed for long enough. The ZIP aims to estimate the proportion of at-risk individuals and subsequently estimate their underlying rate, which would be underestimated if zero-inflation were not accounted for. I will discuss methods used to incorporate this zero-inflation into the CEG through the introduction of a latent risk state variable, which extends the PCEG to a Zero-inflated Poisson CEG (ZIPCEG).
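The effect described above can be illustrated with a small sketch. It assumes a ZIP model in which an individual is at risk with probability p_risk and, if at risk, has a Poisson count with a common rate; the parameters are fitted with a standard EM algorithm that treats the at-risk indicator as latent. This is a generic technique chosen for illustration, not necessarily the estimation method used in the project, and the function name and data are hypothetical.

```python
import math

def fit_zip(counts, n_iter=200):
    """Fit a zero-inflated Poisson by EM. Returns (p_risk, rate)."""
    n = len(counts)
    total = sum(counts)
    n_zero = sum(1 for y in counts if y == 0)
    p_risk = 0.5                      # starting value (assumption)
    rate = max(total / n, 1e-8)       # start from the naive Poisson rate
    for _ in range(n_iter):
        # E-step: probability that an observed zero belongs to an at-risk individual.
        zero_and_risk = p_risk * math.exp(-rate)
        w_zero = zero_and_risk / ((1.0 - p_risk) + zero_and_risk)
        # Individuals with nonzero counts are at risk with certainty.
        expected_at_risk = (n - n_zero) + n_zero * w_zero
        # M-step: update the at-risk proportion and the rate among those at risk.
        p_risk = expected_at_risk / n
        rate = total / expected_at_risk
    return p_risk, rate

# Hypothetical data (assumption): 100 individuals, 60 of whom have a zero count.
counts = [0] * 60 + [1] * 10 + [2] * 15 + [3] * 10 + [4] * 5

naive_rate = sum(counts) / len(counts)   # ignores zero-inflation
p_risk, rate = fit_zip(counts)
print("naive Poisson rate:", naive_rate)
print("ZIP at-risk proportion:", p_risk)
print("ZIP rate among at-risk individuals:", rate)
```

On data like these, the naive rate averages over individuals who can never have a nonzero count, so it falls below the ZIP estimate of the rate among those at risk.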



There has been recent work on variable discretisation for CEGs, but the proposed methods do not cope well with smaller data sets. In particular, they struggle when the staged tree contains sparse edge counts, and so I will propose several methods to alleviate these issues. There is also significant scope for developing more robust forms of variable selection, as there is very little published theory on variable selection methods for CEGs. Variables are often selected based on existing studies or the advice of domain experts. Existing variable selection methods for other forms of statistical modelling, such as Generalised Linear Models (GLMs), are a natural starting point.



To further the development of CEGs, the existing software and packages in R must be made user-friendly, particularly for the purposes of fitting and graphing CEGs. One of the main R packages focused on CEGs, ceg, has several bugs and is not actively maintained. I plan to publish my own package, pcegr, which uses the graphing methods of ceg but adds the functionality to fit CEGs, PCEGs and ZIPCEGs, as well as to perform variable discretisation.

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/V520226/1                                   01/10/2020  31/10/2025
2440874            Studentship   EP/V520226/1  05/10/2020  05/10/2024  Conor Hughes