# Chain Event Graphs and Applications to Longitudinal Studies

Lead Research Organisation: University of Warwick
Department Name: Statistics

### Abstract

Chain Event Graphs (CEGs) are a class of statistical model and deductive reasoning tool built on staged trees, which are coloured probability trees. They form a rapidly growing research field with a wide range of applications. This project aims to further develop the existing theory and software associated with CEGs and their dynamic counterparts. It will also explore applications; CEGs are already used in fields such as public health, forensic science, tourism and criminal radicalisation.

I have already analysed a dataset on the treatment of early epilepsy and single seizures. This work focused on the probability of a tonic-clonic seizure occurring within 1 year, dependent on the individual's baseline covariates and whether or not they received treatment with anti-epileptic drugs. This investigation opened up further areas of research on new approaches to analysis and potential improvements to existing methods.

One of these new approaches is to incorporate continuous or potentially infinite discrete data, including as response variables. For example, the time between seizures is continuous and has been considered in Dynamic CEGs (DCEGs). The number of seizures suffered in a year will be modelled by assuming that seizures occur according to a Poisson process. Currently, very little work has been done on incorporating such data, apart from the inclusion of holding times in DCEGs, which is only one possibility. As such, I will extend CEGs to a new form called a Poisson CEG (PCEG), where the response variable is assumed to come from a Poisson process, and I will examine the theory, methods and applications of this new model.
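As a brief illustration of the Poisson assumption, suppose seizures occur as a homogeneous Poisson process. The symbols here are assumptions introduced for this sketch and are not taken from the project: $\lambda$ is the rate per year, $t$ is the observation time in years, and $N(t)$ is the number of seizures observed. The count then follows a Poisson distribution:

$$
P\bigl(N(t) = k\bigr) = \frac{e^{-\lambda t}\,(\lambda t)^{k}}{k!}, \qquad k = 0, 1, 2, \dots, \qquad \mathbb{E}\bigl[N(t)\bigr] = \lambda t .
$$

Under this assumption, individuals observed for different lengths of time can be compared through the common rate $\lambda$.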

In many applications where the response variable could be assumed to come from a Poisson process, there are more zero counts than would be expected. This is known as zero-inflation, and it can be modelled with a zero-inflated Poisson (ZIP) distribution. The ZIP is centred on the idea that not all zeroes are created equal: some individuals will never have a nonzero count and are thus considered "risk free", while others may have a zero count but are still "at risk" and would have a nonzero count if observed for long enough. The ZIP aims to estimate the proportion of at-risk individuals and subsequently their underlying rate, which would be underestimated if zero-inflation were not accounted for. I will discuss methods for incorporating this zero-inflation into the CEG through the introduction of a latent risk state variable, which extends the PCEG to a Zero-inflated Poisson CEG (ZIPCEG).
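The effect of ignoring zero-inflation can be seen in a small simulation. The following is a minimal sketch in Python, not the project's method or software: the function names, the simulated parameters (60% at risk, rate 3 per period) and the use of a plain EM algorithm are all assumptions made for illustration. It simulates ZIP counts, shows that the naive sample mean falls well below the at-risk rate, and then recovers both the at-risk proportion and the rate.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson(rate) variate using Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_zip(n, p_at_risk, rate, rng):
    """Simulate n counts: 'risk free' individuals always give 0,
    'at risk' individuals give a Poisson(rate) count (which may also be 0)."""
    return [
        sample_poisson(rate, rng) if rng.random() < p_at_risk else 0
        for _ in range(n)
    ]


def fit_zip_em(counts, iterations=200):
    """Estimate (p_at_risk, rate) for a ZIP model with a simple EM algorithm."""
    n = len(counts)
    total = sum(counts)
    n_zero = sum(1 for y in counts if y == 0)
    n_pos = n - n_zero
    # Illustrative starting values: half at risk, rate from nonzero counts.
    p, rate = 0.5, max(total / max(n_pos, 1), 1e-6)
    for _ in range(iterations):
        # E-step: probability that an observed zero belongs to an at-risk individual.
        at_risk_zero = p * math.exp(-rate)
        w_zero = at_risk_zero / ((1.0 - p) + at_risk_zero)
        # M-step: update the at-risk proportion and the at-risk rate.
        expected_at_risk = n_pos + n_zero * w_zero
        p = expected_at_risk / n
        rate = total / expected_at_risk
    return p, rate


# Hypothetical parameters chosen only for this illustration.
rng = random.Random(0)
counts = simulate_zip(20000, p_at_risk=0.6, rate=3.0, rng=rng)

naive_rate = sum(counts) / len(counts)  # ignores zero-inflation
p_hat, rate_hat = fit_zip_em(counts)

print(f"naive Poisson rate: {naive_rate:.2f}")
print(f"ZIP at-risk proportion: {p_hat:.2f}, ZIP rate: {rate_hat:.2f}")
```

The naive estimate averages over risk-free individuals as well, so it is pulled towards the product of the at-risk proportion and the rate rather than the rate itself.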

As the number of covariates increases, the size of the tree grows, which leads to sparse edge counts in the later parts of the tree, particularly when the overall sample size is insufficient. These sparse edge counts can lead to spurious and unreliable conclusions. I will propose various methods to address and alleviate sparse edge counts, culminating in the novel intermediate CEG, where conditional independence relations are asserted in order to decrease the size of the tree. These methods will be demonstrated using real-world data.
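The growth of the tree can be illustrated with simple arithmetic. The sketch below is only an illustration under stated assumptions: the function names and the sample size of 1000 are hypothetical, every covariate is taken to be binary, and the average count per path is a crude summary (real counts are typically spread much less evenly).

```python
from math import prod


def leaf_paths(levels_per_covariate):
    """Number of root-to-leaf paths in an event tree that branches on each
    covariate in turn, given the number of levels of each covariate."""
    return prod(levels_per_covariate)


def mean_count_per_path(sample_size, levels_per_covariate):
    """Average number of individuals per root-to-leaf path."""
    return sample_size / leaf_paths(levels_per_covariate)


sample_size = 1000  # hypothetical sample size
for n_covariates in (2, 4, 6, 8, 10):
    levels = [2] * n_covariates  # assume binary covariates
    print(
        n_covariates,
        leaf_paths(levels),
        round(mean_count_per_path(sample_size, levels), 2),
    )
```

With ten binary covariates there are more paths than individuals in this hypothetical sample, so many of the final edges would necessarily have counts of zero or one.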

To further the development of CEGs, the existing software and packages in R must be made user-friendly, particularly for the purposes of fitting and graphing CEGs. One of the main R packages focused on CEGs, ceg, has several bugs and is not actively maintained. I plan to publish my own package, pcegr, which uses the graphing methods of ceg but adds the functionality to fit CEGs, PCEGs and ZIPCEGs, as well as to perform variable discretisation.


### Studentship Projects

| Project Reference | Relationship | Related To | Start | End | Student Name |
|---|---|---|---|---|---|
| EP/V520226/1 | | | 30/09/2020 | 31/10/2025 | |
| 2440874 | Studentship | EP/V520226/1 | 04/10/2020 | 04/10/2024 | Conor Hughes |