Modelling and Predicting CKD Progression

Lead Research Organisation: University of Surrey

Department Name: Computing Science

Abstract

An overarching objective of this research is to revisit the problem of modelling the progression of Chronic Kidney Disease (CKD) using state-of-the-art machine learning techniques and methodologies. We introduce three innovations in this project.

First, we shall investigate statistical models that directly predict key clinical variables so that clinicians can make more informed decisions. This approach is more consistent with guidelines-based prescribing that is used by general practitioners.

Second, we will develop a way to identify patient groups using data-driven methods. The approach is similar to the 'market segmentation' used in Business Intelligence. The hypothesis is that patients can be divided into groups not by their disease or disease stage (as currently practised) but by their patient records, which capture their health history. Grouping patients in this way should naturally bring together those with similar treatments, including drugs and procedures, and similar physiological and pathological characteristics.

Third, as part of the process of predicting kidney function, we will develop a risk model for predicting and detecting Acute Kidney Injury (AKI). This novel model will inform clinicians how likely a patient is to suffer from AKI. In short, we propose a unified framework to predict kidney function that also considers the possibility of AKI. This represents a potential advance in modelling and understanding CKD because the risks of end-stage CKD and of AKI have so far often been treated independently.

The potential advantages of the proposed method include: (1) better tailoring of the method to patient subgroups via data-driven stratification; (2) ability to exploit many more variables that are specific to each patient stratum; (3) ability to predict eGFR and ACR that can be used in conjunction with guidelines-based prescribing; (4) ability to predict Acute Kidney Injury.

The main outputs of this proposal are: (1) a patient-tailored eGFR predictor, (2) a patient-tailored ACR predictor, (3) a probabilistic AKI estimator, and (4) data-driven patient stratification. By patient tailoring, we mean that the model can take into account additional variables that are specific to a patient stratum.

Technical Summary

OBJECTIVE: The overarching objective of this research is to revisit the problem of modelling the progression of Chronic Kidney Disease (CKD) using state-of-the-art machine learning techniques and methodologies. The proposed approach differs from conventional ones in the following ways: it (1) emphasizes prediction over explanation; (2) builds customized statistical models rather than a generic, covariate-adjusted model; (3) exploits the entire patient history rather than pre-selected variables; and (4) explores data-driven stratification rather than disease-centric stratification.

INNOVATIONS: Specifically, we propose a mixture model approach to predicting key variables in CKD such as estimated Glomerular Filtration Rate (eGFR), Albumin:creatinine ratio (ACR), and Blood Pressure (BP). In addition, we also explore a novel data-driven patient stratification methodology. Last but not least, as part of the process in modelling eGFR, we also propose to develop a risk model for Acute Kidney Injury (AKI) based on all available historic data from a patient record.

ADVANTAGES: The potential advantages of the proposed method include: (1) better tailoring of the method to patient subgroups via data-driven stratification; (2) ability to exploit many more variables that are specific to each patient stratum; (3) ability to predict eGFR and ACR that can be used in conjunction with guidelines-based prescribing; (4) ability to predict Acute Kidney Injury.

OUTPUTS: The main outputs of this proposal are: (1) a patient-tailored eGFR predictor, (2) a patient-tailored ACR predictor, (3) a probabilistic AKI estimator, and (4) data-driven patient stratification. By patient tailoring, we mean that the model can take into account additional variables that are specific to a patient stratum.

Planned Impact

1) BETTER REFERRAL TOOLS FOR GP CLINICIANS
The vast majority of CKD patients are monitored and managed by their GPs, who have to make regular referral decisions. Through our commercial partner, TPP (The Phoenix Partnership), we will be able to provide general practitioners with better decision support tools so that they can in turn provide better care to patients with CKD. The software tools will provide better and/or more reliable ways to
- help GP clinicians make referral decisions to specialists with confidence. Instead of interpreting the existing NICE referral guidelines (which define an 'accelerated decline in renal function' as a reduction in estimated Glomerular Filtration Rate [eGFR] of 5 mL/min/1.73m2 within a year or of 10 mL/min/1.73m2 within 5 years), they will use our model to infer the likelihood of a decline in renal function, with a predicted trajectory and confidence bounds;
- identify patients with high risk of Acute Kidney Injury (AKI) -- this is an innovation that the project will develop;
- predict the eGFR trend and other variables (such as ACR, and blood pressure) by differentiating between acceptable and unacceptable variation in kidney function -- this is another innovative idea introduced by this project.
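For comparison with the model-based approach, the threshold rule quoted in the first point can be sketched as a simple check over dated eGFR readings. This is a minimal sketch under stated assumptions, not an implementation of the NICE guideline: every pair of readings is compared directly (the guideline concerns sustained decline), "a reduction of 5" is read as "at least 5", one and five years are approximated as 365 and 1,825 days, and the function name `accelerated_decline` is hypothetical.

```python
from datetime import date

# Thresholds as quoted in the text, in mL/min/1.73m2.
# Reading "a reduction of 5" as "at least 5" is an assumption.
ONE_YEAR_DROP, ONE_YEAR_DAYS = 5.0, 365
FIVE_YEAR_DROP, FIVE_YEAR_DAYS = 10.0, 5 * 365  # leap days ignored


def accelerated_decline(readings):
    """Hypothetical helper: True if any later eGFR reading is lower than an
    earlier one by at least 5 within a year, or by at least 10 within 5 years.

    readings: iterable of (date, eGFR) tuples in any order. Pairs of raw
    readings are compared; this does not test whether a decline is sustained.
    """
    ordered = sorted(readings)
    for i, (d0, g0) in enumerate(ordered):
        for d1, g1 in ordered[i + 1:]:
            days, drop = (d1 - d0).days, g0 - g1
            if days <= ONE_YEAR_DAYS and drop >= ONE_YEAR_DROP:
                return True
            if days <= FIVE_YEAR_DAYS and drop >= FIVE_YEAR_DROP:
                return True
    return False


# A 7-unit fall in under a year triggers the one-year rule.
print(accelerated_decline([(date(2014, 1, 10), 62.0), (date(2014, 9, 1), 55.0)]))  # True
```

Because such a rule compares raw readings, a single noisy measurement can trigger it; the proposed model instead reports a predicted trajectory with confidence bounds.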

2) BETTER DATA ANALYTICS TOOLS FOR NEPHROLOGISTS
We will develop better models that help nephrologists understand and monitor the progression of CKD, especially when monitoring is done remotely. This is particularly relevant to the SEIK project, in which nephrologists download the medical records of CKD patients and monitor their conditions remotely. Our software tool will automatically identify patients at risk of worsening conditions and at risk of AKI. This enables a new care model in which high-risk patients are brought to the attention of a GP clinician more proactively.

3) IMPROVED CARE EXPERIENCE
With improved care, patients at risk of progression to more severe stages of CKD will be monitored more closely. Better management implies improved quality of life and potentially long-term cost savings.

4) BETTER/MORE COMPETITIVE SERVICE PROVIDED BY GP SERVICE PROVIDER
GP service providers could potentially use the developed models to predict eGFR, ACR, BP, and AKI at the point of care. This will increase the value they provide to their clients, the GP surgeries, and should in turn lead to better service and better care.

Funded Value: £519,780
Funded Period: Sep 2015 - Jul 2017
Funder: MRC
Project Status: Closed
Project Category: Research Grant
Project Reference: MR/M023281/1
Principal Investigator: Norman Poh
Health Category: Unclassified

People
Norman Poh (Principal Investigator)
Tom Chan (Co-Investigator)
Simon De Lusignan (Co-Investigator)

Publications


Poh N (2017) Probabilistic broken-stick model: A regression algorithm for irregularly sampled data with application to eGFR, in Journal of Biomedical Informatics

Tirunagari S (2016) Automatic classification of irregularly sampled time series with unequal lengths: A case study on estimated glomerular filtration rate

Tirunagari S (2016) Automatic detection of acute kidney injury episodes from primary care data

Tirunagari S (2016) Visualisation of survey responses using self-organising maps: A case study on diabetes self-care factors



Title	Surrey Acute Kidney Injury Detection Software (SAKIDS) version 1.0
Description	We are releasing the Surrey Acute Kidney Injury Detection Software (SAKIDS) which is based on the Surrey Acute Kidney Injury Detection Algorithm (SAKIDA). The software will detect AKI events given an eGFR time series or serum creatinine time series.
Type Of Material	Physiological assessment or outcome measure
Year Produced	2017
Provided To Others?	Yes
Impact	The software has just been released, so its impact is not yet known. We hope that it can be used in clinical trials and for research purposes, e.g., finding AKI events in eGFR time series from CKD studies. An algorithm based on SAKIDA was published in http://modellingckd.org/articles/Norman_AKI_Detection.pdf.
URL	https://sites.google.com/site/akidetection/


Title	Probabilistic broken-sticks model
Description	This regression model is motivated by clinicians' tendency to describe a clinical outcome measure as stable, progressive or regressive. The model fits a series of time-bound straight lines, or sticks, which jointly form a non-linear regression function built from locally linear regressions. We demonstrate the capabilities of this model in modelling estimated glomerular filtration rate.
Type Of Material	Computer model/algorithm
Provided To Others?	No
Impact	The algorithms associated with the model are still under development, but we already have a publication describing it. The work is entitled "Probabilistic Broken-Stick Model: A Regression Algorithm for Irregularly Sampled Data with Application to eGFR" and is available from the arXiv repository (https://arxiv.org/abs/1612.01409).
URL	https://arxiv.org/abs/1612.01409


Title	Surrey Acute Kidney Injury Detection Algorithm (SAKIDA)
Description	SAKIDA is a deterministic algorithm that localizes episodes of acute kidney injury in a time series of estimated glomerular filtration rate (eGFR), which characterizes kidney function. It can scan thousands of eGFR time series collected in primary care, thus enabling clinicians to estimate AKI prevalence in large-scale databases or trial studies.
Type Of Material	Computer model/algorithm
Provided To Others?	No
Impact	At the moment, we plan to deploy the algorithm on the Royal College of General Practitioners Research and Surveillance Centre (RCGP-RSC) database. A related paper, entitled "Automatic Detection of Acute Kidney Injury Episodes from Primary Care Data", was published in the IEEE Symposium on Computational Intelligence in Healthcare and e-health 2016.
URL	http://www.modellingckd.org


Description	eGFR Time-series Analysis Using East Kent Data Set
Organisation	East Kent Hospitals University NHS Foundation Trust
Country	United Kingdom
Sector	Public
PI Contribution	We are currently developing algorithms to model acute kidney injury and chronic kidney disease. Once proven to work, our software and model can be trialed with their current system.
Collaborator Contribution	East Kent has provided a data set consisting of 488 patients to allow us to both develop and test our algorithms. They run a renal clinic and also screen patients with chronic kidney disease remotely. Therefore, they are an excellent partner to field-test our software and model, in order to obtain feedback for further improvement.
Impact	A poster was presented at a conference organised by the British Renal Society (BRS) in 2015. The findings include: (1) a new capability to screen patients remotely and reliably; and (2) the classification of patients with chronic kidney disease by their trends.
Start Year	2015


Title	Surrey Acute Kidney Injury Detection Software (SAKIDS) version 1.0
Description	The software will detect Acute Kidney Injury events given only eGFR or serum creatinine time-series. It is useful for retrospective analysis of eGFR time-series for which AKI events have not been identified.
Type Of Technology	Software
Year Produced	2017
Impact	No known impact at this stage.
URL	https://sites.google.com/site/akidetection/home


Description	"What could we learn from millions of patient records? A machine-learning perspective" in Big Data: Modeling, Estimation and Selection Workshop, 9-10 June, 2016, Ecole Centrale Lille, France.
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Industry/Business
Results and Impact	I was an invited speaker in the application track of the workshop held on the second day. (The talks on the first day focused on the fundamental theories in machine learning and big data). This has stimulated increased interest in research on chronic kidney disease as a challenging machine learning problem. I have since followed up with a delegate to explore the possibility of exploiting outputs from my research project to target medical practitioners in Germany and France.
Year(s) Of Engagement Activity	2016
URL	https://blogs.surrey.ac.uk/computer-science/2016/06/14/dr-norman-poh-gave-a-talk-at-workshop-on-big-...


Description	A Factored Co-morbidity Approach for Modelling CKD Progression
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Professional Practitioners
Results and Impact	The factored co-morbidity approach is a Bayesian model proposed to model the interplay between different co-morbidities in a tractable manner. This general model can potentially be used to answer a number of clinical questions related to CKD. By presenting this poster, I hope to refine the model to address the diverse needs of clinicians in primary, secondary and community care.
ABSTRACT
Introduction: One of the key challenges in understanding CKD progression is its multifaceted aetiology: hypertension, heart disease and diabetes are commonly observed co-morbidities of CKD. The presence of co-morbidities can therefore alter the risk of CKD progression, e.g. from stage 3 to 5. Unfortunately, "flat" risk models such as logistic regression, e.g. as implemented by the QKidney score and many similar risk models, are not designed to extract the rich structure induced by multiple co-morbidities, whose states are often captured in routinely collected patient data.
Methods: We revisit the modelling of CKD progression by proposing a Bayesian network with a regular structure, which has attractive properties such as tractability and efficient computation. The risk of progression to more severe CKD stages can be inferred not only from eGFR but also from signs and symptoms (s) related to kidney impairment, while the causes of CKD are captured by risk factors (r) and treatment strategies (t). Each co-morbidity is allowed to interact directly with the CKD progression risk only if the diagnosis of the co-morbidity is available. Where this information is not available, the state of the co-morbidity can be inferred from its own s, r and t variables. The model can also incorporate past episodes of acute kidney injury (AKI) when estimating the risk of CKD progression. This exploits the fact that people with AKI are more likely to have a lower eGFR value, and the extent of this degradation is explicitly modelled by the Bayesian network. When AKI information is not available, the risk of AKI can be inferred from its own r, s and t variables, in a process known as 'marginalisation'. The interplay between CKD and AKI can therefore be captured by the model. The local models can be linked to form a single Bayesian network, as shown in figure (a). In (b), the model is tailored for AKI, making provision for eGFR (g) to be influenced directly by AKI.
Conclusion and relevance: Current risk models are 'flat' and do not exploit the rich information captured in patient records. The proposed factored co-morbidity approach offers a natural next step while remaining tractable and efficient to compute. The merit of this model will be investigated as part of the MRC Modelling CKD project using data extracted from the Royal College of GPs database.
Year(s) Of Engagement Activity	2017
URL	http://epubs.surrey.ac.uk/813744/
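The 'marginalisation' step in this abstract can be illustrated with a deliberately tiny fragment of such a network: one risk factor r, an AKI node and a CKD progression node. This is a toy sketch only; the variables, the structure r -> AKI -> progression and all probabilities are hypothetical and not taken from the project, and the real model also conditions on s and t variables and on other co-morbidities.

```python
# Toy fragment of a factored co-morbidity network: r -> AKI -> progression.
# All numbers are hypothetical and chosen only to illustrate the arithmetic.
P_AKI_GIVEN_R = {0: 0.05, 1: 0.20}      # P(AKI = 1 | risk factor r)
P_PROG_GIVEN_AKI = {0: 0.10, 1: 0.30}   # P(progression = 1 | AKI state)


def progression_risk(r, aki=None):
    """P(progression = 1) for a patient with risk factor r.

    If the AKI state is recorded (0 or 1), condition on it directly.
    Otherwise marginalise it out:
        P(prog | r) = sum over a of P(prog | a) * P(a | r)
    """
    if aki is not None:
        return P_PROG_GIVEN_AKI[aki]
    p_aki = P_AKI_GIVEN_R[r]
    return p_aki * P_PROG_GIVEN_AKI[1] + (1 - p_aki) * P_PROG_GIVEN_AKI[0]


print(progression_risk(r=1, aki=1))  # AKI recorded: 0.3
print(progression_risk(r=1))         # AKI unknown: 0.2*0.3 + 0.8*0.1, about 0.14
print(progression_risk(r=0))         # AKI unknown: 0.05*0.3 + 0.95*0.1, about 0.11
```

When AKI status is unknown, the risk falls between the two conditional values, weighted by how likely AKI is given the patient's risk factor.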


Description	Demonstration of two pieces of software as part of the Opening of the Innovation for Health Building, University of Surrey, 22 February, 2017, Guildford, Surrey, UK
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	Regional
Primary Audience	Supporters
Results and Impact	The PI and two team members explained the MRC CKD project and then demonstrated how the two pieces of software work; one for modelling Chronic Kidney Disease and another for detecting Acute kidney injury. As a result of this, a few visitors followed up with our research project, including a scientist in molecular and cell biology, working for LGC.
Year(s) Of Engagement Activity	2017
URL	http://www.surrey.ac.uk/innovation-for-health


Description	Presented a poster to the renal community at British Renal Society 2017 Conference about a Probabilistic Broken-stick Model for CKD Staging and Risk Stratification
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Professional Practitioners
Results and Impact	A new regression model, the probabilistic broken-stick model, was proposed to tackle the modelling of estimated Glomerular Filtration Rate (eGFR). By presenting this model, I hope that it can be used for modelling chronic kidney disease, which spans some 10-20 years, and so help us understand the long-term progression of the disease.
ABSTRACT
Introduction: Determining CKD stage and disease progression from eGFR in primary care is complicated by the fact that the measurements are irregularly sampled and influenced both by genuine physiological changes and by external factors. Models used for these purposes would ideally capture both short-term (for staging) and long-term (for progression) trends. However, existing regression algorithms such as linear, polynomial and Gaussian process regression either cannot account for these challenges or do not satisfy the key clinical requirement of an easily interpretable model that can elucidate short- and long-term trends. To balance interpretability and flexibility, an extension to broken-stick regression models is proposed to make them more suitable for modelling clinical time series.
Method: The proposed broken-stick model divides a patient's eGFR time series into a number of overlapping windows of equal length (although windows can be of different lengths) and then performs a linear regression in each window. These locally linear line segments are then smoothly joined using a Bayesian approach, whereby the further a window is from a point in time t, the less influence its line segment has near t. This is achieved by defining the posterior probability of the w-th window at time t as $P(w \mid t) \propto p(t \mid w)$, where the window function $p(t \mid w)$ is bell-shaped, e.g. Gaussian. To demonstrate the utility of the proposed broken-stick model, we used it to model the long-term trend of eGFR measurements from the primary care data of 12,000 patients collected as part of the QICKD study. Rather than relying on the raw eGFR values to determine the stage of a patient's CKD, we used the estimated mean eGFR value obtained directly from the broken-stick model. In addition, by calculating both the expected eGFR value (µ) and slope (µ') at a given time, it is possible to stage and stratify patients according to the trajectory that their condition is taking.
Results: Beyond determining CKD stages, the broken-stick model makes it possible to both stage and stratify patients according to the trajectory of their condition. Figure (a) shows that using the expected eGFR slope takes into account both the stage and the trajectory of a patient's eGFR measurements, allowing us to stratify patients into categories dependent on their current eGFR and its expected trajectory. The broken-stick model also enables the calculation of the expected CKD stage posterior (b). Gaps between the KDIGO guideline stage boundaries (dashed vertical lines) and the boundaries between expected CKD stages can be interpreted as indicative of systematic variation in the recording of patient data compared with what would be expected.
Conclusion: The proposed broken-stick model can robustly estimate both short-term and long-term trends simultaneously, while also accommodating the unequal lengths and irregular sampling of eGFR time series. While CKD staging is currently based on local trends (the most recent measurements), modelling a patient's eGFR time series with a broken-stick model makes it possible to base the patient's stage on their entire time series. Likewise, the broken-stick model enables CKD progression estimates to be based on both short- and long-term trends. CKD stages determined using the broken-stick model are largely consistent with those determined using the KDIGO guidelines, and estimates of progression are therefore likely to prove reliable, as they are based on the same model. Taken together, these results could provide useful information when determining the trajectory of a patient's condition (allowing early intervention) and in the retrospective identification of patients for clinical research.
Year(s) Of Engagement Activity	2017
URL	http://epubs.surrey.ac.uk/813745/
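The windowing and posterior weighting described in this abstract can be sketched with NumPy. This is a minimal sketch, not the published algorithm: the window length and step, the Gaussian width `sigma`, the uniform prior over windows and the function names are assumptions, and µ' is computed here as the posterior-weighted average of the window slopes rather than as the exact derivative of µ. The staging helper uses the KDIGO eGFR categories (G1 ≥90, G2 60-89, G3a 45-59, G3b 30-44, G4 15-29, G5 <15).

```python
import numpy as np


def fit_sticks(t, y, window=3.0, step=1.0, min_points=2):
    """Fit one least-squares line per overlapping window of length `window` years,
    with window starts `step` years apart. Returns window centres, intercepts, slopes."""
    t, y = np.asarray(t, float), np.asarray(y, float)
    centres, intercepts, slopes = [], [], []
    start = t.min()
    while start <= t.max():
        in_win = (t >= start) & (t < start + window)
        tw, yw = t[in_win], y[in_win]
        if len(tw) >= min_points and tw.max() > tw.min():
            slope, intercept = np.polyfit(tw, yw, 1)  # coefficients, highest degree first
            centres.append(start + window / 2)
            intercepts.append(intercept)
            slopes.append(slope)
        start += step
    return np.array(centres), np.array(intercepts), np.array(slopes)


def predict(tq, centres, intercepts, slopes, sigma=1.0):
    """Blend the sticks with P(w | t) proportional to p(t | w), where p(t | w) is a
    Gaussian centred on window w (uniform prior over windows assumed).
    Returns the expected eGFR mu and the expected slope mu_prime at times tq."""
    tq = np.atleast_1d(np.asarray(tq, float))
    log_p = -0.5 * ((tq[:, None] - centres[None, :]) / sigma) ** 2
    log_p -= log_p.max(axis=1, keepdims=True)   # numerical stability
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)     # P(w | t); each row sums to 1
    lines = intercepts[None, :] + slopes[None, :] * tq[:, None]
    mu = (post * lines).sum(axis=1)
    mu_prime = (post * slopes[None, :]).sum(axis=1)
    return mu, mu_prime


def kdigo_stage(egfr):
    """Map an eGFR value (mL/min/1.73m2) to a KDIGO G category."""
    for lower, label in [(90, "G1"), (60, "G2"), (45, "G3a"), (30, "G3b"), (15, "G4")]:
        if egfr >= lower:
            return label
    return "G5"


# Example: ten years of half-yearly readings declining by 2 units per year.
t = np.arange(0, 10.5, 0.5)
y = 80 - 2 * t
mu, mu_prime = predict([5.0], *fit_sticks(t, y))
print(kdigo_stage(mu[0]), mu_prime[0])  # stage G2, slope close to -2
```

Staging from µ rather than from the latest raw reading is what lets the stage reflect the whole series, while µ' supplies the trajectory used for stratification.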


Description	Presented a poster to the renal community at British Renal Society 2017 Conference about a Study Protocol on Identifying Progressive CKD from Primary Care Records
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Professional Practitioners
Results and Impact	We presented to the renal community a poster describing a study protocol designed to evaluate, measure and objectively characterise progressive Chronic Kidney Disease (CKD), and to compare it with non-progressive CKD. A preliminary result was presented as a proof of concept of our approach. We hope to generate debate and understand the requirements of professional practitioners in order to better design the study protocol.
ABSTRACT
Background: One of the key challenges in managing CKD patients is to distinguish those who are progressive (worsening eGFR) from those who are non-progressive or may even have underlying improvement in their CKD. To this end, we have developed an algorithm capable of distinguishing progressive from non-progressive CKD based on observed historical eGFR trends.
Methods - Clinical codes. We identified 5-Byte Read Codes from the Royal College of General Practitioners (RCGP) Research and Surveillance Centre (RSC) for a range of known factors, including codes related to signs and symptoms (s), laboratory measurements (m) and treatments (t) for CKD, as well as its associated co-morbidities or risk factors (r) such as diabetes, cardiovascular disease and kidney-related diseases. For each risk factor, s, t, m and r are systematically identified, leading to a range of plausible phenotype variables for explaining a rapid decline in eGFR (see our accompanying Bayesian Justification poster). Algorithm. We have developed our own regression algorithm, called the broken-stick model, which estimates the rate of change of eGFR using a 'Bayesian' sliding window of three years, providing a stable estimate of the annual rate of change of eGFR while remaining sensitive to genuine underlying patterns. The global eGFR slope (annual rate of change) of a patient is defined as the average eGFR slope over the patient's entire history. Patient inclusion/exclusion criteria: All patients with eGFR measurements were included. Patients with an acute kidney injury episode or hereditary kidney diseases were excluded. Additionally, patients without consistent eGFR trends, defined as having a standard deviation of the eGFR slope exceeding 2 units per year, were excluded. Initial descriptive observations: The longitudinal observational data were divided into 16 equal groups of approximately 600 patients each. In 9 of the 16 groups there was a deterioration in eGFR, one group was equivocal and six groups showed improvement.
Conclusions: A systematic cohort-based retrospective observational study, based on routinely collected primary care data coupled with advanced machine learning algorithms, could improve our understanding of the nature of the rapid progression of CKD in some groups of patients (group 1) in contrast to those (group 16) that show an improvement in eGFR.
Year(s) Of Engagement Activity	2017
URL	http://epubs.surrey.ac.uk/813746/
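The slope-based grouping in this protocol might be sketched as follows, assuming each patient's windowed eGFR slopes (for example from a broken-stick fit) have already been computed and that patients with AKI episodes or hereditary kidney disease have already been removed. The function name, the use of the population standard deviation and the grouping by sorted global slope are assumptions; the abstract does not state exactly how the groups were formed, only that group 1 showed the most rapid progression and group 16 improvement.

```python
import numpy as np


def stratify_by_global_slope(local_slopes, n_groups=16, max_sd=2.0):
    """Hypothetical helper for slope-based stratification.

    local_slopes: dict mapping patient id -> sequence of windowed eGFR slopes
    (units per year). Patients whose slopes have a standard deviation above
    max_sd are treated as lacking a consistent trend and dropped. The rest are
    sorted by global slope (mean of local slopes) and split into n_groups groups
    of near-equal size: group 1 has the steepest decline, the last group the
    greatest improvement.
    """
    global_slope = {}
    for pid, slopes in local_slopes.items():
        s = np.asarray(slopes, float)
        if s.std() > max_sd:          # inconsistent trend: exclude
            continue
        global_slope[pid] = s.mean()  # global slope = average local slope
    ordered = sorted(global_slope, key=global_slope.get)  # most negative first
    chunks = np.array_split(np.array(ordered, dtype=object), n_groups)
    return [list(chunk) for chunk in chunks]


slopes = {"p1": [-5.0, -5.5], "p2": [1.0, 1.2], "p3": [-1.0, 8.0], "p4": [0.0, 0.0]}
print(stratify_by_global_slope(slopes, n_groups=3))  # [['p1'], ['p4'], ['p2']]
```

Here patient p3 is excluded because its local slopes (-1 and 8) have a standard deviation of 4.5, well above the 2 units per year threshold.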


Description	Presented a poster to the renal community at British Renal Society 2017 Conference about the Surrey Acute Kidney Injury Detection Algorithm (SAKIDA)
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Professional Practitioners
Results and Impact	We have developed an algorithm that we would like to make known to the renal community, and we encourage clinicians to adopt and use it.
ABSTRACT
Introduction: Acute kidney injury (AKI) is characterised by a rapid deterioration in kidney function and can be identified by examining the rate of change in a patient's estimated glomerular filtration rate (eGFR). Because the damage AKI episodes cause to renal function is potentially irreversible, their detection can play a significant role in predicting kidney function. Although algorithms for detecting AKI are available for patients under constant monitoring, e.g. inpatients, their applicability to primary care settings is less clear, as patients' eGFR series often contain long gaps between measurements. We therefore present two alternative automated approaches for detecting AKI: the novel Surrey AKI detection algorithm (SAKIDA) (Figure a), and the detection of outlier points using Gaussian process regression (GPR) (Figure b).
Method: The dataset used in this work contains the eGFR data of 488 patients (275 (56.4%) male and 213 (43.6%) female) treated at East Kent University Hospital, and was collected as part of a study seeking to understand the characteristics of acute kidney injury and its impact on chronic kidney disease. Each patient's eGFR data was manually labelled by a nephrologist with the number of AKI episodes experienced: 0, 1, 2, 3 or 4. To obtain the right confidence intervals for detecting AKI episodes as outliers, we trained the GPR on patients with no AKI present. The GPR and SAKIDA algorithms were compared with the method developed by NHS England to provide real-time detection and diagnosis of AKI in patients across the National Health Service in England.
Results: SAKIDA identified patients with no AKI episodes with an accuracy of 90.44%, compared with 83.82% for GPR and 73.53% for the NHS England method. SAKIDA also detected no more than 4 AKI episodes per signal, in agreement with the expert's classifications, whereas GPR and the NHS England method detected more than 4 in 67 and 99 patients respectively. Interestingly, despite performing worst when detecting 0, 1, 2 and 4 AKI episodes, the NHS England method detected eGFR signals with 3 AKI episodes with greater accuracy than the other methods tested (17.64%) (see Table 1).
Conclusion: Unlike inpatients under constant monitoring, patients in primary care will likely have less frequent and more irregular eGFR measurements, along with greater variability in eGFR values due to weaker control over pre-test conditions. The main conclusion that can be drawn from the results is that SAKIDA performs better than both GPR and the NHS England algorithm, not only because it identifies patients with no AKI episodes more accurately, but also because it better matches the expert's classifications overall. This indicates that GPR and the NHS England algorithm are likely to be less suitable for retrospective identification of AKI episodes in primary care data, and are instead more suitable as real-time alert systems. Given that SAKIDA closely matches the expert's classifications and can be used for both retrospective analysis and real-time alerts, we believe it should be the method of choice when identifying AKI episodes within primary care data.
Year(s) Of Engagement Activity	2017
URL	http://epubs.surrey.ac.uk/813747/
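The GPR outlier approach in this abstract can be sketched as follows: fit a Gaussian process to one patient's eGFR series and flag readings that fall below the lower bound of their leave-one-out predictive interval. This is a minimal NumPy sketch, not the poster's implementation; the squared-exponential kernel, the constant mean, the fixed hyperparameter values (which the poster instead learned from patients without AKI), the 95% bound and the function name are assumptions. The leave-one-out mean and variance use the closed-form expressions for GP regression described in Rasmussen and Williams, Gaussian Processes for Machine Learning, Chapter 5.

```python
import numpy as np


def gpr_loo_outliers(t, y, length_scale=2.0, signal_var=100.0, noise_var=16.0, z=1.96):
    """Flag eGFR readings that fall below the lower bound of a leave-one-out
    Gaussian process predictive interval.

    t: times in years; y: eGFR values. The kernel is squared-exponential plus
    white noise; all hyperparameter defaults are illustrative assumptions.
    Returns a boolean array, True where a reading is a low outlier.
    """
    t, y = np.asarray(t, float), np.asarray(y, float)
    m = y.mean()                                  # constant prior mean (simplification)
    d = t[:, None] - t[None, :]
    K = signal_var * np.exp(-0.5 * (d / length_scale) ** 2) + noise_var * np.eye(len(t))
    K_inv = np.linalg.inv(K)
    alpha = K_inv @ (y - m)
    k_diag = np.diag(K_inv)
    # Closed-form LOO predictions: mean_i = y_i - alpha_i / [K^-1]_ii,
    # variance_i = 1 / [K^-1]_ii (includes the noise term).
    loo_mean = y - alpha / k_diag
    loo_sd = np.sqrt(1.0 / k_diag)
    return y < loo_mean - z * loo_sd


# A stable series at 60 with one sharp drop at t = 5 years.
t = np.arange(0, 10, 0.5)
y = np.full_like(t, 60.0)
y[10] = 25.0
print(np.flatnonzero(gpr_loo_outliers(t, y)))  # [10]
```

Only readings below the lower bound are flagged, so neighbouring readings whose predictions are pulled down by the drop are not reported; turning flagged readings into counted AKI episodes would need further rules.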