EVALUATING THE PERFORMANCE OF RISK PREDICTION MODELS
Lead Research Organisation:
University of Oxford
Department Name: Oncology
Abstract
More people are living with disease and ill-health than ever before due to increased life expectancy. Doctors often need to estimate the prognosis of a patient with a specific disease or assess their likelihood of developing a disease. Prognosis research provides crucial evidence underpinning clinical decisions about treatment, further tests, or lifestyle changes.
Risk scores are a valuable source of prognostic information for doctors, combining patient and disease characteristics to estimate the risk of a specific outcome for that individual. These estimates of risk can then be used to guide patients and doctors in making clinical decisions. Risk scores feature increasingly in healthcare policy and clinical practice guidelines.
Developing risk scores is a complex process involving a number of methodological choices that determine how well the model predicts patients' outcomes. After developing a risk score, it is essential to demonstrate that it works. Patients will not benefit from a risk score unless it distinguishes between those with a good and those with a poor prognosis. Performance is often assessed on the same group of patients used to derive the risk score, which leads to optimistic results. It is therefore necessary to evaluate the risk score's performance in a new group of patients, in what is called a validation study. There is currently no consensus on the best way to carry out such a validation study.
We plan to investigate several methodological aspects of validation studies that influence the performance of a risk score. This will enable us to develop guidance for the conduct of future validation studies. We shall seek to ascertain how many individuals to study to allow reliable evaluation of a risk score, and provide guidance on the best way to deal with missing data. Another issue is the manner in which continuous measurements are handled. Often values of such measurements are just grouped as high or low values, an approach that implies that two particular patients with values just above and just below the chosen cut-point are implausibly characterised as having very different prognoses.
We will use a data set of several million general practice patient records to address all these issues, using risk scores that have been published in the medical literature and are freely available in the public domain on the internet.
Our results will provide researchers with guidance on designing and conducting high-quality studies that evaluate the performance of risk scores. This will improve the quality of prognostic information available to doctors and policy-makers.
Technical Summary
Making predictions about the prognosis of disease or the likelihood of developing a disease is an important role of doctors and healthcare policy makers. Risk prediction models are tools that estimate the likelihood of patient outcomes in relation to multiple patient and disease characteristics. They are used to inform individuals about the future course of their illness (or their risk of developing an illness) and to assist doctors and patients in making joint decisions on future management, treatment plans and lifestyle changes. After a risk prediction model has been developed, it is crucial that the generalisability of the model is appropriately evaluated, minimising any optimism in the initial results. Contributing to this optimism are an inadequate sample size of the validation cohort, inappropriate handling of missing data, the categorisation of continuous risk factors and the treatment of nonlinear risk factors. While these issues are quite well recognised in studies that develop risk prediction models, to date their importance has not been appreciated in relation to studies that evaluate model performance on data sets not used to derive the model (validation studies). We aim to evaluate the impact of the sample size of the validation cohort on the assessed performance of a risk prediction model, and to provide guidance on its choice. We will assess the impact of missing data, in both the derivation and validation cohorts, on the performance of the risk prediction model. Finally, we will address the treatment and handling of continuous risk factors and evaluate the effect this has on the generalisability of the risk prediction model. The overall goal is to inform guidance on the design and conduct of reliable validation studies for future risk prediction models.
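To illustrate the kind of performance measures a validation study typically reports, the sketch below applies an invented risk score to a simulated validation cohort and computes its discrimination (C-statistic) and calibration slope. All variable names, coefficients and data are hypothetical assumptions for illustration only; they are not taken from the project or from any published score.

    # Illustrative sketch only: the score, its coefficients and the cohort are invented.
    import numpy as np
    import statsmodels.api as sm
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)

    # Hypothetical validation cohort: age (years) and systolic blood pressure (mmHg).
    n = 5000
    age = rng.normal(60, 10, n)
    sbp = rng.normal(140, 20, n)

    # Linear predictor of a made-up "published" risk score.
    lp = -7.0 + 0.05 * age + 0.02 * sbp
    predicted_risk = 1 / (1 + np.exp(-lp))

    # Simulate observed outcomes from a slightly different "true" model,
    # mimicking the drop in performance expected outside the derivation data.
    true_lp = -7.5 + 0.045 * age + 0.022 * sbp
    outcome = rng.binomial(1, 1 / (1 + np.exp(-true_lp)))

    # Discrimination: the C-statistic (area under the ROC curve for a binary outcome).
    c_statistic = roc_auc_score(outcome, predicted_risk)

    # Calibration slope: logistic regression of the outcome on the linear predictor;
    # a slope close to 1 suggests the score is neither over- nor under-fitted here.
    cal_fit = sm.Logit(outcome, sm.add_constant(lp)).fit(disp=0)
    calibration_slope = cal_fit.params[1]

    print(f"C-statistic: {c_statistic:.3f}")
    print(f"Calibration slope: {calibration_slope:.2f}")

In a real validation study, the sample size of the cohort, the handling of missing data and the treatment of continuous risk factors would all affect how stable and trustworthy these summaries are, which is precisely what the project sets out to quantify.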