Sample Size guidance for developing and validating reliable and fair AI PREDICTion models in healthcare (SS-PREDICT)

Lead Research Organisation: University of Birmingham
Department Name: Institute of Applied Health Research

Abstract

Healthcare research is in an exciting new phase, with increasing access to information to link an individual's characteristics (such as age, family history or genetic information) with health outcomes (such as death, pain level, depression score). Researchers are using this information alongside artificial intelligence (AI) methods to help health professionals and patients predict future outcomes, to better personalise treatment, improve quality of life and prolong life. For example, QRISK is used by doctors to calculate an individual's risk of heart disease within the next 10 years, and to guide who needs treatment (e.g., statins) to reduce that risk. Such prediction tools are known as 'clinical prediction models' or 'clinical prediction algorithms'.

Thousands of clinical prediction model studies are published each year, but unfortunately most are not fit for purpose because they give inaccurate predictions. For example, some individuals predicted to be at low risk may actually be at high risk of adverse outcomes, and vice versa. Such inaccurate prediction models may lead to harm and represent research waste, where money spent on research does not lead to improvements in healthcare or patient outcomes. A major reason for poor prediction models is that they are developed using a sample of data that is too small, for example in terms of the total number of participants contributing data and the number of outcome events (e.g., deaths) observed therein.

To address this, in this project we will provide guidance and new methods to enable researchers to calculate the sample size needed to develop a reliable prediction model, for their particular health condition (e.g., heart disease), prediction outcome (e.g., death) and setting (e.g., general practice) of interest. With the new guidance our project provides, researchers will know how large their dataset needs to be to reliably develop an AI prediction model and precisely demonstrate its accuracy in the population (e.g., UK) and key subgroups (e.g., different ethnic groups).

The project will be split into two topic areas: (i) sample size for model development, and (ii) sample size for testing a model's accuracy, also known as model evaluation or model validation. In the first, we will focus on the accuracy of different AI approaches to developing a model, including statistical methods and so-called 'machine learning' approaches, and tailor sample size guidance for each approach using mathematical and computer-based results. In the second, we will focus on testing (evaluating) the accuracy of an AI model and derive mathematical solutions that calculate the sample size needed to precisely estimate a range of accuracy measures relevant to researchers, health professionals and patients, both in the overall population and in subgroups where fairness checks are essential.

The project findings will accelerate the production and identification of reliable and fair AI prediction models for use in healthcare. It will also provide quality standards for researchers to adhere to when developing and validating new AI models, and allow regulators (those deciding what models should be used and how) to distinguish between models that are reliable and fair, and models that are unreliable and potentially harmful.

The work will be disseminated through computer software and web apps; publications in academic journals; dedicated workshops (with AI researchers and patient groups) and training courses to educate those working in or using prediction model research; and blogs, social media and tutorial videos at websites including www.prognosisresearch.com and YouTube to target the international academic community and a broad audience including patients and the public.
 
Description Our research project began 6 months ago, and there are still 12 months remaining.
In the first 6 months we have developed a series of research outputs to help other researchers working in artificial intelligence (AI) to:
(i) understand why sample size is an important consideration when designing and analysing studies to develop or evaluate a prediction tool in healthcare
(ii) calculate the sample size needed to develop a tool for predicting risks of adverse outcomes in healthcare
(iii) derive uncertainty intervals for risk predictions, to reveal the strength of evidence (quality) behind a model's predictions
(iv) implement the calculations in statistical software including Stata and R, using our pmsampsize and pmvalsampsize modules (a brief illustration follows this list)
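
As a brief illustration of output (iv), the development sample size calculation can be run in R with a single call. The values below mirror the worked example in the pmsampsize documentation, and the argument names are our reading of the current CRAN release (check ?pmsampsize if they have changed):

library(pmsampsize)  # install once with install.packages("pmsampsize")

# Minimum sample size to develop a logistic regression model with 24
# candidate predictor parameters, an anticipated outcome prevalence of
# 0.174 and an anticipated C-statistic of 0.89 (illustrative values)
pmsampsize(type = "b",          # binary outcome
           cstatistic = 0.89,   # anticipated discrimination
           parameters = 24,     # candidate predictor parameters
           prevalence = 0.174)  # anticipated outcome proportion

The output reports the minimum number of participants and events needed to meet criteria targeting small overfitting and precise estimation of the overall risk.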

In terms of engagement, we have:
(i) spoken with patients and clinical teams, to gain feedback and endorsement of our work and recommendations
(ii) disseminated our work to other researchers via training courses; national and international talks; and online videos and blogs
Exploitation Route Researchers developing and evaluating clinical prediction models can use our tools to help calculate the sample size required for their research, while patients, clinical stakeholders and regulators can use our proposed uncertainty intervals to help decide whether a model should be used in practice.
Sectors Digital/Communication/Information Technologies (including Software)

Healthcare

Pharmaceuticals and Medical Biotechnology

 
Description Our proposed uncertainty intervals (instability intervals) will help patients, clinical stakeholders and regulators to understand whether a model is fit for use in clinical practice to guide individual decision making. Further, we are working to promote the implementation of uncertainty intervals (which reflect sample size) in systems that automatically produce risk predictions for individuals (a minimal sketch of the idea follows this entry).
First Year Of Impact 2024
Sector Healthcare
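
To make the idea of instability intervals concrete, below is a minimal sketch in R (illustrative only; not the project's published method or code) of how bootstrap-based instability intervals for individual risk predictions can be produced from a development dataset. All data and object names are hypothetical:

set.seed(2024)
n <- 500                                  # development sample size
x <- rnorm(n)                             # one illustrative predictor
y <- rbinom(n, 1, plogis(-1 + 0.8 * x))   # simulated binary outcome
dat <- data.frame(x = x, y = y)

fit  <- glm(y ~ x, family = binomial, data = dat)
pred <- predict(fit, type = "response")   # original risk predictions

B <- 200                                  # number of bootstrap models
boot_pred <- replicate(B, {
  idx  <- sample(n, replace = TRUE)       # resample the development data
  bfit <- glm(y ~ x, family = binomial, data = dat[idx, ])
  predict(bfit, newdata = dat, type = "response")  # re-predict for the same individuals
})

# 95% instability interval for each individual's predicted risk;
# wide intervals flag predictions that are unstable at this sample size
instab <- t(apply(boot_pred, 1, quantile, probs = c(0.025, 0.975)))
head(cbind(risk = pred, instab))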
 
Title pmvalsampsize: Software in Stata for calculating the sample size required to validate a prediction model 
Description This module calculates the sample size needed for studies evaluating a prediction model, which is an important part of risk-of-bias assessments in reviews of models and of planning new studies to evaluate models 
Type Of Technology Software 
Year Produced 2023 
Impact Only just released 
URL https://ideas.repec.org/c/boc/bocode/s459226.html
 
Title pmvalsampsize: Software in R for calculating the sample size needed to validate a prediction model 
Description This package calculates the sample size needed to evaluate a prediction model precisely, which is important for those examining a study's risk of bias or designing a new study 
Type Of Technology Software 
Year Produced 2024 
Impact Only just released 
URL https://cran.r-project.org/web/packages/pmvalsampsize/index.html
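
For illustration, a minimal call to the R package is shown below, adapted from the example in the package documentation; the argument names are our reading of the CRAN help pages, so check ?pmvalsampsize before use:

library(pmvalsampsize)  # install once with install.packages("pmvalsampsize")

# Minimum sample size to externally validate a binary-outcome model,
# assuming outcome prevalence 0.018, anticipated C-statistic 0.8, a
# linear predictor distributed N(-5, 2.5^2) in the validation population,
# and a target CI width of 1 for the observed/expected (O/E) ratio
pmvalsampsize(type = "b",
              prevalence = 0.018,
              cstatistic = 0.8,
              lpnormal = c(-5, 2.5),  # mean and SD of the linear predictor
              oeciwidth = 1)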
 
Description 3-day training course: Statistical Methods for Risk Prediction & Prognostic Models 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact We run a 3-day training course to educate researchers on how to use statistical methods to develop and evaluate clinical prediction models, covering a wide array of topics with software practicals. The course is attended by about 50 people each time and runs about three times per year
Year(s) Of Engagement Activity 2022,2023,2024
 
Description Invited Oral Presentation: Clinical prediction models: a playground for healthcare research (QuanTim Seminar for the SESSTIM Research Unit, Marseille, France) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Policymakers/politicians
Results and Impact I gave an invited talk about the current challenges in prediction model research, and in particular the importance of sample size considerations in AI-based model studies
Year(s) Of Engagement Activity 2024
 
Description Prognosis Research in Healthcare Summer School 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Each year this summer school disseminates best practice in undertaking primary studies and reviews of prognosis research, to a broad clinical and methodological audience from academia and industry, with participants from around the world
Year(s) Of Engagement Activity 2021,2022,2023
 
Description invited oral presentation (Aberdeen, RSS local meeting, October 2023): Clinical prediction models: a playground for healthcare research 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact I spoke on 'Clinical prediction models: a playground for healthcare research' at the RSS local meeting in Aberdeen (invited, in-person), to disseminate our work on instability and the need for larger sample sizes, to improve reviews and meta-analyses
Year(s) Of Engagement Activity 2023
 
Description invited oral presentation (Utrecht, October 2023): Stability of clinical prediction models developed using statistical or machine learning methods 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact I gave a talk to disseminate our work on stability of clinical prediction models developed using statistical or machine learning methods
Year(s) Of Engagement Activity 2023