
Sample Size guidance for developing and validating reliable and fair AI PREDICTion models in healthcare (SS-PREDICT)

Lead Research Organisation: University of Birmingham
Department Name: Institute of Applied Health Research

Abstract

Healthcare research is in an exciting new phase, with increasing access to information to link an individual's characteristics (such as age, family history or genetic information) with health outcomes (such as death, pain level, depression score). Researchers are using this information alongside artificial intelligence (AI) methods to help health professionals and patients predict future outcomes, to better personalise treatment, improve quality of life, and to prolong life. For example, QRISK is used by doctors to calculate an individual's risk of heart disease within the next 10 years, and to guide who needs a treatment (e.g., statins) to reduce their risk of heart disease occurring. Such prediction tools are known as 'clinical prediction models' or 'clinical prediction algorithms'.

Thousands of clinical prediction model studies are published each year, but unfortunately most are not fit for purpose because they give inaccurate predictions. For example, some individuals predicted to be low risk may actually be at high risk of adverse outcomes, and vice versa. Such inaccurate prediction models may lead to harm and represent research waste, where money spent on research does not lead to improvements in healthcare or patient outcomes. A major reason for poor prediction models is that they are developed using a sample of data that is too small, for example in terms of the total number of participants contributing data and the number of outcome events (e.g., deaths) observed therein.

To address this, we will provide guidance and new methods that enable researchers to calculate the sample size needed to develop a reliable prediction model for their particular health condition (e.g., heart disease), prediction outcome (e.g., death) and setting (e.g., general practice) of interest. With the new guidance our project provides, researchers will know how large their dataset needs to be to reliably develop an AI prediction model and precisely demonstrate its accuracy in the population (e.g., UK) and in key subgroups (e.g., different ethnic groups).

The project will be split into two topic areas: (i) sample size for model development, and (ii) sample size for testing a model's accuracy, also known as model evaluation or model validation. In the first, we will focus on the accuracy of different AI approaches to developing a model, including statistical methods and so-called 'machine learning' approaches, and tailor sample size guidance for each approach using mathematical and computer-based results. In the second, we will focus on testing (evaluating) the accuracy of an AI model and derive mathematical solutions that calculate the sample size needed to precisely estimate a range of accuracy measures relevant to researchers, health professionals and patients, both in the overall population and in subgroups where fairness checks are essential.

The project findings will accelerate the production and identification of reliable and fair AI prediction models for use in healthcare. It will also provide quality standards for researchers to adhere to when developing and validating new AI models, and allow regulators (those deciding what models should be used and how) to distinguish between models that are reliable and fair, and models that are unreliable and potentially harmful.

The work will be disseminated through computer software and web apps; publications in academic journals; dedicated workshops (with AI researchers and patient groups) and training courses to educate those working in or using prediction model research; and blogs, social media and tutorial videos at websites including www.prognosisresearch.com and YouTube to target the international academic community and a broad audience including patients and the public.

Publications


 
Description We have developed a series of research outputs, to help other researchers working in data science and artificial intelligence (AI) to:
(i) understand why sample size is an important consideration when designing and analysing studies to develop or evaluate a prediction tool in healthcare
(ii) calculate the sample size needed to develop a tool for predicting risks of adverse outcomes in healthcare
(iii) calculate the sample size needed to evaluate a tool for predicting risks of adverse outcomes in healthcare
(iv) derive uncertainty intervals for risk predictions, to reveal the strength of evidence (quality) behind a model's predictions, and to check fairness of models across subgroups of patients (e.g. different ethnic groups)
(v) implement the calculations in statistical software including Stata, R and Python, via our modules pmsampsize, pmvalsampsize and pmstabilityss
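
To illustrate what the development calculation in (ii) involves for a binary outcome, the sketch below implements the three published minimum sample size criteria (targeted shrinkage, small optimism in apparent fit, and a precisely estimated overall risk). This is a minimal, self-contained Python sketch with illustrative inputs (24 candidate parameters, anticipated Cox-Snell R-squared of 0.288, outcome prevalence 0.174), not the pmsampsize package itself, which covers further settings; the function name is hypothetical.

```python
import math

def dev_sample_size(p, r2_cs, phi, shrinkage=0.9, delta=0.05, ci_width=0.1):
    """Minimum n to develop a logistic prediction model (three criteria).

    p: number of candidate predictor parameters
    r2_cs: anticipated Cox-Snell R-squared of the model
    phi: anticipated outcome proportion (prevalence)
    """
    # Criterion 1: expected uniform shrinkage of at least `shrinkage` (e.g. 0.9)
    n1 = p / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage))
    # Criterion 2: small optimism in apparent R-squared
    # (absolute difference <= delta * max possible Cox-Snell R-squared)
    lnL_null = phi * math.log(phi) + (1 - phi) * math.log(1 - phi)
    r2_max = 1 - math.exp(2 * lnL_null)
    s2 = r2_cs / (r2_cs + delta * r2_max)
    n2 = p / ((s2 - 1) * math.log(1 - r2_cs / s2))
    # Criterion 3: estimate the overall outcome risk to within +/- ci_width/2
    n3 = (1.96 / (ci_width / 2)) ** 2 * phi * (1 - phi)
    return max(math.ceil(n1), math.ceil(n2), math.ceil(n3))

# Illustrative inputs: 24 parameters, R2_CS = 0.288, prevalence 0.174
print(dev_sample_size(p=24, r2_cs=0.288, phi=0.174))  # -> 662
```

Here criterion 2 is the binding constraint; the required sample size grows with the number of candidate parameters and shrinks as the anticipated model fit improves.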

We have published 10 papers that stem directly or indirectly from this work, and maintain three packages that are regularly updated across three software platforms.

In terms of engagement, we have:
(i) spoken with patients and clinical teams, to gain feedback on and endorsement of our work and recommendations - an example of PPIE contribution is in our published BMJ paper https://www.bmj.com/content/388/bmj-2024-080749
(ii) disseminated our work to other researchers via training courses that have attracted >200 participants in the last 18 months;
(iii) educated others via national and international talks, videos and blogs;
(iv) set up a patient and public engagement group to contribute toward methodology research in prediction modelling;
(v) organised an international conference, MEMTAB, to showcase our work (and that of others) to the international research community (200 participants)

Many further methods, project outputs and implementation strategies are forthcoming in the months and years ahead.
Exploitation Route Researchers developing and evaluating clinical prediction models can use our tools to help calculate the sample size required for their research, whilst patients, clinical stakeholders and regulators can use our proposed uncertainty intervals to help decide whether a model should be used in practice. We are working to embed some of the calculations within e-health data systems, e.g. at the data request stage, potentially informed by generation of synthetic data that mirrors the real data in the system. Many further methods, project outputs and implementation strategies are forthcoming in the months and years ahead, with exciting new approaches to showcase.
Sectors Digital/Communication/Information Technologies (including Software)

Healthcare

Pharmaceuticals and Medical Biotechnology

 
Description Our sample size work has focused substantially on the need to consider and quantify the uncertainty of risk estimates. This has led to the software package pmstabilityss. Moreover, after model development, our proposed uncertainty intervals (instability intervals) will help patients, clinical stakeholders and regulators to understand whether a model is fit for use in clinical practice to guide individual decision making. This has been outlined with PPIE groups in our recent BMJ paper. Further, we are working to promote the implementation of our sample size calculations and subsequent uncertainty intervals (which reflect sample size) in systems that automatically produce models and risk predictions for individuals.
First Year Of Impact 2024
Sector Healthcare
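
The instability (uncertainty) intervals described above can be illustrated with a simple bootstrap: refit the model on resampled development data and take percentile intervals of each individual's predicted risk. The following is a self-contained Python sketch on simulated data, showing the general idea only - the published method and pmstabilityss may differ in detail, and all data and settings here are illustrative.

```python
import math
import random

random.seed(1)

def inv_logit(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp()
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, iters=25):
    """Fit y ~ intercept + slope * x by Newton-Raphson (pure Python)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        ps = [inv_logit(b0 + b1 * x) for x in xs]
        g0 = sum(y - p for y, p in zip(ys, ps))               # score for intercept
        g1 = sum((y - p) * x for x, y, p in zip(xs, ys, ps))  # score for slope
        w = [p * (1 - p) for p in ps]                         # Fisher weights
        A = sum(w)
        B = sum(wi * x for wi, x in zip(w, xs))
        C = sum(wi * x * x for wi, x in zip(w, xs))
        det = A * C - B * B
        b0 += (C * g0 - B * g1) / det  # solve 2x2 Newton step
        b1 += (A * g1 - B * g0) / det
    return b0, b1

# Simulated development dataset: true risk = inv_logit(-1 + x)
n = 300
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [1 if random.random() < inv_logit(-1 + x) else 0 for x in xs]

# Bootstrap the whole model-building step; collect predicted risks
# for a few illustrative individuals (predictor values -1, 0, +1)
targets = [-1.0, 0.0, 1.0]
boot = {t: [] for t in targets}
for _ in range(200):
    idx = [random.randrange(n) for _ in range(n)]
    b0, b1 = fit_logistic([xs[i] for i in idx], [ys[i] for i in idx])
    for t in targets:
        boot[t].append(inv_logit(b0 + b1 * t))

for t in targets:
    preds = sorted(boot[t])
    lo, hi = preds[4], preds[194]  # ~2.5th and 97.5th percentiles of 200
    print(f"x={t:+.1f}: 95% instability interval {lo:.3f} to {hi:.3f}")
```

Wide intervals for an individual signal that the development sample is too small to support stable risk predictions for people like them - exactly the information a regulator or clinician needs before endorsing a model.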
 
Title TRIPOD+AI 
Description TRIPOD+AI provides harmonised guidance for reporting prediction model studies, irrespective of whether regression modelling or machine learning methods have been used. The new checklist supersedes the TRIPOD 2015 checklist, which should no longer be used. I was part of the leadership group that developed TRIPOD+AI, and our BMJ paper presents the expanded 27-item checklist with more detailed explanation of each reporting recommendation, plus the TRIPOD+AI for Abstracts checklist. TRIPOD+AI aims to promote the complete, accurate, and transparent reporting of studies that develop a prediction model or evaluate its performance. Complete reporting will facilitate study appraisal, model evaluation, and model implementation.
Type Of Material Improvements to research infrastructure 
Year Produced 2024 
Provided To Others? Yes  
Impact Recommended by EQUATOR and leading medical journals, TRIPOD+AI has already been cited 400 times since publication last year and will help improve the quality of reporting in prediction model research.
URL https://www.bmj.com/content/385/bmj-2023-078378
 
Description Cochrane Prognosis Methods Group 
Organisation The Cochrane Collaboration
Country Global 
Sector Charity/Non Profit 
PI Contribution We have developed methods and guidance to support those working in the Cochrane collaboration on prognosis reviews, and are developing the Cochrane Handbook for Prognosis Reviews with them
Collaborator Contribution They have provided feedback, guidance and methods to inform and extend the work in our grants, e.g. contributing to chapters in the Cochrane Prognosis Reviews Handbook and to tools like PROBAST+AI for critical appraisal of prediction models
Impact 1. Snell KIE, Levis B, Damen JAA, Dhiman P, Debray TPA, Hooft L, et al. Transparent reporting of multivariable prediction models for individual prognosis or diagnosis: checklist for systematic reviews and meta-analyses (TRIPOD-SRMA). BMJ. 2023;381:e073538.
2. Hudda MT, Archer L, van Smeden M, Moons KGM, Collins GS, Steyerberg EW, et al. Minimal reporting improvement after peer review in reports of COVID-19 prediction models: systematic review. J Clin Epidemiol. 2023;154:75-84.
3. Levis B, Snell KIE, Damen JAA, Hattle M, Ensor J, Dhiman P, et al. Risk of bias assessments in individual participant data meta-analyses of test accuracy and prediction models: a review shows improvements are needed. J Clin Epidemiol. 2024;165:111206.
Start Year 2008
 
Description NICE guidance document on clinical prediction models 
Organisation National Institute for Health and Care Excellence (NICE)
Department NICE International
Country United Kingdom 
Sector Public 
PI Contribution Working on a NICE guidance document for how to develop, validate and appraise clinical prediction models
Collaborator Contribution Richard Riley has co-written the guidance
Impact Guidance paper is forthcoming
Start Year 2024
 
Title pmsampsize - software module in Python for the sample size required for model development studies
Description Calculate the sample size for developing a prediction model 
Type Of Technology Software 
Year Produced 2024 
Impact Researchers are using it to design their model development studies 
 
Title pmsampsize: Software in Stata for calculating the sample size required for development of prediction models 
Description Our package (led by Joie Ensor) for calculating the sample size needed to develop a prediction model is constantly updated as new guidance emerges from our grants and other work 
Type Of Technology Software 
Year Produced 2024 
Impact Software has been downloaded >40k times since launch in 2020, and is constantly updated during our grants 
URL https://ideas.repec.org/c/boc/bocode/s458569.html
 
Title pmsampsize: Software in R for calculating the sample size required for development of prediction models
Description Our package (led by Joie Ensor) for calculating the sample size needed to develop a prediction model is constantly updated as new guidance emerges from our grants and other work 
Type Of Technology Software 
Year Produced 2024 
Impact Researchers routinely use this package to inform the sample size for their model development studies - package downloaded >40k times 
URL https://cran.r-project.org/web/packages/pmsampsize/index.html
 
Title pmstabilityss - Stata module for calculating the sample size required for individual-level stability in risk predictions 
Description Open source in public domain - calculates the sample size required for individual-level stability in risk predictions 
Type Of Technology Software 
Year Produced 2025 
Impact We anticipate researchers will use this to inform the design of their model development and updating studies 
 
Title pmstabilitytte - Stata module for calculating the sample size needed for precise predictions from a time-to-event prediction model 
Description In the public domain - the package calculates the individual-level uncertainty anticipated when developing or updating a model with a particular sample size
Type Of Technology Software 
Year Produced 2025 
Impact Researchers will use this to inform the sample size needed for developing or updating their models 
 
Title pmvalsampsize - software module in Python for sample size required for model validation studies (2024) 
Description Calculates the sample size required for evaluating models and their performance, in Python - we are constantly updating and extending this
Type Of Technology Software 
Year Produced 2024 
Impact Researchers are using this to calculate the sample size for their model validation studies 
 
Title pmvalsampsize: Software in R for calculating the sample size needed to validate a prediction model 
Description This package calculates the sample size needed to evaluate a prediction model precisely, which is important for those examining a study's risk of bias, or for those designing a new study
Type Of Technology Software 
Year Produced 2024 
Impact Only just released
URL https://cran.r-project.org/web/packages/pmvalsampsize/index.html
 
Title pmvalsampsize: Software in Stata for calculating the sample size required for validation of prediction models 
Description This calculates the sample size needed for studies evaluating a model, which is an important part of risk of bias assessments in reviews of models, and for those planning studies to evaluate models
Type Of Technology Software 
Year Produced 2023 
Impact Researchers are using this to inform their sample size for model evaluation - we are constantly updating and extending this 
URL https://ideas.repec.org/c/boc/bocode/s459226.html#:~:text=pmvalsampsize%20computes%20the%20minimum%2...
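
As one illustration of what a validation sample size calculation involves: to estimate the overall observed/expected (O/E) ratio precisely, the variance of ln(O/E) is approximately (1 - phi)/(n * phi), where phi is the outcome proportion. The Python sketch below solves for n given a target confidence interval width; it is a hedged illustration of this single criterion with assumed inputs, not the pmvalsampsize implementation, which covers further performance measures.

```python
import math

def n_for_oe(phi, ci_width=0.2, assumed_oe=1.0, z=1.96):
    """Minimum validation n so the 95% CI for O/E has the target total width.

    Uses var(ln(O/E)) ~= (1 - phi) / (n * phi), with the CI on the O/E
    scale being assumed_oe * exp(+/- z * SE), so its total width is
    2 * assumed_oe * sinh(z * SE); solve for the SE hitting ci_width.
    """
    se_target = math.asinh(ci_width / (2 * assumed_oe)) / z
    return math.ceil((1 - phi) / (phi * se_target ** 2))

# e.g. outcome proportion 5%, target 95% CI of roughly 0.9 to 1.1 for O/E
print(n_for_oe(phi=0.05))
```

Note how the required n rises steeply as the outcome becomes rarer - a key reason validation studies of models for rare outcomes are often far too small.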
 
Description 3-day training course: Statistical Methods for Risk Prediction & Prognostic Models 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact We run a 3-day training course to educate researchers on how to use statistical methods to develop and evaluate clinical prediction models, covering a wide array of topics and software practicals - the course attracts about 50 people each time and runs about three times per year
Year(s) Of Engagement Activity 2022,2023,2024
 
Description 3-day training course: Statistical Methods for Risk Prediction & Prognostic Models (2024 - delivered twice) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Short course to disseminate research methods to international participants, about statistical methods for risk prediction modelling, including topics such as sample size, model development, critical appraisal, model evaluation, meta-analysis, etc
Year(s) Of Engagement Activity 2024
 
Description Invited Oral Presentation: Prediction models for healthcare: an introduction (Tata Memorial Centre, Mumbai, India) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Seminar to discuss methodological issues in prediction model research, and our sample size guidance. Attendees gained new knowledge and insights to improve their studies going forwards
Year(s) Of Engagement Activity 2024
 
Description Invited Oral Presentation: Clinical prediction models: a playground for healthcare research (Cardiff) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Other audiences
Results and Impact Seminar to discuss methodological issues in prediction model research, and our sample size guidance. Attendees gained new knowledge and insights to improve their studies going forwards
Year(s) Of Engagement Activity 2024
 
Description Invited Oral Presentation: Clinical prediction models: a playground for healthcare research (QuanTim Seminar for the SESSTIM Research Unit, Marseille, France) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Policymakers/politicians
Results and Impact I gave an invited talk about the current challenges in prediction model research, and in particular the importance of sample size considerations in AI-based model studies
Year(s) Of Engagement Activity 2024
 
Description Invited seminar: Size Matters: The importance of sample size on the quality and utility of AI-based prediction models for healthcare (University of Liverpool) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact Seminar to discuss methodological issues in prediction model research, and our sample size guidance. Attendees gained new knowledge and insights to improve their understanding of research in prediction for their career
Year(s) Of Engagement Activity 2024,2025
 
Description MEMTAB 2025 Conference (Hosting, Organising and Delivering) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact MEMTAB is the leading international conference about methods to evaluate models, tests & biomarkers for healthcare. It allows debate & dissemination of best methods for developing, evaluating & identifying reliable models, tests & biomarkers for use in clinical practice.

In 2025, we organised the 7th international conference, held at the University of Birmingham under the theme "Methodology That Stands the Test". We pushed participants to deepen their understanding of what constitutes the research evidence needed for models, tests and biomarkers to be reliably endorsed, communicated and deployed in practice. We had 200 participants from around the world, including PhD students, clinical fellows, methodologists, GPs and healthcare professionals, regulators, and economists, and included sessions on sample size, systematic reviews and PPIE involvement in methodology research for prediction models. Participants gained methodology knowledge that will change the way they do their research and how they evaluate and regulate models and tests in practice.
Year(s) Of Engagement Activity 2025
URL https://uobevents.eventsair.com/memtab-2025/
 
Description MEMTAB 2025 Short course delivery: An Introduction to Risk Prediction Models and Sample Size Calculations 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact We delivered a 1-day short course introducing the key phases of clinical prediction models, and the theory and software for sample size calculations for development, updating and evaluation, to 30 participants attending the MEMTAB conference. Participants learnt the tools and approaches to improve their research design and analyses moving forwards
Year(s) Of Engagement Activity 2025
 
Description Oral presentation: Sample size calculations for accuracy-based measures (ISCB 2024, Greece) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Session on prediction model research at the ISCB methodology conference
Year(s) Of Engagement Activity 2024
 
Description Oral presentation: Sample size for targeting precise individual-level risk estimates for binary outcomes (Royal Statistical Society, Brighton) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Presentation within a prediction model session at the Royal Statistical Society conference in Brighton, Sept 2024
Year(s) Of Engagement Activity 2024
 
Description PPIE group for prediction model methodology 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Patients, carers and/or patient groups
Results and Impact We have facilitated a new PPIE group focused on supporting methodology research for clinical prediction models. This group have provided input toward existing projects on uncertainty and sample size, and will contribute to new research discussions and outputs from our methodology work going forward. Led by Kym Snell and Paula Dhiman, the group has met online in the early evening, to have open discussions about expectations and roles, to learn about our work, and for both sides to identify how they can contribute effectively.
Year(s) Of Engagement Activity 2024
 
Description Prognosis Research in Healthcare Summer School 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Each year this summer school disseminates best practice in undertaking primary studies and reviews of prognosis research, to a broad clinical and methodological audience from academia and industry, with participants from around the world
Year(s) Of Engagement Activity 2021,2022,2023
 
Description Prognosis Research in Healthcare Summer School 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Our summer school was attended by 18 participants and taught research methods for primary studies and systematic reviews of prognosis research, including prediction models. Participants gained methodology knowledge that will change the way they do their research in practice
Year(s) Of Engagement Activity 2024
 
Description Young Statisticians Meeting - Workshop on Sample Size for Prediction Models (organisation and delivery) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Other audiences
Results and Impact 40 participants (all career-young statisticians and data scientists) came to learn about sample size calculations for risk prediction modelling, which will change how they do their research going forward
Year(s) Of Engagement Activity 2024
 
Description invited oral presentation (Aberdeen, RSS local meeting, October 2023): Clinical prediction models: a playground for healthcare research 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact I spoke on 'Clinical prediction models: a playground for healthcare research' (invited, in person) at the RSS local meeting in Aberdeen, to disseminate our work on instability and the need for larger sample sizes, to improve reviews and meta-analyses
Year(s) Of Engagement Activity 2023
 
Description invited oral presentation (February, 2025): Harnessing uncertainty in clinical prediction models using Stata 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact >350 participants worldwide attended Joie Ensor's invited talk on prediction model research, sample size and uncertainty, which led to dissemination of our new software modules
Year(s) Of Engagement Activity 2025
 
Description invited oral presentation (Utrecht, October 2023): Stability of clinical prediction models developed using statistical or machine learning methods 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact I gave a talk to disseminate our work on stability of clinical prediction models developed using statistical or machine learning methods
Year(s) Of Engagement Activity 2023