Sample Size calculations for UPDATing clinical prediction models to Ensure their accuracy and fairness in practice (SS-UPDATE)
Lead Research Organisation:
University of Birmingham
Department Name: Institute of Applied Health Research
Abstract
Healthcare research is in an exciting phase, with increasing access to information to link an individual's characteristics (such as age, family history or genetic information) with health outcomes (such as death, pain level, cancer). Researchers are using this information to help health professionals accurately predict an individual's future outcomes, to better personalise treatment, improve quality of life, and prolong life. For example, QRISK is used by doctors to calculate an individual's risk of heart disease within the next 10 years, and to guide who needs treatment to reduce their risk of heart disease occurring. Such prediction tools are known as 'clinical prediction models', and thousands are developed each year using statistical and artificial intelligence (AI) approaches.
Once a prediction model like QRISK has entered into clinical practice, it is important that it is regularly updated (e.g. yearly) as otherwise its accuracy wanes over time. For example, due to changes in treatments available, the co-morbidities (multiple health conditions) of patients, and emerging global problems (e.g. pandemics), an outdated model may wrongly predict a low risk for a truly high risk individual, or vice versa, and so model updating is needed to recalibrate predictions. Similarly, a model often needs updating when transporting it from the original setting (e.g. USA, secondary care) to a new one (e.g. UK, primary care), or when aiming to improve a model's accuracy (and thus fairness) in subgroups defined by sex and ethnicity.
The reliability, accuracy and fairness of an updated prediction model depends heavily on the representativeness and sample size of the dataset used to update the model. However, there is currently no clear guidance for how researchers should identify the (minimum) sample size required for model updating - for example, how many participants and outcome events are needed, relative to the number of model parameters being estimated (updated)? Sadly, many updating datasets are too small, and this leads to updated models with inaccurate and potentially harmful predictions. Therefore, identifying a suitable sample size is vital for researchers to consider at the outset of model updating studies.
To address this, our project aims to provide guidance and methods for calculating the (minimum) sample size required to update a prediction model to ensure it is reliable, accurate and fair. We will achieve this using a series of work packages that: (i) review applied and methodology papers using (or proposing) a model updating method, to identify current approaches and shortcomings; (ii) develop sample size guidance and solutions (mathematical formulae) for a range of model updating methods for continuous, binary or time-to-event outcomes; and (iii) extend calculations to address model updates for subgroups (e.g. ethnic groups) to ensure models are generalisable and fair. All our work will be underpinned by real applications and disseminated through freely-available computer software, web apps, dedicated workshops (with researchers and patient groups), training courses, social media and tutorial videos.
Our findings will provide quality standards for researchers to adhere to when updating models, and allow funders, health professionals and regulators to identify updated models that are reliable and fair for use in patient counselling and decision making. This aligns with "Good Machine Learning Practice for Medical Device Development: Guiding Principles" issued by US Food and Drug Administration, Health Canada and UK Medicines and Healthcare Products Regulatory Agency in 2021 to produce safe, effective and ethical models.
Publications
| Title | TRIPOD+AI |
| Description | TRIPOD+AI provides harmonised guidance for reporting prediction model studies, irrespective of whether regression modelling or machine learning methods have been used. The new checklist supersedes the TRIPOD 2015 checklist, which should no longer be used. I was part of the leadership group that developed TRIPOD+AI and our BMJ paper presents the expanded 27 item checklist with more detailed explanation of each reporting recommendation, and the TRIPOD+AI for Abstracts checklist. TRIPOD+AI aims to promote the complete, accurate, and transparent reporting of studies that develop a prediction model or evaluate its performance. Complete reporting will facilitate study appraisal, model evaluation, and model implementation. |
| Type Of Material | Improvements to research infrastructure |
| Year Produced | 2024 |
| Provided To Others? | Yes |
| Impact | Recommended by EQUATOR and leading medical journals, TRIPOD+AI has already been cited 400 times since publication last year and will help improve the quality of reporting in prediction model research. |
| URL | https://www.bmj.com/content/385/bmj-2023-078378 |
| Description | NICE guidance document on clinical prediction models |
| Organisation | National Institute for Health and Care Excellence (NICE) |
| Department | NICE International |
| Country | United Kingdom |
| Sector | Public |
| PI Contribution | Working on a NICE guidance document for how to develop, validate and appraise clinical prediction models |
| Collaborator Contribution | Richard Riley has co-written the guidance |
| Impact | Guidance paper is forthcoming |
| Start Year | 2024 |
| Title | pmsampsize - software module in Python for sample size required for model development studies |
| Description | Calculate the sample size for developing a prediction model |
| Type Of Technology | Software |
| Year Produced | 2024 |
| Impact | Researchers are using it to design their model development studies |
| Title | pmsampsize: Software in Stata for calculating the sample size required for development of prediction models |
| Description | Our package (led by Joie Ensor) for calculating the sample size needed to develop a prediction model is constantly updated as new guidance emerges from our grants and other work |
| Type Of Technology | Software |
| Year Produced | 2024 |
| Impact | Software has been downloaded >40k times since launch in 2020, and is constantly updated during our grants |
| URL | https://ideas.repec.org/c/boc/bocode/s458569.html |
| Title | pmsampsize: Software in R for calculating the sample size required for development of prediction models |
| Description | Our package (led by Joie Ensor) for calculating the sample size needed to develop a prediction model is constantly updated as new guidance emerges from our grants and other work |
| Type Of Technology | Software |
| Year Produced | 2024 |
| Impact | Researchers routinely use this package to inform the sample size for their model development studies - package downloaded >40k times |
| URL | https://cran.r-project.org/web/packages/pmsampsize/index.html |
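To illustrate the kind of calculation the pmsampsize packages perform, below is a minimal Python sketch of one of the published criteria (the shrinkage-based criterion from Riley et al.'s sample size guidance for model development). The function name and interface are our own for illustration; they are not the package's API, which offers further criteria and options.

```python
import math

def min_sample_size_shrinkage(n_params: int, r2_cs: float,
                              shrinkage: float = 0.9) -> int:
    """Minimum development sample size so that the expected uniform
    shrinkage of the model's coefficients is no worse than `shrinkage`,
    given the number of candidate predictor parameters (n_params) and the
    anticipated Cox-Snell R-squared of the model (r2_cs)."""
    n = n_params / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage))
    return math.ceil(n)

# e.g. 20 candidate parameters, anticipated Cox-Snell R^2 of 0.2,
# targeting an expected shrinkage factor of at least 0.9:
print(min_sample_size_shrinkage(20, 0.2))  # 796 participants
```

Note that this is only one of several criteria the guidance combines (others target precise estimation of the overall outcome risk and small optimism in apparent model fit); the recommended sample size is the largest across all criteria.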
| Title | pmstabilityss - Stata module for calculating the sample size required for individual-level stability in risk predictions |
| Description | Open source in public domain - calculates the sample size required for individual-level stability in risk predictions |
| Type Of Technology | Software |
| Year Produced | 2025 |
| Impact | We anticipate researchers will use this to inform the design of their model development and updating studies |
| Title | pmstabilitytte - Stata module for calculating the sample size needed for precise predictions from a time-to-event prediction model |
| Description | In public domain - the package calculates the individual uncertainty anticipated when developing or updating a model with a particular sample size |
| Type Of Technology | Software |
| Year Produced | 2025 |
| Impact | Researchers will use this to inform the sample size needed for developing or updating their models |
| Title | pmvalsampsize - software module in Python for sample size required for model validation studies (2024) |
| Description | Calculates the sample size required for evaluating models and their performance, as a Python package - we are constantly updating and extending this |
| Type Of Technology | Software |
| Year Produced | 2024 |
| Impact | Researchers are using this to calculate the sample size for their model validation studies |
| Title | pmvalsampsize: Software in Stata for calculating the sample size required for validation of prediction models |
| Description | This calculates the sample size needed for studies evaluating a model, which is an important part of risk of bias assessments in reviews of models, and of planning studies to evaluate models |
| Type Of Technology | Software |
| Year Produced | 2023 |
| Impact | Researchers are using this to inform their sample size for model evaluation - we are constantly updating and extending this |
| URL | https://ideas.repec.org/c/boc/bocode/s459226.html |
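One of the criteria in the published validation sample-size guidance that pmvalsampsize implements targets a sufficiently precise estimate of the observed/expected (O/E) ratio, a measure of overall calibration. A minimal sketch is below, using the approximation var(ln(O/E)) ≈ (1 − φ)/(nφ) for outcome prevalence φ; the function name is ours for illustration and is not the package's API.

```python
import math

def min_n_for_oe_ratio(prevalence: float, ci_lower: float = 0.8,
                       ci_upper: float = 1.25) -> int:
    """Minimum validation sample size so the 95% confidence interval for
    the observed/expected (O/E) ratio has roughly the target width,
    assuming var(ln(O/E)) ~= (1 - prevalence) / (n * prevalence)."""
    # SE of ln(O/E) implied by the target CI (width = 2 * 1.96 * SE)
    target_se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * 1.96)
    n = (1 - prevalence) / (prevalence * target_se ** 2)
    return math.ceil(n)

# Outcome prevalence of 10%, targeting a 95% CI for O/E of about 0.80 to 1.25:
print(min_n_for_oe_ratio(0.10))
```

As with model development, the full guidance combines several criteria (e.g. for the calibration slope and c-statistic as well as O/E) and takes the largest resulting sample size.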
| Description | 3-day training course: Statistical Methods for Risk Prediction & Prognostic Models (2024 - delivered twice) |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Other audiences |
| Results and Impact | Short course to disseminate research methods to international participants, about statistical methods for risk prediction modelling, including topics such as sample size, model development, critical appraisal, model evaluation, meta-analysis, etc |
| Year(s) Of Engagement Activity | 2024 |
| Description | Invited Oral Presentation: Prediction models for healthcare: an introduction (Tata Memorial Centre, Mumbai, India) |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | Seminar to discuss methodological issues in prediction model research, and our sample size guidance. Attendees gained new knowledge and insights to improve their studies going forwards |
| Year(s) Of Engagement Activity | 2024 |
| Description | Invited Oral Presentation: Clinical prediction models: a playground for healthcare research (Cardiff) |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | Local |
| Primary Audience | Other audiences |
| Results and Impact | Seminar to discuss methodological issues in prediction model research, and our sample size guidance. Attendees gained new knowledge and insights to improve their studies going forwards |
| Year(s) Of Engagement Activity | 2024 |
| Description | Invited seminar: Size Matters: The importance of sample size on the quality and utility of AI-based prediction models for healthcare (University of Liverpool) |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | Local |
| Primary Audience | Postgraduate students |
| Results and Impact | Seminar to discuss methodological issues in prediction model research, and our sample size guidance. Attendees gained new knowledge and insights to improve their understanding of research in prediction for their career |
| Year(s) Of Engagement Activity | 2024,2025 |
| Description | MEMTAB 2025 Conference (Hosting, Organising and Delivering) |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Other audiences |
| Results and Impact | MEMTAB is the leading international conference on methods for evaluating models, tests and biomarkers for healthcare. It allows debate and dissemination of best methods for developing, evaluating and identifying reliable models, tests and biomarkers for use in clinical practice. In 2025, we organised the 7th international conference, held at the University of Birmingham, under the theme "Methodology That Stands the Test". We pushed participants to deepen their understanding of what constitutes the research evidence needed for models, tests and biomarkers to be reliably endorsed, communicated and deployed in practice. We had 200 participants from around the world, including PhD students, clinical fellows, methodologists, GPs and healthcare professionals, regulators, and economists, with sessions on sample size, systematic reviews and PPIE involvement in methodology research for prediction models. Participants gained methodology knowledge that will change the way they do their research in practice, and how they evaluate and regulate models and tests. |
| Year(s) Of Engagement Activity | 2025 |
| URL | https://uobevents.eventsair.com/memtab-2025/ |
| Description | MEMTAB 2025 Short course delivery: An Introduction to Risk Prediction Models and Sample Size Calculations |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Other audiences |
| Results and Impact | We delivered a 1-day short course introducing the key phases of clinical prediction models, and the theory and software for sample size calculations for development, updating and evaluation, to 30 participants attending the MEMTAB conference. Participants learnt the tools and approaches to improve their research design and analyses moving forwards |
| Year(s) Of Engagement Activity | 2025 |
| Description | Oral presentation: Sample size calculations for accuracy-based measures (ISCB 2024, Greece) |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Other audiences |
| Results and Impact | Session on prediction model research at the ISCB methodology conference |
| Year(s) Of Engagement Activity | 2024 |
| Description | Oral presentation: Sample size for targeting precise individual-level risk estimates for binary outcomes (Royal Statistical Society, Brighton) |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Other audiences |
| Results and Impact | Presentation within a prediction model session at the Royal Statistical Society conference in Brighton, Sept 2024 |
| Year(s) Of Engagement Activity | 2024 |
| Description | PPIE group for prediction model methodology |
| Form Of Engagement Activity | A formal working group, expert panel or dialogue |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Patients, carers and/or patient groups |
| Results and Impact | We have facilitated a new PPIE group focused on supporting methodology research for clinical prediction models. The group has provided input to existing projects on uncertainty and sample size, and will contribute to new research discussions and outputs from our methodology work going forward. Led by Kym Snell and Paula Dhiman, the group has met online in the early evening, to have open discussions about expectations and roles, to learn about our work, and for both sides to identify how they can contribute effectively. |
| Year(s) Of Engagement Activity | 2024 |
| Description | Prognosis Research in Healthcare Summer School |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | Our summer school was attended by 18 participants and we taught research methods for primary studies and systematic reviews of prognosis research, including prediction models. Participants gained methodology knowledge that will change the way they do their research in practice |
| Year(s) Of Engagement Activity | 2024 |
| Description | Young Statisticians Meeting - Workshop on Sample Size for Prediction Models (organisation and delivery) |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Other audiences |
| Results and Impact | 40 participants (all career-young statisticians and data scientists) came to learn about sample size calculations for risk prediction modelling, which will change how they do their research going forward |
| Year(s) Of Engagement Activity | 2024 |
| Description | Invited oral presentation (February 2025): Harnessing uncertainty in clinical prediction models using Stata |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Other audiences |
| Results and Impact | >350 participants worldwide attended Joie Ensor's invited talk on prediction model research, sample size and uncertainty, which led to dissemination of our new software modules |
| Year(s) Of Engagement Activity | 2025 |
