2023
DOI: 10.1186/s12874-023-02008-1

Sample size requirements are not being considered in studies developing prediction models for binary outcomes: a systematic review

Paula Dhiman,
Jie Ma,
Cathy Qi
et al.

Abstract: Background: Having an appropriate sample size is important when developing a clinical prediction model. We aimed to review how sample size is considered in studies developing a prediction model for a binary outcome. Methods: We searched PubMed for studies published between 01/07/2020 and 30/07/2020 and reviewed the sample size calculations used to develop the prediction models. Using the available information, we calculated the minimum sample size th…
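The minimum sample sizes the review calculated are based on the criteria of Riley et al. (2019) for binary outcomes, which the citing papers below also apply. A minimal sketch of those three criteria in Python (the function name, default margins, and example inputs are illustrative assumptions, not taken from the paper):

```python
import math

def min_sample_size_binary(n_params: int, outcome_prop: float,
                           r2_cs_adj: float, shrinkage: float = 0.9,
                           optimism: float = 0.05, margin: float = 0.05) -> int:
    """Sketch of the three Riley et al. (2019) criteria for developing a
    logistic regression model; returns the largest n the criteria imply."""
    p, phi = n_params, outcome_prop
    # (i) expected uniform shrinkage of predictor effects >= `shrinkage`
    n1 = p / ((shrinkage - 1) * math.log(1 - r2_cs_adj / shrinkage))
    # (ii) small absolute optimism in the apparent Cox-Snell R^2; the
    #      maximum attainable R^2_CS depends only on the outcome proportion
    ln_lnull = phi * math.log(phi) + (1 - phi) * math.log(1 - phi)
    max_r2_cs = 1 - math.exp(2 * ln_lnull)
    s = r2_cs_adj / (r2_cs_adj + optimism * max_r2_cs)
    n2 = p / ((s - 1) * math.log(1 - r2_cs_adj / s))
    # (iii) estimate the overall outcome risk to within +/- `margin`
    n3 = (1.96 / margin) ** 2 * phi * (1 - phi)
    return math.ceil(max(n1, n2, n3))

# e.g. 10 predictor parameters, 30% outcome prevalence, anticipated R^2_CS = 0.15
print(min_sample_size_binary(10, 0.3, 0.15))  # ~549 participants
```

The same calculation is available off the shelf in the pmsampsize package (R and Stata).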

Cited by 16 publications (3 citation statements) | References 36 publications
“…Model overfitting, a common source of bias, occurs when the modelling process captures idiosyncratic random variation in the development dataset that is not reflective of the true patient population. Small sample sizes, grouping of continuous predictors, and univariable predictor selection can all introduce bias through overfitting, leading to optimistic performance estimates and exaggerated risk predictions (7,25,44).…”
Section: Discussion (mentioning)
confidence: 99%
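The optimism this passage describes is often quantified with Harrell-style bootstrap optimism correction: refit the model on bootstrap resamples and compare each refit's performance on its own resample with its performance on the original data. A minimal sketch, assuming a logistic model and scikit-learn (both illustrative choices, not taken from the quoted papers):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def optimism_corrected_auc(X: np.ndarray, y: np.ndarray,
                           n_boot: int = 200, seed: int = 0) -> float:
    """Apparent c-statistic minus the average bootstrap optimism
    (Harrell's bootstrap validation of discrimination)."""
    rng = np.random.default_rng(seed)
    apparent_model = LogisticRegression(max_iter=1000).fit(X, y)
    apparent = roc_auc_score(y, apparent_model.predict_proba(X)[:, 1])
    optimism, n = [], len(y)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)       # resample rows with replacement
        if len(np.unique(y[idx])) < 2:         # skip degenerate resamples
            continue
        m = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
        auc_boot = roc_auc_score(y[idx], m.predict_proba(X[idx])[:, 1])
        auc_orig = roc_auc_score(y, m.predict_proba(X)[:, 1])
        optimism.append(auc_boot - auc_orig)   # optimism of this refit
    return apparent - float(np.mean(optimism))
```

With undersized development data, the gap between the apparent and corrected c-statistic grows, which is exactly the exaggerated performance the quoted passage warns about.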
“…In view of this, a post hoc analysis of sample size is performed, which finds that although the current sample size does not fully meet the criteria proposed by Riley et al. (required n = 289; events per predictor parameter [EPP]: 14.45) [54], the EPP of the current model reaches 12.512. Furthermore, an instability plot based on bootstrap models (b = 500) demonstrates that the existing model is relatively stable (mean absolute percentage error: 0.0585) [55, 56]. Additionally, the definition of the outcome in this study is not sufficiently objective.…”
Section: Discussion (mentioning)
confidence: 99%
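The instability check quoted above follows the bootstrap model-instability framework of Riley and Collins (references 55 and 56 in that paper): refit the model on bootstrap resamples and average the absolute difference between each refit's predicted risks and the original model's. A minimal sketch of that mean absolute prediction error, again assuming a logistic model and scikit-learn:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def instability_mape(X: np.ndarray, y: np.ndarray,
                     n_boot: int = 500, seed: int = 0) -> float:
    """Mean absolute difference between the original model's predicted
    risks and those of models refitted on bootstrap resamples."""
    rng = np.random.default_rng(seed)
    p_orig = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]
    diffs, n = [], len(y)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)       # resample with replacement
        if len(np.unique(y[idx])) < 2:         # skip degenerate resamples
            continue
        boot = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
        diffs.append(np.abs(boot.predict_proba(X)[:, 1] - p_orig))
    # average over individuals and bootstrap models; smaller = more stable
    return float(np.mean(diffs))
```

If the quoted 0.0585 is on the risk scale, it corresponds to individual predicted risks shifting by roughly six percentage points on average across refits.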
“…Instead, the adequacy of the sample size is usually implicitly evaluated in relation to the model performance. However, recent research has demonstrated that many datasets used to build and evaluate prediction models have been undersized, increasing the risk of bias (Wynants et al., 2020), and potentially giving rise to inaccurate conclusions (Dhiman et al., 2023). Thus, more focus could go towards implementing standardized practices (e.g., as in medicine, see Riley et al., 2019a, 2019b, 2020) for data in psychology (i.e., survey data, which likely has measurement error) to ensure adequate sample sizes for machine learning analyses.…”
Section: Discussion (mentioning)
confidence: 99%