Background
Estimating the required sample size is crucial when developing and validating clinical prediction models. However, there is no consensus on how to determine the sample size in this setting. The aim of this study was to compare the available methods and define a practical approach to sample size estimation for clinical prediction models, using the Horizon 2020 PRIMAGE project as a case study.
Methods
Three methods (Riley’s method and the “rule of thumb” with 10 and with 5 events per predictor) were used to calculate the sample size required to develop predictive models and to analyse how that sample size varies as a function of different parameters. The sample size required for model validation was then estimated.
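As an illustration of the “rule of thumb” only, the calculation can be sketched as follows: the number of events needed is the number of candidate predictors multiplied by the chosen events per predictor, and the total sample size follows from dividing by the expected proportion of patients who experience the event. The function name `epv_sample_size`, its parameter names, and the input values are hypothetical and are not taken from the study; this is a minimal sketch under those assumptions, not the procedure used for the reported results.

```python
import math


def epv_sample_size(n_predictors, events_per_predictor, event_fraction):
    """Total sample size implied by an events-per-predictor rule of thumb.

    Hypothetical helper for illustration only.
    n_predictors: number of candidate predictor parameters in the model.
    events_per_predictor: 10 or 5 in the rules of thumb compared here.
    event_fraction: assumed proportion of patients who experience the event.
    """
    if not 0 < event_fraction <= 1:
        raise ValueError("event_fraction must be in (0, 1]")
    # Number of events the rule of thumb asks for.
    required_events = n_predictors * events_per_predictor
    # Scale up to the total cohort size and round up to whole patients.
    return math.ceil(required_events / event_fraction)


# Illustrative inputs only (not values from the study):
# 10 candidate predictors and an assumed event fraction of 0.25.
n_epv10 = epv_sample_size(10, 10, 0.25)
n_epv5 = epv_sample_size(10, 5, 0.25)
print(n_epv10, n_epv5)
```

Halving the events per predictor halves the resulting sample size, and the estimate depends on the number of predictors and the event fraction alone, which is why the rule cannot reflect other features of the specific clinical problem.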
Results
Developing reliable predictive models requires 1397 neuroblastoma patients, 1060 high-risk neuroblastoma patients, and 1345 diffuse intrinsic pontine glioma (DIPG) patients. These sample sizes can be lowered by reducing the number of variables included in the model, by including direct measures of the outcome to be predicted, and/or by increasing the follow-up period. For model validation, the estimated sample size was 326 patients for neuroblastoma, 246 for high-risk neuroblastoma, and 592 for DIPG.
Conclusions
Given the variability among the sample sizes obtained, we recommend using methods based on epidemiological data and the nature of the results, because their estimates are tailored to the specific clinical problem. In addition, the sample size can be reduced by lowering the number of predictor parameters or by including direct measures of the outcome of interest.