Bayesian adaptive designs have become popular because of the possibility of increasing the number of patients treated with more beneficial treatments, while still providing sufficient evidence for treatment efficacy comparisons. It can be essential, for regulatory and other purposes, to conduct frequentist analyses both before and after a Bayesian adaptive trial, and these remain challenging. In this paper, we propose a general simulation-based approach to compare frequentist designs with Bayesian adaptive designs based on frequentist criteria such as power and to compute valid frequentist p-values. We illustrate our approach by comparing the power of an equal randomization (ER) design with that of an optimal Bayesian adaptive (OBA) design. The Bayesian design considered here is the dynamic programming solution of the optimization of a specific utility function defined by the number of successes in a patient horizon, including patients whose treatment will be affected by the trial's results after the end of the trial. While the power of an ER design depends on treatment efficacy and the sample size, the power of the OBA design also depends on the patient horizon size. Our results quantify the trade-off between power and the optimal assignment of patients to treatments within the trial. We show that, for large patient horizons, the two criteria are in agreement, while for small horizons, differences can be substantial. This has implications for precision medicine, where patient horizons are decreasing as a result of increasing stratification of patients into subpopulations defined by molecular markers.
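As a rough illustration of what a simulation-based power calculation can look like, the following minimal sketch estimates the power of a two-arm ER design with binary outcomes by Monte Carlo. It is not the procedure developed in this paper: the pooled two-proportion z-test, the function names (`simulate_er_trial`, `two_proportion_z`, `estimate_power`), and all parameter values are assumptions chosen only for illustration. An adaptive design could in principle be evaluated the same way by swapping in a different trial simulator.

```python
import math
import random


def simulate_er_trial(n_per_arm, p_a, p_b, rng):
    """Simulate one equal-randomization trial; return successes per arm."""
    s_a = sum(rng.random() < p_a for _ in range(n_per_arm))
    s_b = sum(rng.random() < p_b for _ in range(n_per_arm))
    return s_a, s_b


def two_proportion_z(s_a, n_a, s_b, n_b):
    """Pooled two-proportion z statistic (assumed test, for illustration)."""
    pooled = (s_a + s_b) / (n_a + n_b)
    var = pooled * (1 - pooled) * (1 / n_a + 1 / n_b)
    if var == 0:
        # All outcomes identical: no evidence of a difference.
        return 0.0
    return (s_a / n_a - s_b / n_b) / math.sqrt(var)


def estimate_power(n_per_arm, p_a, p_b, z_crit=1.959964, n_sims=2000, seed=0):
    """Fraction of simulated trials in which the two-sided test rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        s_a, s_b = simulate_er_trial(n_per_arm, p_a, p_b, rng)
        z = two_proportion_z(s_a, n_per_arm, s_b, n_per_arm)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    # Hypothetical scenario: 50 patients per arm, success rates 0.3 vs 0.7.
    print("estimated power:", estimate_power(50, 0.3, 0.7))
    # Under equal success rates the same routine estimates the type I error.
    print("estimated type I error:", estimate_power(50, 0.5, 0.5))
```

Calling `estimate_power` with equal success probabilities gives a Monte Carlo estimate of the type I error rate, which is one way to check that a chosen critical value is calibrated before comparing power across designs.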
KEYWORDS
Bayesian adaptive designs, dynamic programming, frequentist analyses, optimal strategy

Spiegelhalter et al 5 and Cellamare et al 6 have illustrated the use of Bayesian approaches in clinical trials and discussed their advantages. Ethical concerns 7 associated with adaptive designs are also relevant in Bayesian adaptive settings. On the other hand, frequentist criteria remain the mainstay in regulatory decision-making and in the medical literature. This applies to both trial design requirements, such as the power at a given type I error level, and reported