Bayesian approaches to designing replication studies.

Pawel, Samuel; Consonni, Guido; Held, Leonhard

doi:10.1037/met0000604

Cited by 7 publications

(14 citation statements)

References 70 publications

“…In combination with the resulting FDPs and FORs of the Bayesian research pipelines, this puts trust in the hierarchical normal-normal model detailed by Pawel et al. [24]. Regarding the comparability of frequentist and Bayesian sample size calculations, a further comment is necessary: While frequentist power was calculated for 50% to reject the null hypothesis, the Bayesian sample size calculations are based on the idea to guarantee 90% replication success. Thus, these two sample size planning approaches are not directly comparable, but as noted in the “Methods” section, using larger power values for the frequentist methods increases replication sample sizes $n_2$ drastically.…”

Section: Discussion (mentioning)

confidence: 99%

“…Bayesian power analysis simulations for the different BDCs, the MET, and p-values in Welch’s two-sample t-test. These power analyses are a special case of the more general Bayesian sample size calculations in the hierarchical normal-normal model outlined in the Online Appendix and detailed in Pawel et al. [24]. From a hybrid Bayesian-frequentist perspective, they are justified. From a fully Bayesian point of view, they ignore the uncertainty about the true effect size δ, because power is calculated under assumption of a specific value δ = 0.5, …, 1.0.…”

Section: Methods (mentioning)

confidence: 99%

“…In brief, Bayesian sample size planning for the replication study is based on the hierarchical normal-normal model detailed by Pawel et al. [24]. Based on the required sample size $n_2(C)$ for the employed criterion C, the replication study is conducted and the same BDC C is used to analyze the replication study. Depending on the results, the treatment efficacy is either confirmed, leading to a success, or questioned, leading to a failure.…”

Section: Two Competing Preclinical Research Pipelines (mentioning)

confidence: 99%

“…The p-value-based research pipeline thus makes use of the two-trials rule, which asserts replication success when two significant hypothesis tests are found both in the original and replication study [23,24]. The p-value in Welch's two-sample t-test does not aim at investigating the smallest effect size of interest, however. Therefore, the two one-sided test (TOST) procedure is conducted as an alternative.…”

Section: Statistical Significance Pipeline (mentioning)

confidence: 99%
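
The two-trials rule quoted in the Statistical Significance Pipeline excerpt above can be sketched in a few lines. This is only an illustration, not code from the cited paper: the function name `two_trials_rule` and the default threshold `alpha = 0.025` (a conventional one-sided level) are assumptions, and the p-values are assumed to be one-sided and to test the effect in the same direction in both studies.

```python
def two_trials_rule(p_original: float, p_replication: float,
                    alpha: float = 0.025) -> bool:
    """Declare replication success only if BOTH studies are significant.

    Assumptions (not taken from the source): both p-values are one-sided,
    refer to the same effect direction, and share the threshold `alpha`.
    """
    for p in (p_original, p_replication):
        if not 0.0 <= p <= 1.0:
            raise ValueError("p-values must lie in [0, 1]")
    return p_original <= alpha and p_replication <= alpha


if __name__ == "__main__":
    # Hypothetical p-values, chosen only to exercise the function.
    print(two_trials_rule(0.010, 0.020))
    print(two_trials_rule(0.010, 0.040))
```

Because the rule looks only at significance in each study, it does not by itself consider a smallest effect size of interest, which is the limitation the excerpt raises before turning to TOST.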

“…If the latter is the case, the sample size $n_2(C)$ to attain a prespecified power is calculated for each decision criterion C. Details on the calculation of the sample size for the BDCs are provided in the "Methods" section below, and in the Online Appendix. In brief, Bayesian sample size planning for the replication study is based on the hierarchical normal-normal model detailed by Pawel et al. [24]. Based on the required sample size $n_2(C)$ for the employed criterion C, the replication study is conducted and the same BDC C is used to analyze the replication study. Depending on the results, the treatment efficacy is either confirmed, leading to a success, or questioned, leading to a failure.…”

Section: BDCs Pipeline (mentioning)

confidence: 99%

Reducing the false discovery rate of preclinical animal research with Bayesian statistical decision criteria

Kelter

2023

Stat Methods Med Res

The success of preclinical research hinges on exploratory and confirmatory animal studies. Traditional null hypothesis significance testing is a common approach to eliminate the chaff from a collection of drugs, so that only the most promising treatments are funneled through to clinical research phases. Balancing the number of false discoveries and false omissions is an important aspect to consider during this process. In this paper, we compare several preclinical research pipelines, either based on null hypothesis significance testing or based on Bayesian statistical decision criteria. We build on a recently published large-scale meta-analysis of reported effect sizes in preclinical animal research and elicit a non-informative prior distribution under which both approaches are compared. After correcting for publication bias and shrinkage of effect sizes in replication studies, simulations show that (i) a shift towards statistical approaches which explicitly incorporate the minimum clinically important difference reduces the false discovery rate of frequentist approaches and (ii) a shift towards Bayesian statistical decision criteria can improve the reliability of preclinical animal research by reducing the number of false-positive findings. It is shown that these benefits hold while keeping the number of experimental units low which are required for a confirmatory follow-up study. Results show that Bayesian statistical decision criteria can help in improving the reliability of preclinical animal research and should be considered more frequently in practice.

Power priors for replication studies

Pawel, Aust, Held, et al.

2023

TEST

Self Cite

The ongoing replication crisis in science has increased interest in the methodology of replication studies. We propose a novel Bayesian analysis approach using power priors: The likelihood of the original study’s data is raised to the power of $\alpha$, and then used as the prior distribution in the analysis of the replication data. Posterior distribution and Bayes factor hypothesis tests related to the power parameter $\alpha$ quantify the degree of compatibility between the original and replication study. Inferences for other parameters, such as effect sizes, dynamically borrow information from the original study. The degree of borrowing depends on the conflict between the two studies. The practical value of the approach is illustrated on data from three replication studies, and the connection to hierarchical modeling approaches explored. We generalize the known connection between normal power priors and normal hierarchical models for fixed parameters and show that normal power prior inferences with a beta prior on the power parameter $\alpha$ align with normal hierarchical model inferences using a generalized beta prior on the relative heterogeneity variance $I^2$. The connection illustrates that power prior modeling is unnatural from the perspective of hierarchical modeling since it corresponds to specifying priors on a relative rather than an absolute heterogeneity scale.
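
The power prior construction described in the abstract above can be illustrated for the simplest case. The sketch below is not the authors' implementation. It assumes normal likelihoods with known standard errors, a flat initial prior on the effect size, and a fixed power parameter, whereas the abstract describes placing a prior on the power parameter. Under those assumptions, raising the original likelihood to the power `alpha` gives a normal prior with variance inflated by `1 / alpha`, and the posterior follows from the usual conjugate normal update. The names `NormalEstimate` and `power_prior_posterior` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class NormalEstimate:
    mean: float  # effect estimate
    se: float    # standard error, treated as known


def power_prior_posterior(original: NormalEstimate,
                          replication: NormalEstimate,
                          alpha: float) -> NormalEstimate:
    """Posterior for the effect size under a normal power prior.

    Assumptions (illustrative only): flat initial prior, known standard
    errors, and a FIXED alpha in [0, 1]. alpha = 0 discards the original
    study; alpha = 1 pools both studies with full weight.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Original likelihood raised to alpha: precision is scaled by alpha.
    prior_precision = alpha / original.se ** 2
    rep_precision = 1.0 / replication.se ** 2
    post_precision = prior_precision + rep_precision
    post_mean = (prior_precision * original.mean
                 + rep_precision * replication.mean) / post_precision
    return NormalEstimate(post_mean, math.sqrt(1.0 / post_precision))


if __name__ == "__main__":
    # Hypothetical estimates, chosen only for illustration.
    orig = NormalEstimate(mean=0.4, se=0.1)
    rep = NormalEstimate(mean=0.2, se=0.1)
    for a in (0.0, 0.5, 1.0):
        print(a, power_prior_posterior(orig, rep, a))
```

With a fixed power parameter the amount of borrowing is set in advance. The dynamic borrowing mentioned in the abstract arises only once the power parameter is itself given a prior and updated by the replication data, which this sketch leaves out.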
