The Pervasive Avoidance of Prospective Statistical Power: Major Consequences and Practical Solutions

Tressoldi, Patrizio; Giofrè, David

doi:10.2139/ssrn.2579268

Cited by 5 publications

(8 citation statements)

References 27 publications

(19 reference statements)

“…This evaluation research context presents two main problems. On the one hand, as a conceptual-theoretical framework, the Campbellian tradition presents a series of threats to validity that can affect four different kinds of validity (Campbell, 1957; Campbell and Stanley, 1963; Cook and Campbell, 1979; Shadish et al., 2002): (a) statistical conclusion validity (García-Pérez, 2012) can be affected by a low statistical power (Tressoldi and Giofré, 2015) and a restricted range (Vaci et al., 2014); (b) internal validity can be affected by selection, history, maturation, and regression; (c) construct validity can be affected by construct confounding, treatment-sensitive factorial structure, and inadequate explication of constructs; and (d) external validity can be affected by interaction of the causal relationship with units or outcomes. Although Campbell’s approach provides a conceptual framework for evaluating the main threats to four types of validity (Shadish et al., 2002) and some guidelines (design features) to enhance validity were presented, there is not an empirical, systematic approach to check and control the influence of threats to validity on the treatment effect estimations in program evaluation practice (e.g., Stocké, 2007; Krause, 2009; Johnson et al., 2015).…”

Section: Threats To Validity: Theoretical and Analytical Perspectives

mentioning

confidence: 99%

A Simulation Study of Threats to Validity in Quasi-Experimental Designs: Interrelationship between Design, Measurement, and Analysis

et al. 2016

The Campbellian tradition provides a conceptual framework to assess threats to validity. On the other hand, different models of causal analysis have been developed to control estimation biases in different research designs. However, the link between design features, measurement issues, and concrete impact estimation analyses is weak. In order to provide an empirical solution to this problem, we use Structural Equation Modeling (SEM) as a first approximation to operationalize the analytical implications of threats to validity in quasi-experimental designs. Based on the analogies established between the Classical Test Theory (CTT) and causal analysis, we describe an empirical study based on SEM in which range restriction and statistical power have been simulated in two different models: (1) A multistate model in the control condition (pre-test); and (2) A single-trait-multistate model in the control condition (post-test), adding a new mediator latent exogenous (independent) variable that represents a threat to validity. Results show, empirically, how the differences between both the models could be partially or totally attributed to these threats. Therefore, SEM provides a useful tool to analyze the influence of potential threats to validity.

“…There are other solutions to the problem of small sample sizes, as described by Tressoldi and Giofrè (2015). The use of Bayes factors instead of p-values has been increasingly described as one solution.…”

mentioning

confidence: 99%

Sample Size, Statistical Power, and False Conclusions in Infant Looking‐Time Research

Oakes 2017

Infancy

Infant research is hard. It is difficult, expensive, and time consuming to identify, recruit and test infants. As a result, ours is a field of small sample sizes. Many studies using infant looking time as a measure have samples of 8 to 12 infants per cell, and studies with more than 24 infants per cell are uncommon. This paper examines the effect of such sample sizes on statistical power and the conclusions drawn from infant looking time research. An examination of the state of the current literature suggests that most published looking time studies have low power, which leads in the long run to an increase in both false positive and false negative results. Three data sets with large samples (>30 infants) were used to simulate experiments with smaller sample sizes; 1000 random subsamples of 8, 12, 16, 20, and 24 infants from the overall samples were selected, making it possible to examine the systematic effect of sample size on the results. This approach revealed that despite clear results with the original large samples, the results with smaller subsamples were highly variable, yielding both false positive and false negative outcomes. Finally, a number of emerging possible solutions are discussed.
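The subsampling procedure described in this abstract can be sketched roughly as follows. This is only an illustration, not the author's analysis: the "large sample" here is synthetic (40 hypothetical difference scores drawn from a normal distribution), the test is assumed to be a two-sided one-sample t-test against zero, and the names `T_CRIT`, `one_sample_t`, and `subsample_significance_rates` are invented for this example.

```python
import math
import random
import statistics

# Two-sided 5% critical values of Student's t for df = n - 1,
# hard-coded so the sketch needs only the standard library.
T_CRIT = {8: 2.365, 12: 2.201, 16: 2.131, 20: 2.093, 24: 2.069}


def one_sample_t(values):
    """t statistic for testing whether the mean of `values` differs from 0."""
    n = len(values)
    return statistics.mean(values) / (statistics.stdev(values) / math.sqrt(n))


def subsample_significance_rates(full_sample, sizes=(8, 12, 16, 20, 24),
                                 draws=1000, seed=0):
    """For each subsample size, draw `draws` random subsamples (without
    replacement) and return the share that reach p < .05."""
    rng = random.Random(seed)
    rates = {}
    for n in sizes:
        hits = 0
        for _ in range(draws):
            sub = rng.sample(full_sample, n)
            if abs(one_sample_t(sub)) > T_CRIT[n]:
                hits += 1
        rates[n] = hits / draws
    return rates


# Hypothetical data: 40 difference scores with a true mean of 0.5 SD.
data_rng = random.Random(42)
full_sample = [data_rng.gauss(0.5, 1.0) for _ in range(40)]

rates = subsample_significance_rates(full_sample)
for n, rate in rates.items():
    print(f"n = {n:2d}: {rate:.1%} of subsamples significant")
```

Each rate is the share of subsamples that reproduce a significant result from the same underlying data, so comparing the rates across sizes shows how strongly the conclusion depends on the number of infants tested.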

“…Over the decades, there have been many surveys of the power of psychological studies and admonitions for greater use of prospective power calculations to ensure adequate power before experiments are conducted or data are collected 8,11,19–25 . Nonetheless, prospective power calculations remain rare 9–11,26 v…”

Section: Statistical Power and The Adequacy Of Research

mentioning

confidence: 99%
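As an illustration of what a prospective power calculation involves, the sketch below computes the per-group sample size for a two-group comparison of means. It assumes a two-sided test, equal group sizes, and the normal approximation n = 2((z_{1-α/2} + z_{power}) / d)^2, where d is the assumed standardized effect size; the function name `n_per_group` is hypothetical and none of this is taken from the cited papers.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample comparison
    of means with standardized effect size d (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = z.inv_cdf(power)          # quantile for the target power
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group(d)} participants per group")
```

Under these assumptions a medium effect (d = 0.5) needs about 63 participants per group for 80% power; an exact t-based calculation gives a slightly larger figure. The result is only as good as the assumed d, which is the concern about optimistic effect sizes raised in the statements quoted here.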

“…To choose the sample size, prospective power assumes a value of the mean that might be highly optimistic. Furthermore, choosing sample size prospectively is rare in psychology 9–11 . Post hoc power is the complement of a study's p ‐value and, thereby, can tell us nothing more than the p ‐value.…”

Section: Introduction

mentioning

confidence: 99%
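The point that post hoc power carries no information beyond the p-value can be made concrete with a small sketch. It assumes a two-sided z-test and treats the observed effect as if it were the true effect; the function name `post_hoc_power` is hypothetical. Under those assumptions the "observed" power is a fixed, monotone function of p.

```python
from statistics import NormalDist


def post_hoc_power(p_value, alpha=0.05):
    """Observed power of a two-sided z-test, computed from the p-value
    alone (requires 0 < p_value <= 1)."""
    z = NormalDist()
    z_obs = z.inv_cdf(1 - p_value / 2)   # |z| implied by the p-value
    z_crit = z.inv_cdf(1 - alpha / 2)    # rejection threshold
    # Probability of exceeding the threshold in either tail if the true
    # standardized effect equalled the observed one.
    return z.cdf(z_obs - z_crit) + z.cdf(-z_obs - z_crit)


for p in (0.001, 0.01, 0.05, 0.20, 0.50):
    print(f"p = {p}: post hoc power = {post_hoc_power(p):.3f}")
```

A result sitting exactly at p = .05 always yields observed power of about 0.5 in this setup, whatever the sample size, and larger p-values always yield lower observed power.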

Retrospective median power, false positive meta‐analysis and large‐scale replication

Stanley, Doucouliagos, Ioannidis 2021

Research Synthesis Methods

Recent, high‐profile, large‐scale, preregistered failures to replicate uncover that many highly‐regarded experiments are “false positives”; that is, statistically significant results of underlying null effects. Large surveys of research reveal that statistical power is often low and inadequate. When the research record includes selective reporting, publication bias and/or questionable research practices, conventional meta‐analyses are also likely to be falsely positive. At the core of research credibility lies the relation of statistical power to the rate of false positives. This study finds that high (>50%–60%) median retrospective power (MRP) is associated with credible meta‐analysis and large‐scale, preregistered, multi‐lab “successful” replications; that is, with replications that corroborate the effect in question. When median retrospective power is low (<50%), positive meta‐analysis findings should be interpreted with great caution or discounted altogether.
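A rough sketch of how a median retrospective power figure might be computed for a set of studies is given below. It rests on assumptions that are not confirmed by the abstract: the "true" effect is approximated by an inverse-variance weighted mean of the reported effects, each study's power is that of a two-sided z-test given its standard error, and the five effect/standard-error pairs are invented. The paper's own estimator may differ.

```python
from statistics import NormalDist, median


def median_retrospective_power(effects, std_errors, alpha=0.05):
    """Return (pooled effect, median power across studies), evaluating
    each study's power at the inverse-variance weighted mean effect."""
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)

    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    powers = []
    for se in std_errors:
        shift = abs(pooled) / se  # expected |z| if pooled were the truth
        powers.append(z.cdf(shift - z_crit) + z.cdf(-shift - z_crit))
    return pooled, median(powers)


# Hypothetical meta-analytic data set (effect estimates and standard errors).
effects = [0.42, 0.15, 0.30, 0.05, 0.51]
std_errors = [0.20, 0.08, 0.15, 0.06, 0.25]

pooled, mrp = median_retrospective_power(effects, std_errors)
print(f"pooled effect = {pooled:.3f}, median retrospective power = {mrp:.1%}")
```

The resulting median can then be read against the thresholds mentioned in the abstract, where values below 50% call for caution in interpreting a positive meta-analytic finding.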
