We extend conformal inference to general settings that allow for time series data. Our proposal is developed as a randomization method and accounts for potential serial dependence by including block structures in the permutation scheme such that the set of permutations forms a group. As a result, the proposed method retains exact, model-free validity when the data are i.i.d. or, more generally, exchangeable, like standard conformal inference methods. When exchangeability fails, as is the case for common time series data, the proposed approach is approximately valid under weak assumptions on the conformity score.
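As a rough illustration of the block permutation idea, the Python sketch below computes a randomization p-value from cyclic shifts by whole blocks. It assumes the block size divides the sample size, so that the shifts form a group. The function names and the user-supplied `statistic` are hypothetical, and the sketch is a simplified stand-in rather than the authors' exact procedure.

```python
import numpy as np


def block_shift_permutations(n, block_size):
    """Return index arrays for cyclic shifts by multiples of block_size.

    Assumption: block_size divides n, so the shifts (including the
    identity) are closed under composition and form a group.
    """
    if n % block_size != 0:
        raise ValueError("block_size must divide n so the shifts form a group")
    base = np.arange(n)
    return [np.roll(base, -k * block_size) for k in range(n // block_size)]


def permutation_p_value(data, statistic, block_size):
    """Share of block-shifted samples whose statistic is at least the observed one.

    `statistic` is a hypothetical user-supplied function mapping a data
    array to a scalar, for example a conformity score at the last position.
    """
    data = np.asarray(data)
    observed = statistic(data)
    permuted = np.array(
        [statistic(data[idx]) for idx in block_shift_permutations(len(data), block_size)]
    )
    # The identity shift is included, so the p-value is never zero.
    return float(np.mean(permuted >= observed))
```

For instance, `permutation_p_value(np.arange(12.0), lambda z: z[-1], block_size=3)` compares the last observation against the value that lands in the last position under each of the four block shifts.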
We introduce new inference procedures for counterfactual and synthetic control methods for policy evaluation. Our methods work in conjunction with many different approaches for predicting counterfactual mean outcomes in the absence of a policy intervention. Examples include synthetic controls, difference-in-differences, factor and matrix completion models, and (fused) time series panel data models. The proposed procedures are valid under weak and easy-to-verify conditions and are provably robust against misspecification. Our approach shows excellent small-sample performance in simulations, and we apply it to re-evaluate the consequences of decriminalizing indoor prostitution.
We theoretically analyze the problem of testing for p-hacking based on distributions of p-values across multiple studies. We provide general results for when such distributions have testable restrictions (are non-increasing) under the null of no p-hacking. We find novel additional testable restrictions for p-values based on t-tests. Specifically, the shape of the power functions results in both complete monotonicity as well as bounds on the distribution of p-values. These testable restrictions result in more powerful tests for the null hypothesis of no p-hacking. When there is also publication bias, our tests are joint tests for p-hacking and publication bias. A reanalysis of two prominent data sets shows the usefulness of our new tests.
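To make the non-increasing restriction concrete, the Python sketch below implements a simple one-sided binomial test that compares two equal-width bins just below a significance threshold. This is a basic stand-in chosen for illustration, not the new tests described above. The bin edges (0.04, 0.045, 0.05) and the function names are assumptions. If the density of p-values is non-increasing, a p-value falling in either bin lands in the upper bin with probability at most one half, so an excess of mass in the upper bin is evidence against the null.

```python
from math import comb


def binomial_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def binomial_bunching_test(p_values, lo=0.04, mid=0.045, hi=0.05):
    """One-sided binomial test of a non-increasing p-value density on (lo, hi].

    Counts p-values in the equal-width bins (lo, mid] and (mid, hi].
    The bin edges are illustrative assumptions.
    """
    lower = sum(lo < p <= mid for p in p_values)
    upper = sum(mid < p <= hi for p in p_values)
    n = lower + upper
    if n == 0:
        return 1.0  # no observations in either bin, so no evidence against the null
    return binomial_upper_tail(upper, n)
```

A small returned value suggests more p-values sit just below `hi` than a non-increasing density allows.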
Factorial designs are widely used for studying multiple treatments in one experiment. While "long" model t-tests provide valid inferences, "short" model t-tests (ignoring interactions) yield higher power if interactions are zero, but incorrect inferences otherwise. Of 27 factorial experiments published in top-5 journals (2007-2017), 19 use the short model. After including all interactions, over half their results lose significance. Modest local power improvements over the long model are possible, but with lower power for most values of the interaction. If interactions are not of interest, leaving the interaction cells empty yields valid inferences and global power improvements.
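The contrast between the two specifications can be sketched in a noiseless, balanced 2x2 design. The Python example below uses hypothetical coefficient values and function names and is only meant to show the mechanics: the long model recovers the main effect of the first treatment, while the short model's coefficient on that treatment absorbs half of the interaction.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def factorial_example(b1=1.0, b2=0.5, b12=-1.0, n_per_cell=25):
    """Fit long and short models on a balanced, noiseless 2x2 factorial design.

    All coefficient values are hypothetical and chosen for illustration.
    """
    t1 = np.repeat([0.0, 0.0, 1.0, 1.0], n_per_cell)
    t2 = np.repeat([0.0, 1.0, 0.0, 1.0], n_per_cell)
    y = 2.0 + b1 * t1 + b2 * t2 + b12 * t1 * t2
    ones = np.ones_like(t1)
    # Long model: both treatments plus their interaction.
    long_coef = fit_ols(np.column_stack([ones, t1, t2, t1 * t2]), y)
    # Short model: the interaction term is omitted.
    short_coef = fit_ols(np.column_stack([ones, t1, t2]), y)
    return long_coef, short_coef
```

With the default values, the long-model coefficient on the first treatment equals `b1`, while the short-model coefficient equals `b1 + b12 / 2`. This is the source of the incorrect inferences when interactions are nonzero.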