The reliability and reproducibility of science are under scrutiny. However, a major cause of this lack of repeatability is not being considered: the wide sample-to-sample variability in the P value. We explain why P is fickle to discourage the ill-informed practice of interpreting analyses based predominantly on this statistic.

Reproducible research findings are a cornerstone of the scientific method, providing essential validation. There has been recent recognition, however, that the results of published research can be difficult to replicate 1-7, an awareness epitomized by a series in Nature entitled "Challenges in irreproducible research" and by the Reproducibility Initiative, a project intended to identify and reward reproducible research (http://validation.scienceexchange.com/#/reproducibilityinitiative). In a recent meeting at the American Association for the Advancement of Science headquarters involving many of the major journals reporting biomedical science research, a common set of principles and guidelines was agreed upon for promoting transparency and reproducibility 8. These discussions and initiatives all focused on a number of issues, including aspects of statistical reporting 9, levels of statistical power (i.e., sufficient statistical capacity to find an effect: a 'statistically significant' finding) 10 and inclusion-exclusion criteria. Yet a fundamental problem inherent in standard statistical methods, one that is pervasively linked to the lack of reproducibility in research, remains to be considered: the wide sample-to-sample variability in the P value. This omission reflects a general lack of awareness of this crucial issue, and we address it here.

Focusing on the P value during statistical analysis is an entrenched culture 11-13. The P value is often used without the realization that in most cases the statistical power of a study is too low for P to assist the interpretation of the data (Box 1).
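The sample-to-sample variability of P is easy to demonstrate by simulation. The sketch below (not part of the article; the effect size, group size, test and all parameter values are illustrative assumptions) repeatedly runs the same two-group "experiment" with a genuine but modest effect and records the P value each time. At the low power typical of many studies, P swings from far below 0.05 to well above it across identical replicates.

```python
# Minimal sketch: how widely P varies across identical replicate
# experiments. All settings here are illustrative assumptions,
# not values taken from the article.
import math
import random

random.seed(42)

def two_sample_p(n=10, effect=0.5):
    """One 'experiment': two groups of n observations drawn from
    Normal(0, 1) and Normal(effect, 1); two-sided z-test P value
    (sigma = 1 treated as known, for simplicity)."""
    a = [random.gauss(0.0, 1.0) for _ in range(n)]
    b = [random.gauss(effect, 1.0) for _ in range(n)]
    z = (sum(b) / n - sum(a) / n) / math.sqrt(2.0 / n)
    phi = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))  # standard normal CDF
    return 2.0 * (1.0 - phi)

# Replicate the identical experiment 1,000 times.
p_values = [two_sample_p() for _ in range(1000)]
power = sum(p < 0.05 for p in p_values) / len(p_values)

print(f"empirical power: {power:.2f}")
print(f"smallest P: {min(p_values):.4f}, largest P: {max(p_values):.2f}")
```

Even though every replicate samples the same underlying effect, the observed P values span several orders of magnitude, and only a minority fall below 0.05, which is what low power means in practice.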
Among the many and varied reasons for a fearful and hidebound approach to statistical practice, a lack of understanding is prominent 14. A better understanding of why P is so unhelpful should encourage scientists to reduce their reliance on this misleading concept.

Readers may know of the long-standing philosophical debate about the value and validity of null-hypothesis testing 15-17. Although the P value formalizes null-hypothesis testing, this article will not revisit these issues. Rather, we concentrate on how P values themselves are misunderstood.

Although statistical power is a central element in reliability 18, it is often considered only when a test fails to demonstrate a real effect (such as a difference between groups): a 'false negative' result (see Box 2 for a glossary of statistical terms used in this article). Many scientists who are not statisticians do not realize that the power of a test is equally relevant when considering statistically significant results, that is, when the null hypothesis appears to be untenable. This is because the statistical power o...