P Values as Random Variables-Expected P Values
Harold SACKROWITZ and Ester SAMUEL-CAHN
p values are extensively reported in practical hypothesis testing situations. Although carefully studied by Dempster and Schatzoff, the stochastic aspect of p values is often neglected. In this expository note we borrow from Dempster and Schatzoff to rekindle interest in, and explore the potential usefulness of, understanding the stochastic behavior of p values. We relate the expected p value (EPV) under the alternative to the more familiar concepts of significance level and power. We then go on to argue that in cases where it is difficult to evaluate the power function, the EPV can be used as a measure of the performance of a test. EPV's are always easily evaluated or simulated. Different test statistics for the same hypotheses can also be compared by means of EPV's. We carry out such a comparison between the two-sample, one-sided Kolmogorov-Smirnov, Mann-Whitney, and t tests, for a variety of underlying distributions. The EPV can also be a valuable tool in sample size determination and in the interpretation of observed p values. We hope to convince practitioners of the usefulness of EPV's.
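As a rough illustration of the claim that EPV's are easily simulated, the following is a minimal sketch, not the authors' code. It assumes a setting chosen here for simplicity rather than taken from the abstract: a one-sample, one-sided z test of H0: mu = 0 against mu > 0 with known unit variance. The function names (`one_sided_z_p_value`, `simulate_epv`, `exact_epv`) are hypothetical. In this assumed setting the EPV has the closed form Phi(-sqrt(n) mu / sqrt(2)), which follows from writing the EPV as the probability that an independent null statistic exceeds the statistic drawn under the alternative, so the simulation can be checked against it.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()  # standard normal, used for the null distribution


def one_sided_z_p_value(sample):
    """p value for H0: mu = 0 vs H1: mu > 0, assuming known unit variance."""
    n = len(sample)
    z = fmean(sample) * n ** 0.5
    return 1.0 - STD_NORMAL.cdf(z)


def simulate_epv(mu, n, reps=20000, seed=0):
    """Monte Carlo estimate of the expected p value when the true mean is mu."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        sample = [rng.gauss(mu, 1.0) for _ in range(n)]
        total += one_sided_z_p_value(sample)
    return total / reps


def exact_epv(mu, n):
    """Closed form for this z test only: P(Z0 >= Z) with Z0 ~ N(0, 1) and
    Z ~ N(sqrt(n) * mu, 1) independent, so Z0 - Z ~ N(-sqrt(n) * mu, 2)."""
    return STD_NORMAL.cdf(-mu * n ** 0.5 / 2 ** 0.5)


if __name__ == "__main__":
    for mu in (0.0, 0.25, 0.5):
        print(mu, round(simulate_epv(mu, n=10), 4), round(exact_epv(mu, n=10), 4))
```

Under the null (mu = 0) the p value is uniform and the EPV is 0.5; smaller EPV's under the alternative indicate a better-performing test. For tests without a closed form, only `simulate_epv` would change: swap in the p value of the test of interest.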
The most popular multiple testing procedures are stepwise procedures based on
$P$-values for individual test statistics. Included among these are the false
discovery rate (FDR) controlling procedures of Benjamini--Hochberg [J. Roy.
Statist. Soc. Ser. B 57 (1995) 289--300] and their offspring. Even for models
that entail dependent data, $P$-values based on marginal distributions are
used. Unlike such methods, the new method takes dependency into account at all
stages. Furthermore, the $P$-value procedures often lack an intuitive convexity
property, which is needed for admissibility. Still further, the new methodology
is computationally feasible. If the number of tests is large and the proportion
of true alternatives is less than, say, 25 percent, simulations demonstrate a
clear preference for the new methodology. Applications are detailed for models
such as testing treatments against control (or any intraclass correlation
model), testing for change points and testing means when correlation is
successive.
Comment: Published at http://dx.doi.org/10.1214/08-AOS616 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
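For readers unfamiliar with the marginal $P$-value procedures that the abstract above contrasts with its new method, here is a minimal sketch of the Benjamini--Hochberg step-up rule. It illustrates the baseline only, not the paper's new methodology, which the abstract does not specify. The function name `benjamini_hochberg` and the parameter `q` (the target FDR level) are naming choices made for this sketch.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Step-up rule: sort the m p values, find the largest rank k with
    p_(k) <= k * q / m, and reject the k hypotheses with the smallest p values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k = rank  # keep the largest rank that passes its threshold
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(p, q=0.05))
```

Note that the rule uses each $P$-value only through its rank among the marginal $P$-values, which is why it does not take the dependence structure of the test statistics into account.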
Consider the problem of testing whether treatment is "better" than control when the response variable has ordered categories. Our first goal is to offer various definitions of "better" in order to improve understanding of the important aspects of this very basic model. Secondly, for each definition of better we offer complete class theorems. While doing so we identify and review some well-known tests that are in the complete class and some that are not. Tests not in the complete class are inadmissible. Whereas the problem of testing treatment vs control involves a 2 × C contingency table, we extend the definitions and theorems to the case of R × C contingency tables with ordered categories.
1 Introduction
Ordered categorical data occur frequently in fields such as Sociology, Psychology, and Education and are fundamental in medical related research. In fact, in a survey only of articles in volume 306, 1982 of the New England Journal of Medicine, Moses, Emerson and Hosseini (1984) identified 47 instances of ordered categorical variables. A classical protocol to determine the effectiveness of a treatment is to administer a placebo to $n_1$ individuals and to administer the treatment to $n_2$ individuals. Suppose the responses are categorical and the categories are ordered. For example, the categories are