In experimental science, it is desirable to hold all factors constant except those intentionally manipulated. In psychology, however, this ideal is often not possible. Elements such as participants and items vary, in addition to the intended factors. For example, a researcher interested in the psychology of reading might manipulate the part of speech and observe reading times. In this case, there is unintended variability from the selection of both participants and items. In his classic article, "The Language-as-Fixed-Effect Fallacy: A Critique of Language Statistics in Psychological Research," H. H. Clark (1973) discussed how unintended variability from the simultaneous selection of participants and items leads to underestimation of confidence intervals and inflation of Type I error rates in conventional analysis. Type I error rate inflation, or an increased tendency to find a significant effect when none exists, is highly undesirable.
To demonstrate the problem, consider the question of whether nouns and verbs are read at the same rate. To answer this question, a researcher could randomly select suitable verbs and nouns and ask a number of participants to read them. Each participant produces a set of reading time scores for both nouns and verbs. A common approach is to tabulate for each participant one mean reading time for nouns and another for verbs. To test the hypothesis of the equality of reading rates, these pairs of mean reading times may be submitted to paired t tests. This analytic approach is often used in memory research. For example, Riefer and Rouder (1992) used this analysis to determine whether bizarre sentences are better remembered than common ones. Clark (1973), however, argued that using t tests to analyze means tabulated across different items leads to Type I error rate inflation.
In the following demonstration, we show by simulation that this inflation is not only real, but also surprisingly large.
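A minimal sketch of a simulation of this kind may help make the argument concrete. It is not the authors' code. It assumes an additive model in which each reading time is a grand mean plus a random participant effect, a random item effect, and noise, with no difference between nouns and verbs. The function name, the grand mean of 700, the standard deviations, and the sample sizes are all hypothetical values chosen for illustration. The critical value 2.093 is the two-sided .05 cutoff of a t distribution with 19 degrees of freedom, so it is valid only for 20 participants.

```python
import math
import random
import statistics


def simulate_rejection_rate(n_reps=500, n_participants=20, n_items=20,
                            sd_participant=100.0, sd_item=50.0,
                            sd_noise=100.0, t_crit=2.093, seed=1):
    """Proportion of significant paired t tests when no true effect exists.

    All parameter values are illustrative assumptions, not taken from the
    article. t_crit must match n_participants - 1 degrees of freedom.
    """
    rng = random.Random(seed)
    grand_mean = 700.0  # assumed common mean for nouns and verbs
    rejections = 0
    for _ in range(n_reps):
        # New items are sampled in every replication; nouns and verbs
        # are drawn from the same distribution (no part-of-speech effect).
        noun_effects = [rng.gauss(0.0, sd_item) for _ in range(n_items)]
        verb_effects = [rng.gauss(0.0, sd_item) for _ in range(n_items)]
        diffs = []
        for _ in range(n_participants):
            alpha = rng.gauss(0.0, sd_participant)  # participant effect
            noun_rts = [grand_mean + alpha + b + rng.gauss(0.0, sd_noise)
                        for b in noun_effects]
            verb_rts = [grand_mean + alpha + b + rng.gauss(0.0, sd_noise)
                        for b in verb_effects]
            # One mean per participant per part of speech, then the difference.
            diffs.append(statistics.fmean(noun_rts) - statistics.fmean(verb_rts))
        # Paired t statistic computed on the participant mean differences.
        se = statistics.stdev(diffs) / math.sqrt(n_participants)
        t_stat = statistics.fmean(diffs) / se
        if abs(t_stat) > t_crit:
            rejections += 1
    return rejections / n_reps


if __name__ == "__main__":
    print("with item variability:   ", simulate_rejection_rate())
    print("without item variability:", simulate_rejection_rate(sd_item=0.0))
```

Under these assumed values, the rejection rate with item variability should come out well above .05, whereas setting `sd_item=0.0` should bring it back near the nominal rate. The reason is that the difference between the sampled noun and verb item means is shared by every participant, so the paired t test treats it as a systematic effect.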
We generate data for a standard ANOVA-style model (discussed below) with no part-of-speech effects. We analyze these data by first computing participant means for each part of speech and then submitting these means to a paired t test. This process is performed repeatedly, and the proportion of significant results is reported. If the test has no Type I error inflation, the proportion should be the nominal Type I error rate, which is set to the conventional value of .05.
Consider the following ANOVA-style model for nouns: It is reasonable to expect that each participant has a unique effect on reading time; some participants are fast at reading, but others are slow. This effect for the ith participant
Copyright 2005 Psychonomic Society, Inc.
Although many nonlinear models of cognition have been proposed in the past 50 years, there has been little consideration of corresponding statistical techniques for their analysis. In analyses with nonlinear models, unmodeled variability from the selection of items or participants may lead to asymptotically biased estimation. This asymptotic bias, in turn, ren...