“…All of these studies show that priming can be studied corpus-linguistically, that such studies do not necessarily inflate priming results as one might have feared (given the noisiness and collinearity that are much more characteristic of corpus data than of experimental data), and that different types of persistence can be distinguished. In addition, corpus data allow the researcher to study more words, prime-target distances, registers, and other moderator variables than most experimental studies can, and to explore the phenomenon in ecologically more valid scenarios: one can easily include lexically-specific frequencies and baseline frequency effects in the analyses, and one avoids exposing subjects to unnatural stimuli or stimulus distributions that can lead to within-experiment learning effects (e.g., Schütze, 1996: Section 5.2.3; Gries & Wulff, 2009; Jaeger, 2010; Doğruöz & Gries, 2012; Torres Cacoullos & Travis, 2013; and others); such learning effects should therefore also be included in the statistical modeling of priming effects in experimental studies (see Kootstra & Doedens, 2016, for an example). Given the resulting complexity, the move towards generalized linear mixed-effects models (GLMMs), which are now also becoming the standard in experimental studies, is a welcome development: they
- avoid collapsing individual data points into proportions per lexical item and/or participant, which makes it difficult, for instance, to explore within-subject cumulative priming effects of the type explored by Gries and Wulff (2009);
- avoid running different ANOVAs on different constructional choices (as in Savage et al., 2003) or on successive experiments, by allowing one to combine datasets and probe interactions between the predictors and a variable coding for dataset; the corpus-linguistic parallel would be not to run separate analyses on different speakers or different corpora, but to include indicator variables for corpora and speakers as predictors or random effects;
- avoid unnecessary methodological decisions such as the factorization of numerical predictors (e.g., binning prime-target distances into levels such as 'short' vs. 'long');
- provide a state-of-the-art approach to handling data points that exhibit dependencies, including crossed random effects (speakers and/or lexical items) as well as nested random effects (registers/conversations/speakers), and can accommodate data that violate the assumptions of repeated-measures ANOVAs (such as sphericity).
…”
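To make the modeling strategy outlined in the excerpt above concrete, here is a minimal sketch. In R, such models are typically fit with lme4's glmer(); the sketch below instead uses Python's statsmodels, whose Bayesian binomial mixed GLM supports crossed random intercepts. All variable names (speaker, item, dist, corpus, primed, choice) and the simulated data are hypothetical and serve only to illustrate the specification, not any particular published analysis.

```python
# A minimal, hypothetical sketch of a GLMM for a binary constructional choice:
# crossed random intercepts for speakers and lexical items, a numeric
# (unfactorized) prime-target distance, and a dataset indicator whose
# interaction with the priming predictor replaces separate per-dataset ANOVAs.
import numpy as np
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

rng = np.random.default_rng(1)
n = 2000
d = pd.DataFrame({
    "speaker": rng.integers(0, 40, n).astype(str),  # crossed random effect 1
    "item":    rng.integers(0, 60, n).astype(str),  # crossed random effect 2
    "dist":    rng.integers(1, 30, n),              # prime-target distance, kept numeric
    "corpus":  rng.integers(0, 2, n),               # dataset/corpus indicator (0/1)
    "primed":  rng.integers(0, 2, n),               # prime instantiated the same construction?
})
# simulate a binary outcome (choice of construction A vs. B) with a priming
# effect that decays with distance and differs slightly between corpora
logit = -0.5 + 0.8 * d["primed"] - 0.04 * d["dist"] + 0.3 * d["primed"] * d["corpus"]
d["choice"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

model = BinomialBayesMixedGLM.from_formula(
    "choice ~ primed * corpus + dist",   # fixed effects incl. dataset interaction
    {"speaker": "0 + C(speaker)",        # random intercepts per speaker ...
     "item": "0 + C(item)"},             # ... crossed with intercepts per item
    d,
)
result = model.fit_vb()                  # variational Bayes fit
print(result.summary())
```

Note how dist enters the model numerically rather than factorized, how the primed:corpus interaction tests within a single model whether the priming effect differs across datasets, and how the two variance-component formulas implement crossed (not nested) random intercepts; nested structures such as speakers within conversations within registers would be encoded by constructing the grouping variables accordingly.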