This article reviews all (quasi)experimental studies appearing in the first 19 volumes (1997–2015) of Language Teaching Research (LTR). Specifically, it provides an overview of how statistical analyses were conducted in these studies and of how the analyses were reported. The overall conclusion is that there has been a tight adherence to traditional methods and practices, some of which are suboptimal. Accordingly, a number of improvements are recommended. Topics covered include the implications of small average sample sizes, the unsuitability of p values as indicators of replicability, statistical power and implications of low power, the non-robustness of the most commonly used significance tests, the benefits of reporting standardized effect sizes such as Cohen's d, options regarding control of the familywise Type I error rate, analytic options in pretest-posttest designs, 'meta-analytic thinking' and its benefits, and the mistaken use of a significance test to show that treatment groups are equivalent at pretest. An online companion article elaborates on some of these topics plus a few additional ones and offers guidelines, recommendations, and additional background discussion for researchers intending to submit to LTR an article reporting a (quasi)experimental study.
Keywords: Effect sizes, L2 quantitative research, pretest-posttest designs, (quasi)experimental studies, robust methods, small sample sizes, statistical analysis, statistical power, testing for baseline balance
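For readers unfamiliar with Cohen's d, mentioned in the abstract as a recommended standardized effect size: it is the difference between two group means divided by their pooled standard deviation. The following minimal Python sketch is purely illustrative (the function name and data are hypothetical, not drawn from the article):

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d(group1, group2):
    """Standardized mean difference between two independent groups,
    using the pooled standard deviation (sample SDs, ddof = 1)."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = stdev(group1), stdev(group2)
    pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(group1) - mean(group2)) / pooled_sd

# Hypothetical posttest scores for a treatment and a control group:
d = cohens_d([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
# d is about -0.63, a medium-sized effect by Cohen's conventional benchmarks
```

Unlike a p value, d does not shrink toward non-significance merely because samples are small, which is one reason the article recommends reporting it alongside significance tests.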
I Introduction

This article reports a survey of all issues of Language Teaching Research (LTR) from the first issue in 1997 through the latest 2015 issue (at the time of writing), including special issues. The main aims are (1) to outline how LTR authors have subjected data