When comparing two independent groups, psychology researchers commonly use Student's t-test. This test rests on the assumptions of normality and homogeneity of variance. When these assumptions are not met, Student's t-test can be severely biased and lead to invalid statistical inferences. Moreover, we argue that the assumption of equal variances will seldom hold in psychological research, and that choosing between Student's t-test and Welch's t-test based on the outcome of a test of the equality of variances often fails to provide an appropriate answer. We show that Welch's t-test provides better control of Type 1 error rates when the assumption of homogeneity of variance is not met, and that it loses little robustness compared to Student's t-test when the assumptions are met. We argue that Welch's t-test should be used as a default strategy.
Keywords: Welch's t-test; Student's t-test; homogeneity of variance; Levene's test; homoscedasticity; statistical power; Type 1 error; Type 2 error
Independent-samples t-tests are commonly used in the psychological literature to statistically test differences between means. There are different types of t-tests, such as Student's t-test, Welch's t-test, Yuen's t-test, and a bootstrapped t-test. These variations differ in their underlying assumptions about whether the data are normally distributed and whether the variances in both groups are equal (see, e.g., Rasch, Kubinger, & Moder, 2011; Yuen, 1974). Student's t-test is the default method to compare two groups in psychology. The available alternatives are reported considerably less often. This is surprising, since Welch's t-test is often the preferred choice and is available in practically all statistical software packages.
In this article, we will review the differences between Welch's t-test, Student's t-test, and Yuen's t-test, and we suggest that Welch's t-test is a better default for the social sciences than Student's and Yuen's t-tests. We do not include the bootstrapped t-test because it is known to fail in specific situations, such as when there are unequal sample sizes and standard deviations differ moderately (Hayes & Cai, 2007).
When performing a t-test, several software packages (e.g., R and Minitab) present Welch's t-test by default. Users can request Student's t-test, but only after explicitly stating that the assumption of equal variances is met. Student's t-test is a parametric test, which means it relies on assumptions about the data that are analyzed. Parametric tests are believed to be more powerful than non-parametric tests (i.e., tests that do not require assumptions about the population parameters; Sheskin, 2003).
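
To make the difference between the two tests concrete, the following is a minimal sketch of how Student's and Welch's statistics are computed from two samples. It uses only the Python standard library. The function names `student_t` and `welch_t` and the scores in `group_a` and `group_b` are hypothetical, chosen purely for illustration. The sketch returns the t value and the degrees of freedom only; it does not compute p-values.

```python
from math import sqrt
from statistics import mean, variance


def student_t(x, y):
    """Student's t statistic and degrees of freedom (pooled variance)."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x), variance(y)  # sample variances (n - 1 denominator)
    # Pooling assumes both groups share one population variance.
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t = (mean(x) - mean(y)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(x), len(y)
    # Each group contributes its own variance estimate; nothing is pooled.
    a, b = variance(x) / n1, variance(y) / n2
    t = (mean(x) - mean(y)) / sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


# Hypothetical data: a small group with little spread and a larger,
# more variable group (illustrative values only).
group_a = [4.1, 5.0, 5.2, 4.8, 5.5, 4.9]
group_b = [2.0, 9.5, 6.1, 3.3, 8.8, 1.2, 7.4, 5.9, 4.0, 10.1]

print("Student:", student_t(group_a, group_b))
print("Welch:  ", welch_t(group_a, group_b))
```

When the two groups have the same size, the two t values coincide and only the degrees of freedom can differ. With unequal sizes and unequal variances, as in the hypothetical data above, both the statistic and the degrees of freedom differ.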
However, Student's t-test is generally only more powerful when the data are normally distributed (the assumption of normality) and the variances are equal in both groups (homoscedasticity; the assumption of homogeneity of variance; Carroll & Schneider, 1985; Erceg-Hurn & Mirosevich, 2008). When sample sizes are equal between groups, Student's t-test is robust to violations of the assump...