“…Nevertheless, even when the above assumptions are not fulfilled and, for example, the test items follow the three-parameter logistic model (3PLM), the MH procedure has proved to be effective in a wide variety of situations (Allen & Donoghue, 1996; Donoghue et al., 1993; Roussos & Stout, 1996; Shealy & Stout, 1993; Uttaro & Millsap, 1994). Also well established is its high power for detecting uniform DIF and its sensitivity for detecting some types of nonuniform DIF (Hidalgo & López-Pina, 2004; Mazor, Clauser, & Hambleton, 1994; Narayanan & Swaminathan, 1996; Rogers & Swaminathan, 1993), its capacity for detecting DIF with small sample sizes (Camilli & Smith, 1990; Fidalgo, Ferreres, & Muñiz, 2004; Fidalgo, Hashimoto, Bartram, & Muñiz, 2007; Mazor, Clauser, & Hambleton, 1992; Muñiz, Hambleton, & Xing, 2001; Parshall & Miller, 1995), and the advantages of using two-stage or iterative purification procedures for purifying the matching variable (Clauser, Mazor, & Hambleton, 1993; Fidalgo, Mellenbergh, & Muñiz, 2000; Narayanan & Swaminathan, 1994; Rogers & Swaminathan, 1993; Wang & Su, 2004a). It is also fair to point out that substantial latent trait distribution differences of the reference and focal groups yield a highly inflated Type I error (Clauser et al., 1993; Donoghue et al., 1993; Uttaro & Millsap, 1994; Zwick, 1990), especially when the procedure is applied to items that do not fit the 1PLM (Penny & Johnson, 1999).…”