When using the Rasch model, equating with a nonequivalent groups anchor test design is commonly achieved by adjusting new form item difficulty with an additive equating constant. Using simulated 5‐year data, this report compares 4 approaches to calculating the equating constant and the subsequent impact on equating results. The 4 approaches are mean difference, mean difference with outlier removal using the 0.3 logit rule, mean difference with the robust z statistic, and the information‐weighted mean difference. Factors studied included sample size, anchor test length, percentage of anchor items displaying outlier behavior, and the distribution of test item difficulty relative to examinee ability. The results indicated that the mean difference and information‐weighted mean difference methods performed similarly across all conditions. In addition, with larger sample sizes, the mean difference with 0.3 logit method performed similarly to these 2 methods. The mean difference with robust z method performed most differently from the other 3 methods of calculating the equating constant. It removed a larger percentage of the anchor items than the mean difference with 0.3 logit method but seemed to produce the most stable trend in performance classification across the 5 years, particularly when sample sizes were large.
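To make the 4 approaches concrete, the minimal Python sketch below (not the report's code) computes each constant from anchor‐item difficulty estimates. The function name and inputs are hypothetical, the 2.7 robust‐z cutoff is an illustrative value, and using the combined error variance as the information weight is an assumption about the information‐weighted variant.

```python
import numpy as np

def equating_constants(b_ref, b_new, se_ref, se_new):
    """Four illustrative additive Rasch equating constants computed from
    anchor-item difficulties (numpy arrays) on the reference and new forms."""
    d = b_ref - b_new                        # per-anchor difficulty drift

    # 1. Mean difference: the plain average drift across anchor items.
    c_mean = d.mean()

    # 2. Mean difference with the 0.3 logit rule: drop anchors whose drift
    #    deviates from the mean by more than 0.3 logits (one pass here;
    #    in practice the rule is often applied iteratively).
    keep = np.abs(d - d.mean()) <= 0.3
    c_logit = d[keep].mean()

    # 3. Mean difference with robust z: standardize drift with the median
    #    and 0.74 * IQR (a robust SD estimate); 2.7 is an illustrative cutoff.
    q75, q25 = np.percentile(d, [75, 25])
    z = (d - np.median(d)) / (0.74 * (q75 - q25))
    c_robust = d[np.abs(z) <= 2.7].mean()

    # 4. Information-weighted mean: weight each drift by the reciprocal of
    #    its combined error variance, so well-estimated anchors count more.
    w = 1.0 / (se_ref**2 + se_new**2)
    c_info = np.average(d, weights=w)

    return c_mean, c_logit, c_robust, c_info
```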
This study assessed the factor structure of the Test of English for International Communication (TOEIC®) Listening and Reading test, and its invariance across subgroups of test-takers. The subgroups were defined by (a) gender, (b) age, (c) employment status, (d) time spent studying English, and (e) having lived in a country where English is the main language. The study results indicated that a correlated two-factor model corresponding to the two language abilities of listening and reading best accounted for the factor structure of the test. In addition, the underlying construct had the same structure across the test-taker subgroups studied. There were, however, significant differences in the means of the latent construct across the subgroups. This study provides empirical support for the current score reporting practice for the TOEIC test, suggests that the test scores have the same meaning across studied test-taker subgroups, and identifies possible test-taker background characteristics that affect English language abilities as measured by the TOEIC test.
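As an illustration of the retained model, the following sketch fits a correlated two‐factor CFA in Python with the semopy package; the part‐score indicator names are hypothetical stand‐ins for the TOEIC section parts, and this is not the study's actual code or software.

```python
import pandas as pd
from semopy import Model

# lavaan-style description: listening and reading as correlated factors.
# Indicator names (part1..part7) are hypothetical part-level scores.
desc = """
Listening =~ part1 + part2 + part3 + part4
Reading   =~ part5 + part6 + part7
Listening ~~ Reading
"""

data = pd.read_csv("toeic_parts.csv")  # placeholder data source
model = Model(desc)
model.fit(data)
print(model.inspect())  # loadings, factor covariance, residual variances
```

Invariance across subgroups would then be examined by refitting the model per group with progressively stronger equality constraints (configural, metric, scalar) and comparing model fit.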
This study examined the heterogeneity in the English‐as‐a‐second‐language (ESL) test population by modeling the relationship between test‐taker background characteristics and test performance as measured by the TOEFL iBT® using a confirmatory factor analysis (CFA) with covariates approach. The background characteristics studied included (a) main reason for taking the TOEFL iBT test; (b) time spent studying English; (c) time spent attending a school, college, or university in which content classes were taught in English; and (d) having lived in a country where English is the main language. The results indicated that at most levels of the background characteristics studied, there were statistically significant differences in the means of the four underlying latent factors (reading, listening, speaking, and writing) representing English‐language proficiency (ELP). Overall, the effect size differences on the reading, listening, speaking, and writing latent factors among the levels of each background variable studied ranged from small to medium. The results provide empirical evidence that test‐taker background characteristics are associated with, and possibly influence, the four underlying latent factors representing ELP and, thus, test performance.
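A CFA‐with‐covariates (MIMIC‐type) specification can be sketched the same way; the indicator and covariate names below are hypothetical, and the study's actual measurement model for the four TOEFL iBT sections may be specified differently.

```python
import pandas as pd
from semopy import Model

# Each latent factor is measured by hypothetical indicators and
# regressed on dummy-coded background characteristics (MIMIC model).
desc = """
Reading   =~ r1 + r2 + r3
Listening =~ l1 + l2 + l3
Speaking  =~ s1 + s2 + s3
Writing   =~ w1 + w2
Reading   ~ study_time + taught_in_english + lived_english_country
Listening ~ study_time + taught_in_english + lived_english_country
Speaking  ~ study_time + taught_in_english + lived_english_country
Writing   ~ study_time + taught_in_english + lived_english_country
"""

df = pd.read_csv("toefl_ibt.csv")  # placeholder: indicators plus covariates
model = Model(desc)
model.fit(df)
print(model.inspect())  # loadings and covariate effects on the latent factors
```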
This study used survival analysis to examine the patterns and factors associated with time to achieving designated score criteria on a test of English as a foreign language. Time to achievement was modeled using an extension of the Cox regression model, with two criterion score levels defined as achieving a TOEFL iBT® total scale score at or above the Common European Framework of Reference (CEFR) Level B2 and at or above Level C1, respectively. Factors included in the model were test-taker background characteristics, including age, gender, native language type, exposure to English, and reason for testing. Additionally, to account for those who tested more than once within the study period, and thus had multiple records, an indicator for the order of testing occasions was included in the model. Results indicate that approximately 82% of the test takers in the study sample tested only once in the study period (2014–2016) and that the number of repeaters decreased rapidly across occasions. For those who did not achieve the designated criterion scores at first testing, the likelihood of achievement increased with repeated testing, with a somewhat greater effect for the less stringent B2 criterion. Results also indicate that the association of gender with performance differed across the two criterion levels.
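As a minimal illustration of this kind of model, the sketch below fits a Cox regression with the lifelines package in Python; the column names are hypothetical placeholders, and the report's actual extension for repeated testing occasions may differ.

```python
import pandas as pd
from lifelines import CoxPHFitter

# One row per testing occasion: time to the occasion, an event flag for
# reaching the B2 cut score, dummy-coded background characteristics,
# and the occasion-order indicator. Column names are placeholders.
df = pd.read_csv("toefl_attempts.csv")

cph = CoxPHFitter()
cph.fit(
    df,
    duration_col="days_to_occasion",
    event_col="reached_b2",       # refit with a C1 flag for the stricter criterion
    cluster_col="test_taker_id",  # robust SEs for repeaters with multiple rows
)
cph.print_summary()
```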