Over the past decade, listening comprehension tests have increasingly moved to computer-based delivery that includes visual input. However, little research is available to suggest how test takers engage with different types of visuals on such tests. The present study compared a series of still images with video in academic computer-based tests to determine how test takers engage with these two test modes. The study, which employed observations, retrospective reports, and interviews, used data from university-level non-native speakers of English. The findings suggest that test takers engage differently with these two modes of delivery. Specifically, while test takers engaged minimally and similarly with the still images, there was wide variation in the ways and degree to which they engaged with the video stimulus. The implications of the study are that computer-based tests of listening comprehension could include still images while only minimally altering the construct measured by audio-only listening tests, but the use of video in such tests may require a rethinking of the listening construct.
FACETS many-facet Rasch analysis software (Linacre, 1998a) was used to examine two consecutive administrations of a large-scale (more than 1,000 examinees) second language oral assessment, a peer group discussion task taken by Japanese English-major university students. The facets modeled in the analysis were examinee, prompt, rater, and five rating-category 'items.' Unidimensionality was shown to be strong in both datasets, and approaches to interpreting t values for the facets modeled in the analysis were discussed. Examinee ability was the most substantial facet, followed by rater severity and item. The prompt facet was negligible in magnitude. Rater differences in severity were generally large, but this characteristic was not stable over time for individuals; returning raters tended to move toward greater severity and consistency, while new raters showed much more inconsistency. Analysis of the scales showed that the gradations of the scale steps were generally valid, though raters had some difficulty discerning between categories at the ends of the scales for pronunciation and communicative skills.
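For reference, a standard formulation of the many-facet Rasch model that FACETS estimates for a design like this one is sketched below; the facet labels are illustrative, and the exact specification used in the study may differ.

\[
\log\!\left(\frac{P_{npjik}}{P_{npji(k-1)}}\right) = B_n - A_p - C_j - D_i - F_k
\]

where \(B_n\) is the ability of examinee \(n\), \(A_p\) the difficulty of prompt \(p\), \(C_j\) the severity of rater \(j\), \(D_i\) the difficulty of rating-category item \(i\), and \(F_k\) the threshold of scale step \(k\) relative to step \(k-1\).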
Concerns about the need to assess multidialectal listening skills for global contexts are becoming increasingly prevalent. However, the inclusion of multiple accents on listening assessments may threaten test fairness because it is not practical to include every accent that may be encountered in the language use domain. Given this dilemma, this study aimed to determine the extent to which accent strength and familiarity affect comprehension and to provide a defensible direction for assessing multidialectal listening comprehension. A strength-of-accent scale was developed, and one US, four Australian, and four British speakers of English were selected based on a judgment of their strength of accent. Next, TOEFL test takers (N = 21,726) were randomly assigned to listen to a common lecture given by one of the nine selected speakers and to respond to six comprehension items and a survey designed to assess their familiarity with various accents. The results suggest that strength of accent and familiarity do affect listening comprehension, and that these factors affect comprehension even when accents are quite light.
Computer-based testing (CBT) to assess second language ability has undergone remarkable development since Garrett (1991) described its purpose as "the computerized administration of conventional tests" in The Modern Language Journal. For instance, CBT has made possible the delivery of more authentic tests than traditional paper-and-pencil tests. CBT has also made it possible to score essays, oral speech samples, and other types of test responses more reliably, practically, and almost instantaneously. Unfortunately, however, due to a number of unresolved problems, CBT has failed to realize its anticipated potential. CBT has limited usability because systems that ensure test and score security have yet to be developed. Computer-adaptive testing, one of the most promising areas of CBT, has not met expectations because of unresolved problems with the statistical techniques on which it is based and the lack of resources necessary to implement it in most assessment contexts. In spite of these and other limitations, given the growing capability of CBT to deliver more authentic tests than paper-and-pencil testing, its use for assessing second language ability will undoubtedly continue to expand.
The second language group oral is a test of second language speaking proficiency in which a group of three or more English language learners discuss an assigned topic without the involvement of an examiner or other interlocutor. Concerns about the extent to which test takers' personal characteristics affect the scores of others in the group have limited its attractiveness. This study investigates the degree to which assertive and non-assertive test takers' scores are affected by the levels of assertiveness of their group members. The test takers were Japanese first-year university students studying English in Japan. The students took the revised NEO-PI-R (Costa & McCrae, 1992; Shimanoka et al., 2002), a group oral test, and PhonePass SET-10 (Ordinate, 2004). Two separate MANCOVA analyses were conducted: one designed to determine the extent to which assertive test takers' scores are affected by the levels of assertiveness of group members (N = 112), and one designed to determine the extent to which non-assertive test takers' scores are affected by the levels of assertiveness of group members (N = 113). The analyses indicated that assertive test takers were assigned higher scores than expected when grouped with only non-assertive test takers and lower scores than expected when grouped with only assertive test takers, while the study failed to find an effect of group assertiveness on non-assertive test takers' scores. The findings suggest that when the group oral is used, rater-training sessions should include guidance on how to evaluate a test taker in the context of the group in which the test taker is assessed and how to assign scores that are not based on a comparison of the proficiencies of group members.
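As a rough illustration of how a MANCOVA of this general design might be specified, the sketch below uses Python with statsmodels. It is a hypothetical sketch rather than the authors' analysis: the file name, column names, rating categories, and the use of the SET-10 score as the covariate are all assumptions.

```python
# Hypothetical MANCOVA sketch (not the authors' code), assuming one row per test taker.
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

df = pd.read_csv("group_oral_scores.csv")  # assumed data file

# Dependent variables: assumed analytic rating categories from the group oral.
# Independent variable: assumed composition of the group (all-assertive, mixed, all-nonassertive).
# Covariate: PhonePass SET-10 score as a proxy for speaking proficiency.
model = MANOVA.from_formula(
    "fluency + pronunciation + grammar + content ~ C(group_composition) + set10",
    data=df,
)
print(model.mv_test())  # multivariate tests (Wilks' lambda, Pillai's trace, etc.) for each term
```

In the study's design, one such analysis would be run on the assertive subsample and one on the non-assertive subsample.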