Speaking performances, stakeholder perceptions, and test scores: Extrapolating from the Duolingo English test to the university

Isbell, Daniel R.; Crowther, Dustin; Nishizawa, Hitoshi

doi:10.1177/02655322231165984

Cited by 4 publications

(6 citation statements)

References 54 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Though the strongest correlations among items were generally those within the same scale (e.g., the correlations among the three comprehensibility items), it is also worth noting the high correlations among items across the two scales; we return to the relationship between comprehensibility and acceptability later in the manuscript. Additionally, in Isbell et al (2023), many-facet Rasch models yielded strong evidence of unidimensional measurement and good fit of judgment items for both comprehensibility and acceptability.…”

Section: Methodsmentioning

confidence: 91%

“…A key characteristic of the current study was that, given the source of speech elicited (high-stakes English proficiency test [i.e., DET]), it was possible to include speakers representing a wide range of linguistic backgrounds and overall proficiency. That is, our population included both those who would and those who would not receive admission into English-medium university study (see Isbell et al, 2023), ranging in proficiency from roughly CEFR levels B1 to C1. This is in contrast to many existing studies, where the speaker population was more homogenous.…”

Section: Discussionmentioning

confidence: 99%

“…Reliability of each judgment scale and for listener averages across all scales for comprehensibility and acceptability were estimated using 2-way random effects intraclass correlations (ICC, McGraw & Wong, 1996) using the irrNA package (v.0.2.2, Brueckl & Heuer, 2021) in R to accommodate missingness in a sparse rating design. As shown in Table 4, absolute agreement among listeners’ single scores was low (.27 – .34), indicating variability in listener scale use (i.e., some listeners were more lenient in judgments while others were stricter, see Isbell et al, 2023). However, the degree of consistency for average ratings from k randomly selected raters was considerably higher (.79 – .84).…”

Section: Methodsmentioning

confidence: 99%

“…For example, Kang et al (2010) made use of a five-item scale (easy/hard to understand, incomprehensible/highly comprehensible, needed little effort/lots of effort to understand, unclear/clear, and simple/difficult to grasp the meaning), with reliability across items high (Cronbach α = .94). In Isbell et al (2023), we investigated the relationship between stakeholders’ speech perceptions and DET speaking performance using the same speech samples analyzed here. As Schmidgall and Powers’ (2021) reported a similar analysis focused on TOEIC speaking performance, we made use of their three-item scale as a measure of comprehensibility.…”

Section: Methodsmentioning

confidence: 99%

“…Based on Isbell et al (2023), only minor differences existed between academic listening groups (undergraduate students, graduates, faculty, and administrative staff) in responses to judgment questions. While faculty tended to be most lenient in their ratings, and administrative staff most harsh, the magnitudes of differences between group average scores were small (less than half a point in most cases).…”

Section: Methodsmentioning

confidence: 99%

See 4 more Smart Citations

Second language speech comprehensibility and acceptability in academic settings: Listener perceptions and speech stream influences

Crowther

Isbell

Nishizawa

2023

Applied Psycholinguistics

Self Cite

View full text Add to dashboard Cite

Ideally, comprehensible second language (L2) speech would be seen as acceptable speech. However, the association between these dimensions is underexplored. To investigate the relationship between comprehensibility and “academic acceptability,” defined here as how well a speaker could meet the demands of a given role in an academic setting, 204 university stakeholders judged L2 speech samples elicited from a standardized English test used for university admissions. Four tasks from 100 speakers were coded for 13 speech stream characteristics. Judgments for comprehensibility and acceptability correlated strongly (r = .93). Linear mixed-effects models, used to examine judgments across all tasks and separately for each task, indicated that while random intercepts (i.e., speaker ability, listener severity) explained a substantial amount of total variation (32–44%) in listener judgments compared to speech characteristic fixed effects (8–21%), fixed effects did account for variation in speaker random effects (reducing variation compared to intercept-only models by 50–90%). Despite some minimal differences across task types, the influence of speech characteristics across both judgments was mostly similar. While providing evidence that comprehensible speech can indeed be perceived as acceptable, this study also provides evidence that speakers demonstrate both consistent and less consistent performance, in reference to speech stream production, across performances.

show abstract

Section: Methodsmentioning

confidence: 91%

Section: Discussionmentioning

confidence: 99%

Section: Methodsmentioning

confidence: 99%

Section: Methodsmentioning

confidence: 99%

Section: Methodsmentioning

confidence: 99%

See 3 more Smart Citations

Second language speech comprehensibility and acceptability in academic settings: Listener perceptions and speech stream influences

Crowther

Isbell

Nishizawa

2023

Applied Psycholinguistics

Self Cite

View full text Add to dashboard Cite

show abstract

EFL Learner Perceptions and Engagement of a Customized AI-led Class

Lee

2024

RELC Journal

View full text Add to dashboard Cite

Technological advancement has enabled language educators to employ AI virtual humans as online instructors by customizing their characteristics, such as English varieties, to meet learners’ needs and preferences. As AI instructors become a viable option in classrooms, how they affect language learners’ learning warrants investigation. Building upon social presence theory regarding interpersonal relationships in an online environment, this study aimed to examine the role of social presence and AI instructors’ credibility in fostering learner engagement. Additionally, it examined the effects of variables within instructors on instructor credibility and learner engagement. In the study, a 2 (human or AI) x 2 (native or non-native English-speaking teacher) between-subjects design was utilized in an online experiment with 120 English learners. Regression and mediation analyses revealed, in AI-led classes, social presence positively influenced learner engagement, with instructor credibility fully mediating this relationship. According to a two-way MANOVA analysis used to examine the effects of humanness and nativeness on credibility and engagement, no evidence was found to support a difference between AI instructors and their human counterparts when observing learners’ perceptions and engagement, regardless of whether the instructors were NESTs or NNESTs. The results show that AI instructors can be a viable alternative in language classes.

show abstract

Evaluating the impact of nonverbal behavior on language ability ratings

Burton

2024

Language Testing

View full text Add to dashboard Cite

Nonverbal behavior can impact language proficiency scores in speaking tests, but there is little empirical information of the size or consistency of its effects or whether language proficiency may be a moderating variable. In this study, 100 novice raters watched and scored 30 recordings of test takers taking an international, high stakes proficiency test. The speech samples were each 2 minutes long and ranged in proficiency levels. The raters scored each sample on fluency, vocabulary, grammar, and comprehensibility using 7-point semantic differential scales. Nonverbal behavior was extracted using an automated machine learning software called iMotions, and data was analyzed with ordinal mixed effects regression. Results showed that attentional variance predicted fluency, vocabulary, and grammar scores, but only when accounting for proficiency. Higher standard deviations of attention corresponded with lower scores for the lower-proficiency group, but not the mid/higher-proficiency group. Comprehensibility scores were only predicted by mean valence when proficiency was an interaction term. Higher mean valence, or positive emotional behavior, corresponded with higher scores in the lower-proficiency group, but not the mid/higher-proficiency group. Effect sizes for these predictors were quite small, with small amounts of variance explained. These results have implications for construct representation and test fairness.

show abstract

Speaking performances, stakeholder perceptions, and test scores: Extrapolating from the Duolingo English test to the university

Cited by 4 publications

References 54 publications

Second language speech comprehensibility and acceptability in academic settings: Listener perceptions and speech stream influences

Second language speech comprehensibility and acceptability in academic settings: Listener perceptions and speech stream influences

EFL Learner Perceptions and Engagement of a Customized AI-led Class

Evaluating the impact of nonverbal behavior on language ability ratings

Contact Info

Product

Resources

About