We present a novel situational task that integrates collaborative problem-solving behavior with testing in a science domain. Participants engage in discourse, which is used to evaluate their collaborative skills. We present initial experiments on the automatic classification of such discourse, using a novel classification schema. Considerable accuracy is achieved with lexical features alone; a speech-act classifier trained on out-of-domain data also proves helpful.
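As an illustration of the lexical-features approach, the sketch below trains a simple bag-of-words classifier on a few hypothetical utterances. The labels, example data, and model choice are assumptions made for illustration; they are not the paper's actual schema or classifier.

```python
# Minimal sketch of a lexical-feature discourse classifier, assuming
# scikit-learn and a hypothetical label set.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: utterances paired with collaboration labels.
utterances = ["I think we should test the variable first",
              "no wait that's wrong",
              "good idea, let's do that"]
labels = ["proposal", "disagreement", "agreement"]

# Bag-of-words (unigram + bigram) lexical features feeding a linear model.
clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(utterances, labels)
print(clf.predict(["maybe we could try heating it"]))
```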
This research report provides an overview of R&D efforts at Educational Testing Service on the automated scoring of nonnative spontaneous speech with the SpeechRater℠ automated scoring service since its initial version was deployed in 2006. While most aspects of this R&D work have been published in various venues in recent years, no comprehensive account of the current state of SpeechRater has been provided since the initial publications following its first operational use in 2006. After a brief review of recent related work by other institutions, we summarize the main features and feature classes that have been developed and introduced into SpeechRater in the past 10 years, including features measuring aspects of pronunciation, prosody, vocabulary, grammar, content, and discourse. Furthermore, new types of filtering models for flagging nonscorable spoken responses are described, as is our new hybrid way of building linear regression scoring models with improved feature selection. Finally, empirical results for SpeechRater 5.0 (operationally deployed in 2016) are provided.
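To make the scoring-model setup concrete, here is a minimal sketch of fitting a linear regression scorer after simple univariate feature selection, on synthetic data. SpeechRater's actual hybrid model-building and feature-selection procedure is not reproduced here; the feature counts and selection criterion are assumptions.

```python
# Minimal sketch: select the features most predictive of human scores,
# then fit a linear regression scoring model (synthetic data).
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))                 # 30 hypothetical speech features
y = 0.8 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=200)  # human scores

# Keep the 5 features most correlated with the human score, then fit.
model = make_pipeline(SelectKBest(f_regression, k=5), LinearRegression())
model.fit(X, y)
print(round(model.score(X, y), 3))             # in-sample R^2
```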
In this paper we investigate unsupervised name transliteration using comparable corpora: corpora in which texts in the two languages deal with some of the same topics, and therefore share references to named entities, but are not translations of each other. We present two distinct methods for transliteration: one uses an unsupervised phonetic transliteration method, and the other uses the temporal distribution of candidate pairs. Each of these approaches works quite well, but combining them achieves even better results. We believe the novelty of our approach lies in the phonetic-based scoring method, which is based on a combination of carefully crafted phonetic features and empirical results from the pronunciation errors of second-language learners of English. Unlike previous approaches to transliteration, this method can in principle work with any pair of languages in the absence of a training dictionary, provided one has an estimate of the pronunciation of words in text.
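The temporal method can be illustrated with a small sketch: two names that denote the same entity should show similar mention-frequency profiles over time in the two corpora. The counts below and the cosine-similarity measure are illustrative assumptions, not necessarily the exact scoring function used in the paper.

```python
# Minimal sketch of temporal scoring for transliteration candidates:
# names referring to the same entity should have correlated daily
# mention-frequency profiles across the two comparable corpora.
import numpy as np

def temporal_similarity(counts_a, counts_b):
    """Cosine similarity between length-normalized daily count vectors."""
    a = np.asarray(counts_a, dtype=float)
    b = np.asarray(counts_b, dtype=float)
    a /= np.linalg.norm(a) or 1.0
    b /= np.linalg.norm(b) or 1.0
    return float(a @ b)

# Hypothetical daily mention counts of a name in English news and of a
# candidate transliteration in the other-language corpus.
english = [0, 3, 8, 2, 0, 1, 9]
candidate = [0, 2, 7, 3, 0, 0, 8]
print(temporal_similarity(english, candidate))
```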
Test takers in high-stakes speaking assessments may try to inflate their scores by responding to a question they are more familiar with instead of the question presented in the test; such a response is referred to as an off-topic spoken response. The presence of these responses makes it difficult to accurately evaluate a test taker's speaking proficiency and thus may reduce the validity of assessment scores. This study addresses the problem by building an automatic system that detects off-topic spoken responses and can inform the downstream automated scoring pipeline. We propose an innovative method that interprets the comparison between a test response and the question used to elicit it as a similarity grid, and then applies very deep convolutional neural networks to determine different degrees of topic relevance. In this study, Inception networks were applied to the task, and the experimental results demonstrate the effectiveness of the proposed method. Our system achieves an F1-score of 92.8% on the class of off-topic responses, significantly outperforming a baseline system that uses a range of word embedding-based similarity metrics (F1-score = 85.5%).
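A minimal sketch of the similarity-grid idea follows: word-level cosine similarities between a response and its question form a matrix that a convolutional network such as Inception could process like an image. The random vectors below stand in for real word embeddings, and the network itself is omitted.

```python
# Minimal sketch of the similarity grid: a |response| x |question| matrix
# of word-level cosine similarities, treatable as a 2-D "image" by a CNN.
import numpy as np

rng = np.random.default_rng(1)
vocab = {w: rng.normal(size=50) for w in
         "what did you do last summer i went hiking in the mountains".split()}

def embed(tokens):
    """Stack unit-normalized embedding vectors for a token sequence."""
    vecs = np.stack([vocab[t] for t in tokens])
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

question = "what did you do last summer".split()
response = "i went hiking in the mountains".split()

grid = embed(response) @ embed(question).T   # shape: (len(response), len(question))
print(grid.shape)                            # (6, 6) similarity grid
```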
This paper describes an end-to-end prototype system for automated scoring of spoken responses in a novel assessment for teachers of English as a Foreign Language who are not native speakers of English. The 21 speaking items in the assessment elicit both restricted and moderately restricted responses, and they aim to assess the essential speaking skills that English teachers need in order to be effective communicators in their classrooms. Our system consists of a state-of-the-art automatic speech recognizer; multiple feature-generation modules addressing diverse aspects of speaking proficiency, such as fluency, pronunciation, prosody, grammatical accuracy, and content accuracy; a filter that identifies and flags problematic responses; and linear regression models that predict response scores based on subsets of the features. The automated speech scoring system was trained and evaluated on a data set of about 1,400 test takers, and achieved a speaker-level correlation with human expert scores of 0.73 (when the scores for all 21 responses of a speaker are aggregated).
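As a sketch of the speaker-level evaluation, the snippet below aggregates per-item machine scores for each speaker and correlates them with human scores on synthetic data. The numbers are fabricated for illustration only, and the operational pipeline (ASR, feature modules, filtering) is of course far more involved.

```python
# Minimal sketch of speaker-level evaluation: aggregate 21 item scores
# per speaker, then compute the Pearson correlation with human scores.
import numpy as np

rng = np.random.default_rng(2)
n_speakers, n_items = 100, 21
human = rng.uniform(1, 4, size=n_speakers)     # hypothetical expert speaker scores
item_scores = human[:, None] + rng.normal(scale=0.5, size=(n_speakers, n_items))

machine = item_scores.mean(axis=1)             # aggregate the 21 item scores
r = np.corrcoef(machine, human)[0, 1]          # speaker-level Pearson correlation
print(round(r, 3))
```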