Automated Speech Scoring System Under The Lens

Bamdev, Pakhi; Grover, Manraj Singh; Kumar, Yaman; Vafaee, Payman; Hama, Mika; Shah, Rajiv Ratn

doi:10.1007/s40593-022-00291-5

Cited by 7 publications

(10 citation statements)

References 39 publications

“…To compare these results with recent works, (Singla et al, 2021) reports that their hierarchical model achieves an average QWK of 0.82 across four datasets, which is slightly lower than our framework on EF-SET. Another features-based approach provided by (Bamdev et al, 2023) reports that the system achieves a QWK of 0.81 on the SLTI SOPI dataset, which is also lower than our model on EF-SET. These papers suggest that the multimodal multitask framework has a competitive performance in automated speech scoring compared to other recent works.…”

Section: Discussion (mentioning)

confidence: 63%
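
The statement above compares systems by QWK (quadratic weighted kappa). As a rough illustration of how that agreement score is commonly computed, here is a minimal sketch using the standard textbook definition. It is not taken from either cited paper; the function name `quadratic_weighted_kappa` and its parameters are hypothetical, and integer ratings on a fixed scale are assumed.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    """Standard QWK: 1 - sum(w * observed) / sum(w * expected),
    with weights w_ij = (i - j)^2 / (levels - 1)^2.

    Hypothetical helper for illustration; assumes integer ratings
    that all lie in [min_rating, max_rating].
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("rating lists must be non-empty and equally long")
    levels = max_rating - min_rating + 1
    if levels < 2:
        raise ValueError("the rating scale needs at least two levels")

    n_items = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))  # joint counts per (a, b) pair
    hist_a = Counter(rater_a)                  # marginal counts, rater A
    hist_b = Counter(rater_b)                  # marginal counts, rater B

    numerator = 0.0
    denominator = 0.0
    for i in range(min_rating, max_rating + 1):
        for j in range(min_rating, max_rating + 1):
            weight = (i - j) ** 2 / (levels - 1) ** 2
            numerator += weight * observed[(i, j)]
            # Expected count if the two raters were independent.
            denominator += weight * hist_a[i] * hist_b[j] / n_items

    if denominator == 0:
        # Both raters gave the same single rating to every item.
        return 1.0
    return 1.0 - numerator / denominator


if __name__ == "__main__":
    human = [1, 2, 3, 4, 5, 3]
    model = [1, 2, 3, 4, 4, 3]
    print(quadratic_weighted_kappa(human, model, 1, 5))
```

A value of 1 indicates perfect agreement between the two sets of ratings, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement.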

Automatic Assessment Of Spoken English Proficiency Based On Multimodal & Multitask Transformers

Nebhi, Szaszak 2023

Proceedings of the Conference Recent Advances in Natural Language Processing - Large Language Models for Natural Language Proce

This paper describes technology developed to automatically grade students' spontaneous spoken English proficiency by Common European Framework of Reference for Languages (CEFR) level. Our automated assessment system contains two tasks: elicited imitation and spontaneous speech assessment. Spontaneous speech assessment is a challenging task that requires evaluating various aspects of speech quality, content, and coherence. In this paper, we propose a multimodal and multitask transformer model that leverages both audio and text features to perform three tasks: scoring, coherence modeling, and prompt relevancy scoring. Our model uses a fusion of multiple features and multiple modality attention to capture the interactions between audio and text modalities and learn from different sources of information.

“…Recently, Bamdev et al (2023) presents a machine learning-based approach to assess the English proficiency of non-native speakers from their speech samples. The paper uses the SLTI SOPI dataset, which contains 1200 speech samples with different proficiency levels, rated by human experts on a scale from 1 to 5.…”

Section: Features-based Approach (mentioning)

confidence: 99%

Automatic Assessment Of Spoken English Proficiency Based On Multimodal & Multitask Transformers

Nebhi, Szaszak 2023

Proceedings of the Conference Recent Advances in Natural Language Processing - Large Language Models for Natural Language Proce

“…After encoding the tokens in a sentence, we enumerate through all the possible m spans J = {j1, …, ji, …, jm} up to a maximum specified length (in terms of number of tokens) for sentence s = {w1, …, wT} and then re-assign a label yi ∈ {I, O} for each span ji. For example, for the sentence "NLP is um important", all possible spans (or pairs of start and end indices) are {(1,1), (2,2), (3,3), (4,4), (1,2), (2,3), (2,4), (1,3), (1,4)}, and all these spans are labelled O except (3,3) which is labelled I. We denote bi and si as the start and end indices of span ji respectively.…”

Section: Span Representation Layer (mentioning)

confidence: 99%
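
The span enumeration described in the statement above can be sketched as follows. This is only an illustrative reading of the quoted passage, not the cited authors' code: the helper names `enumerate_spans` and `label_spans` are hypothetical, indices are 1-based and inclusive to match the quoted example, and the maximum span length defaults to the sentence length. Note that exhaustive enumeration over four tokens gives ten spans, including (3,4), which the quoted list does not show.

```python
def enumerate_spans(tokens, max_len=None):
    """Return all (start, end) spans, 1-based and inclusive,
    containing at most max_len tokens. Hypothetical helper."""
    n = len(tokens)
    if max_len is None:
        max_len = n  # assumption: no length cap unless one is given
    spans = []
    for length in range(1, min(max_len, n) + 1):
        for start in range(1, n - length + 2):
            spans.append((start, start + length - 1))
    return spans


def label_spans(spans, disfluent_spans):
    """Label each span I (disfluent) or O (fluent).
    disfluent_spans is a set of gold (start, end) pairs."""
    return {span: ("I" if span in disfluent_spans else "O") for span in spans}


if __name__ == "__main__":
    tokens = "NLP is um important".split()
    spans = enumerate_spans(tokens)
    labels = label_spans(spans, {(3, 3)})  # "um" is the disfluent token
    for span in spans:
        start, end = span
        print(span, " ".join(tokens[start - 1:end]), labels[span])
```

Capping `max_len` keeps the number of candidate spans linear in sentence length instead of quadratic, which is presumably why the quoted passage mentions a maximum specified length.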

“…Thus disfluency detection and removal can output clean inputs for downstream NLP tasks, like dialogue systems, question answering, and machine translation. Moreover, disfluency detection also finds applications in automatic speech scoring [1,2].…”

Section: Introduction (mentioning)

confidence: 99%

Span Classification with Structured Information for Disfluency Detection in Spoken Utterances

Ghosh, Kumar, Kumar et al. 2022

Preprint

Self Cite

Existing approaches in disfluency detection focus on solving a token-level classification task for identifying and removing disfluencies in text. Moreover, most works focus on leveraging only contextual information captured by the linear sequences in text, thus ignoring the structured information in text which is efficiently captured by dependency trees. In this paper, building on the span classification paradigm of entity recognition, we propose a novel architecture for detecting disfluencies in transcripts from spoken utterances, incorporating both contextual information through transformers and long-distance structured information captured by dependency trees, through graph convolutional networks (GCNs). Experimental results show that our proposed model achieves state-of-the-art results on the widely used English Switchboard for disfluency detection and outperforms prior art by a significant margin. We make all our code publicly available on GitHub.

“…Traditionally, autograding systems are built using manually crafted features used with machine learning based models (Kumar et al, 2019; Bamdev et al, 2022). Lately, these systems have been shifting to deep learning based models (Ke and Ng, 2019).…”

Section: Introduction (mentioning)

confidence: 99%

Automatic Essay Scoring Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses

Kumar, Parekh, Singh et al. 2023

dad

Deep-learning based Automatic Essay Scoring (AES) systems are being actively used in various high-stake applications in education and testing. However, little research has been put to understand and interpret the black-box nature of deep-learning-based scoring algorithms. While previous studies indicate that scoring models can be easily fooled, in this paper, we explore the reason behind their surprising adversarial brittleness. We utilize recent advances in interpretability to find the extent to which features such as coherence, content, vocabulary, and relevance are important for automated scoring mechanisms. We use this to investigate the oversensitivity (i.e., large change in output score with a little change in input essay content) and overstability (i.e., little change in output scores with large changes in input essay content) of AES. Our results indicate that autoscoring models, despite getting trained as “end-to-end” models with rich contextual embeddings such as BERT, behave like bag-of-words models. A few words determine the essay score without the requirement of any context making the model largely overstable. This is in stark contrast to recent probing studies on pre-trained representation learning models, which show that rich linguistic features such as parts-of-speech and morphology are encoded by them. Further, we also find that the models have learnt dataset biases, making them oversensitive. The presence of a few words with high co-occurrence with a certain score class makes the model associate the essay sample with that score. This causes score changes in ∼95% of samples with an addition of only a few words. To deal with these issues, we propose detection-based protection models that can detect oversensitivity and samples causing overstability with high accuracies. We find that our proposed models are able to detect unusual attribution patterns and flag adversarial samples successfully.
