Automatic Rating of Spontaneous Speech for Low-Resource Languages

Al-Ghezi, Ragheb; Getman, Yaroslav; Voskoboinik, Ekaterina; Singh, Mittul; Kurimo, Mikko

doi:10.1109/slt54892.2023.10022381

Cited by 3 publications

(6 citation statements)

References 25 publications


“…To adapt the speech model to Finnish, we first fine-tuned it with 100 hours from the colloquial Lahjoita Puhetta collection of spontaneous native speech [2]. Then, to adapt it to L2 learners' speech, we further fine-tuned it with Finnish DigiTala data (using the three folds selected for training) [18]. Unlike Finnish, Swedish has its own monolingual wav2vec2.0 model [26].…”

Section: The Proposed Benchmark and Baseline (mentioning)

confidence: 99%

“…Because the preliminary experiments [6] indicated that the monolingual model works better for the target language than the multilingual one, we adopted it as our baseline. We then fine-tuned it directly with the SweSchool portion of the DigiTala data (the three folds selected for training) as in [18]. For the L2 ASA systems in both languages, we took the corresponding wav2vec2.0 systems fine-tuned for DigiTala ASR and trained the new classification heads to perform the ASA tasks as in [18].…”

Section: The Proposed Benchmark and Baseline (mentioning)

confidence: 99%

“…We then fine-tuned it directly with the SweSchool portion of the DigiTala data (the three folds selected for training) as in [18]. For the L2 ASA systems in both languages, we took the corresponding wav2vec2.0 systems finetuned for DigiTala ASR and trained the new classification heads to perform the ASA tasks as in [18]. Class range includes the levels with a sufficient amount of samples for evaluating the models.…”

Section: The Proposed Benchmark and Baseline (mentioning)

confidence: 99%

“…The main contributions of our work include: 1. new carefully transcribed and rated L2 learners' speech data for two low-resource languages, 2. a setup for benchmarking ASR and ASA systems using the new data for training and evaluations, and 3. a state-of-the-art baseline system and its training and testing scripts as well as evaluation results for the benchmark. Parts of our data were used for evaluating our first wav2vec2.0 based systems in [18], and the baseline system and ASR error rates were already presented there. However, in this paper, we compare the ASA performance to inter-reviewer agreement, so all the tables and figures in this paper contain new results for the data and the baseline.…”

Section: Introduction (mentioning)

confidence: 99%
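The statement above compares automatic speaking assessment (ASA) output with inter-reviewer agreement. The excerpt does not say which agreement statistic was used, so the following is only a minimal sketch under an assumption: it uses quadratic weighted kappa, a common choice for ordinal proficiency levels. The function name, the level range, and the toy ratings are hypothetical and do not come from the cited paper.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, min_level, max_level):
    """Agreement between two equally long lists of integer ordinal ratings.

    Assumption: quadratic weights, i.e. a disagreement of d levels is
    penalised proportionally to d squared. The metric actually used in
    the cited work is not stated in the excerpt.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("rating lists must be non-empty and equally long")
    span = max_level - min_level
    if span <= 0:
        raise ValueError("max_level must be greater than min_level")

    n = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))  # joint counts of (a, b) pairs
    hist_a = Counter(rater_a)                  # marginal counts for rater A
    hist_b = Counter(rater_b)                  # marginal counts for rater B

    observed_penalty = 0.0
    expected_penalty = 0.0
    for i in range(min_level, max_level + 1):
        for j in range(min_level, max_level + 1):
            weight = ((i - j) / span) ** 2
            observed_penalty += weight * observed[(i, j)] / n
            expected_penalty += weight * hist_a[i] * hist_b[j] / (n * n)

    if expected_penalty == 0.0:
        # Both raters used one identical level throughout; kappa is
        # undefined here, and this sketch reports full agreement.
        return 1.0
    return 1.0 - observed_penalty / expected_penalty


if __name__ == "__main__":
    # Hypothetical ratings on an invented 1-4 scale.
    human_1 = [1, 2, 3, 4, 2, 3]
    human_2 = [1, 2, 4, 4, 2, 2]
    system = [1, 3, 3, 4, 2, 3]
    print("human vs human :", quadratic_weighted_kappa(human_1, human_2, 1, 4))
    print("system vs human:", quadratic_weighted_kappa(system, human_1, 1, 4))
```

Computing the same statistic for a human pair and for a system against a human puts both figures on one scale, which is the kind of comparison the statement describes.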


New data, benchmark and baseline for L2 speaking assessment for low-resource languages

Kurimo, Getman, Voskoboinik et al. 2023

9th Workshop on Speech and Language Technology in Education (SLaTE)


The development of large multilingual speech models provides the possibility to construct high-quality speech technology even for low-resource languages. In this paper, we present the speech data of L2 learners of Finnish and Finland Swedish that we have recently collected for training and evaluation of automatic speech recognition (ASR) and speaking assessment (ASA). It includes over 4000 recordings by over 300 students per language in short read-aloud and free-form tasks. The recordings have been manually transcribed and assessed for pronunciation, fluency, range, accuracy, task achievement, and a holistic proficiency level. We present also an ASR and ASA benchmarking setup we have constructed using this data and include results from our baseline systems built by fine-tuning a self-supervised multilingual model for the target language. In addition to benchmarking, our baseline system can be used by L2 students and teachers for online self-training and evaluation of oral proficiency.




“…Originally, the recordings were rated by humans across the following dimensions: holistic level, pronunciation, fluency, accuracy, range, and task completion (Al-Ghezi et al, 2023). The raters were asked to either assign a score for each dimension or mark a dimension as ungradable (zero).…”

Section: Data (mentioning)

confidence: 99%

Automated Assessment of Task Completion in Spontaneous Speech for Finnish and Finland Swedish Language Learners

Voskoboinik, Getman, Al-Ghezi et al. 2023

Linköping Electronic Conference Proceedings


This study investigates the feasibility of automated content scoring for spontaneous spoken responses from Finnish and Finland Swedish learners. Our experiments reveal that pretrained Transformer-based models outperform the tf-idf baseline in automatic task completion grading. Furthermore, we demonstrate that pre-fine-tuning these models to differentiate between responses to distinct prompts enhances subsequent task completion finetuning. We observe that task completion classifiers exhibit accelerated learning and produce predictions with stronger correlations to human grading when accounting for task differences. Additionally, we find that employing similarity learning, as opposed to conventional classification fine-tuning, further improves the results. It is especially helpful to learn not just the similarities between the responses in one score bin, but the exact differences between the average human scores responses received. Lastly, we demonstrate that models applied to both manual and ASR transcripts yield comparable correlations to human grading.
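The abstract above mentions a tf-idf baseline and reports correlations between predicted and human grades. It does not describe how the baseline was built, so the sketch below is an illustration under stated assumptions: whitespace tokenisation, a smoothed inverse document frequency, a response scored by cosine similarity to a reference response, and Pearson correlation as the agreement measure. All function names and example strings are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one {term: weight} dict per whitespace-tokenised document."""
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    # Number of documents that contain each term at least once.
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        vectors.append({
            # term frequency times smoothed inverse document frequency
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation between two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented toy data: one reference answer and three learner responses.
    reference = "i would like to order a coffee and a sandwich"
    responses = [
        "i would like a coffee and a sandwich please",
        "coffee please",
        "the weather is nice today",
    ]
    human_scores = [3.0, 2.0, 0.0]  # hypothetical task completion grades
    vectors = tfidf_vectors([reference] + responses)
    predicted = [cosine(vectors[0], vec) for vec in vectors[1:]]
    print("similarity scores:", predicted)
    print("correlation with human scores:", pearson(predicted, human_scores))
```

A similarity-to-reference score is only one way to turn tf-idf features into a grade; a classifier trained on the vectors would be another, and the abstract does not say which was used.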
