2022 IEEE Spoken Language Technology Workshop (SLT) 2023
DOI: 10.1109/slt54892.2023.10022381

Automatic Rating of Spontaneous Speech for Low-Resource Languages

Citations: cited by 3 publications (6 citation statements)
References: 25 publications
“…To adapt the speech model to Finnish, we first fine-tuned it with 100 hours from the colloquial Lahjoita Puhetta collection of spontaneous native speech [2]. Then, to adapt it to L2 learners' speech, we further fine-tuned it with Finnish DigiTala data (using the three folds selected for training) [18]. Unlike Finnish, Swedish has its own monolingual wav2vec2.0 model [26].…”
Section: The Proposed Benchmark and Baseline (mentioning)
confidence: 99%
“…Because the preliminary experiments [6] indicated that the monolingual model work better for the target language than the multilingual one, we adopted it as our baseline. We then fine-tuned it directly with the SweSchool portion of the DigiTala data (the three folds selected for training) as in [18]. For the L2 ASA systems in both languages, we took the corresponding wav2vec2.0 systems finetuned for DigiTala ASR and trained the new classification heads to perform the ASA tasks as in [18].…”
Section: The Proposed Benchmark and Baseline (mentioning)
confidence: 99%
“…Originally, the recordings were rated by humans across the following dimensions: holistic level, pronunciation, fluency, accuracy, range, and task completion (Al-Ghezi et al, 2023). The raters were asked to either assign a score for each dimension or mark a dimension as ungradable (zero).…”
Section: Data (mentioning)
confidence: 99%