2021
DOI: 10.1109/taslp.2021.3091805

Recent Progress in the CUHK Dysarthric Speech Recognition System


Cited by 52 publications (38 citation statements) | References 53 publications
“…In contrast, related previous works either: a) trained A2A inversion models on synthesized normal-speech acoustic-articulatory features before applying them to dysarthric speech [23], without accounting for the large mismatch between normal and impaired speech encountered during the inversion model training and articulatory feature generation stages; or b) considered only cross-domain or cross-corpus A2A inversion [25], without assessing the quality of the generated articulatory features using the back-end disordered speech recognition systems. In addition, the proposed cross-domain acoustic-to-articulatory inversion approach achieved the lowest published WER of 24.82% on the benchmark UASpeech task in comparison with recent studies [8][9][10][11][12][13][37][38][39].…”
Section: Introduction
confidence: 78%
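
The citing work above builds on an acoustic-to-articulatory (A2A) inversion front-end, which learns a mapping from acoustic frames to articulatory trajectories from a parallel corpus. As a rough illustration of what such an inversion model can look like, the PyTorch sketch below trains a sequence model with a frame-level regression loss; the BLSTM architecture, layer sizes, and feature dimensions (40-dim filterbanks, 12-dim EMA trajectories) are assumptions for illustration, not the configuration used in the paper.

# Minimal sketch of an acoustic-to-articulatory (A2A) inversion model,
# assuming paired data such as acoustic features and EMA articulatory
# trajectories (e.g. from a corpus like TORGO). All dimensions and layer
# sizes are illustrative assumptions, not the paper's configuration.
import torch
import torch.nn as nn

class A2AInversion(nn.Module):
    def __init__(self, n_acoustic=40, n_articulatory=12, hidden=256):
        super().__init__()
        # Bidirectional LSTM captures coarticulation context on both sides.
        self.blstm = nn.LSTM(n_acoustic, hidden, num_layers=2,
                             batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, n_articulatory)

    def forward(self, acoustics):  # (batch, frames, n_acoustic)
        h, _ = self.blstm(acoustics)
        return self.proj(h)        # (batch, frames, n_articulatory)

model = A2AInversion()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# One training step on random stand-ins for a parallel batch.
acoustics = torch.randn(8, 200, 40)      # e.g. 40-dim filterbank frames
articulatory = torch.randn(8, 200, 12)   # e.g. 12-dim EMA trajectories
optim.zero_grad()
loss = loss_fn(model(acoustics), articulatory)
loss.backward()
optim.step()

Once trained on a parallel corpus, such a model can generate articulatory features for corpora that have no articulatory recordings, which is why the acoustic mismatch between the training and target domains discussed above matters so much.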
“…Due to the acoustic domain mismatch, directly applying the A2A inversion model (described in Section 3.1) trained on the TORGO acoustic-articulatory parallel data to the UASpeech acoustic data is problematic, as shown in previous research on cross-domain audio-visual inversion [12]. To address this, the large acoustic domain mismatch between the two data sets can be minimized using multi-level adaptive networks (MLAN) [12, 39, 43] before A2A inversion is performed. An example MLAN model is shown in the left part of Figure 2 (outlined by the red dotted line).…”
Section: Cross-Domain A2A Inversion
confidence: 99%
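
For readers unfamiliar with MLAN, the sketch below shows the general idea of a multi-level adaptive network cascade: a first-level bottleneck network is trained on one domain, and a second-level network consumes the original acoustics concatenated with the first level's bottleneck features, yielding domain-adapted representations. All shapes, layer sizes, and the frame-level training targets here are illustrative assumptions; the paper's exact configuration may differ.

# Minimal sketch of a multi-level adaptive network (MLAN) cascade used to
# reduce acoustic domain mismatch before cross-domain A2A inversion.
# Dimensions, depths, and target counts are assumptions for illustration.
import torch
import torch.nn as nn

class BottleneckDNN(nn.Module):
    """Feed-forward net with a narrow bottleneck layer whose activations
    serve as domain-adapted features for the next level."""
    def __init__(self, n_in, n_bottleneck=39, n_targets=500, hidden=512):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_in, hidden), nn.ReLU(),
            nn.Linear(hidden, n_bottleneck),  # bottleneck features
        )
        self.classifier = nn.Sequential(
            nn.ReLU(), nn.Linear(n_bottleneck, n_targets),
        )

    def forward(self, x):
        bn = self.encoder(x)
        return self.classifier(bn), bn

n_acoustic = 40
# Level 1: trained on frames from the first domain (e.g. UASpeech).
level1 = BottleneckDNN(n_in=n_acoustic)
# Level 2: trained on the second domain, consuming the original acoustics
# concatenated with the level-1 bottleneck features.
level2 = BottleneckDNN(n_in=n_acoustic + 39)

frames = torch.randn(32, n_acoustic)              # stand-in acoustic frames
_, bn1 = level1(frames)                           # level-1 bottleneck features
logits, bn2 = level2(torch.cat([frames, bn1], dim=-1))
# bn2 (or frames concatenated with bn2) would then feed the inversion model
# trained on the parallel corpus, reducing the cross-domain mismatch.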