Interspeech 2021
DOI: 10.21437/interspeech.2021-676
Conformer Parrotron: A Faster and Stronger End-to-End Speech Conversion and Recognition Model for Atypical Speech

Cited by 7 publications (7 citation statements)
References 17 publications
“…This is consistent with our previous work that Parrotron model fine-tuning achieves high quality model personalization for atypical speech [4,23,18,12]. For this experiment, we employ the same model architecture and the same fine-tuning procedure described in [18]. This work confirms that a fine-tuning strategy performs well on speakers across etiologies and severities.…”
Section: Basemodel vs. Model Fine-tuning Results (supporting)
confidence: 87%
“…We find that fine-tuning the entire model (i.e., adapting all parameters of the Basemodel including encoder and decoders) for each speaker exerts substantial improvements across all our speakers with an average WER of 14.2%. This is consistent with our previous work that Parrotron model fine-tuning achieves high quality model personalization for atypical speech [4,23,18,12]. For this experiment, we employ the same model architecture and the same fine-tuning procedure described in [18].…”
Section: Basemodel vs. Model Fine-tuning Results (supporting)
confidence: 75%
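The excerpt above describes adapting *all* parameters of a pretrained model (encoder and decoders) on a single speaker's data. A minimal numpy sketch of that idea is below; the tiny linear "encoder"/"decoder", dimensions, and learning rate are invented for illustration and are not the architecture or procedure from [18].

```python
import numpy as np

# Illustrative only: a tiny tanh "encoder" + linear "decoder" stand in for a
# Parrotron-style model; full-model fine-tuning means BOTH weight matrices
# are updated on the target speaker's data (nothing is frozen).
rng = np.random.default_rng(0)

W_enc = rng.normal(size=(8, 8)) * 0.1   # "pretrained" encoder weights
W_dec = rng.normal(size=(8, 8)) * 0.1   # "pretrained" decoder weights

def forward(x):
    return np.tanh(x @ W_enc) @ W_dec

def fine_tune(x, y, lr=0.05, steps=200):
    """Adapt all parameters with plain gradient descent on an MSE loss."""
    global W_enc, W_dec
    for _ in range(steps):
        h = np.tanh(x @ W_enc)
        err = h @ W_dec - y                 # prediction error per frame
        g_dec = h.T @ err / len(x)          # gradient w.r.t. decoder
        g_h = err @ W_dec.T * (1 - h**2)    # backprop through tanh
        g_enc = x.T @ g_h / len(x)          # gradient w.r.t. encoder
        W_dec -= lr * g_dec                 # decoder adapts...
        W_enc -= lr * g_enc                 # ...and so does the encoder

# Synthetic "one speaker" data: inputs and target features.
x = rng.normal(size=(32, 8))
y = x @ (rng.normal(size=(8, 8)) * 0.2)

before = np.mean((forward(x) - y) ** 2)
fine_tune(x, y)
after = np.mean((forward(x) - y) ** 2)     # loss drops after adaptation
```

The point of the sketch is only the update rule: every parameter receives a gradient step, which is what distinguishes full-model fine-tuning from adapting a frozen model's final layer.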
“…Rule-based VC tends to apply manually designed, speakerdependent rules to correct phoneme errors or modify temporal and frequency features to improve intelligibility [12,13]. Statistical VC automatically maps the features of dysarthric speech to those of normal speech, where typical approaches contain Gaussian mixture model [14], non-negative matrix factorization [15,16], partial least squares [17], and deep learning methods including sequenceto-sequence (seq2seq) models [18][19][20] and gated convolutional networks [21]. Though significant progress has been made, previous work generally ignores speaker identity preservation, which loses the ability for patients to demonstrate their personality via acoustic characteristics.…”
Section: Introduction (mentioning)
confidence: 99%
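The statistical VC family described above learns a mapping from dysarthric-speech features to typical-speech features from parallel data. As a hedged illustration of that core idea, the sketch below fits a closed-form least-squares map on synthetic feature frames; it stands in for the GMM, NMF, PLS, and seq2seq methods cited, and all data and dimensions are invented.

```python
import numpy as np

# Toy parallel corpus: each row is an acoustic-feature frame from the source
# (atypical) speaker aligned with a frame from the target (typical) speaker.
rng = np.random.default_rng(1)
src = rng.normal(size=(200, 12))
true_map = rng.normal(size=(12, 12)) * 0.3
tgt = src @ true_map + rng.normal(size=(200, 12)) * 0.01  # near-linear relation

# Statistical VC in miniature: fit W minimizing ||src @ W - tgt||^2.
W, *_ = np.linalg.lstsq(src, tgt, rcond=None)

# At conversion time, new source frames are mapped through the learned W.
converted = src @ W
residual = np.mean((converted - tgt) ** 2)  # small: the map was recovered
```

Real systems replace the linear map with a probabilistic or neural model and work on spectral features rather than raw vectors, but the train-a-mapping/apply-at-conversion-time structure is the same.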