2021
DOI: 10.48550/arxiv.2109.06952
Preprint

Residual Adapters for Parameter-Efficient ASR Adaptation to Atypical and Accented Speech

Cited by 1 publication (4 citation statements)
References: 0 publications

“…We find that fine-tuning the entire model (i.e., adapting all parameters of the Basemodel, including the encoder and decoders) for each speaker yields substantial improvements across all our speakers, with an average WER of 14.2%. This is consistent with our previous work showing that Parrotron model fine-tuning achieves high-quality model personalization for atypical speech [4,23,18,12]. For this experiment, we employ the same model architecture and the same fine-tuning procedure described in [18].…”
Section: Basemodel vs. Model Fine-tuning Results
Citation type: supporting
Confidence: 76%
“…We choose residual adapter layers [10] as our Submodels for several reasons: (1) adapter layers can easily be added to the encoder, which is generally responsible for modeling the acoustic-phonetics of speech from the acoustic signal; (2) due to their residual connection, one can disable the Submodel by simply setting the residual factor to zero, reverting the model to the Basemodel; (3) the size of this Submodel is easily controlled by a bottleneck dimension; (4) controlling the bottleneck dimension is internal to the Submodel, allowing the use of a pre-compiled and optimized execution graph for fast inference while the tensor shapes can be replaced dynamically; (5) we have seen previously that adapters successfully model atypical and accented speech for ASR personalization and specialization [12].…”
Section: Submodels
Citation type: mentioning
Confidence: 99%
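
The bottleneck-adapter design the statement describes is straightforward to sketch. Below is a minimal, hypothetical PyTorch version (the LayerNorm/ReLU choices, names, and sizes are illustrative assumptions, not the paper's exact architecture): a down-projection to a bottleneck, a nonlinearity, an up-projection, and a residual connection scaled by a factor, so that a factor of zero reverts the layer to the identity, i.e. the Basemodel.

import torch
import torch.nn as nn

class ResidualAdapter(nn.Module):
    # Bottleneck adapter: x + residual_factor * Up(ReLU(Down(LayerNorm(x)))).
    def __init__(self, d_model: int, bottleneck_dim: int, residual_factor: float = 1.0):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.down = nn.Linear(d_model, bottleneck_dim)  # bottleneck_dim controls adapter size
        self.up = nn.Linear(bottleneck_dim, d_model)
        self.act = nn.ReLU()
        self.residual_factor = residual_factor          # 0.0 disables the adapter

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection: with residual_factor = 0 this layer is the
        # identity, so the model falls back to the unadapted Basemodel.
        return x + self.residual_factor * self.up(self.act(self.down(self.norm(x))))

adapter = ResidualAdapter(d_model=512, bottleneck_dim=64)
x = torch.randn(2, 100, 512)        # (batch, frames, encoder features)
assert adapter(x).shape == x.shape  # drop-in insertion into an encoder block
adapter.residual_factor = 0.0       # reverts to the Basemodel's behaviour

Only the small down- and up-projections need training per speaker or accent, which is what makes this form of adaptation parameter-efficient.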