2023
DOI: 10.1109/taslp.2023.3250842

Confidence Score Based Speaker Adaptation of Conformer Speech Recognition Systems

Abstract: Speaker adaptation techniques provide a powerful solution to customise automatic speech recognition (ASR) systems for individual users. Practical application of unsupervised model-based speaker adaptation techniques to data intensive end-to-end ASR systems is hindered by the scarcity of speaker-level data and performance sensitivity to transcription errors. To address these issues, a set of compact and data efficient speaker-dependent (SD) parameter representations are used to facilitate both speaker adaptive t…
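The abstract identifies sensitivity to transcription errors as a weakness of unsupervised adaptation, and the title names confidence scores as the remedy. One generic way to apply confidence scores is to keep only the decoded utterances the recogniser is confident about before estimating SD parameters. The following is a minimal sketch under stated assumptions, not the paper's method: per-token confidences are assumed to already exist (from some hypothetical estimator), utterance confidence is assumed to be their mean, and a fixed threshold of 0.8 is assumed. `Hypothesis`, `utterance_confidence` and `select_adaptation_data` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Hypothesis:
    speaker: str
    utt_id: str
    tokens: List[str]
    # Assumed per-token confidences in [0, 1] from a hypothetical estimator.
    token_conf: List[float]


def utterance_confidence(hyp: Hypothesis) -> float:
    """Assumed utterance score: mean token confidence (0.0 for empty output)."""
    if not hyp.token_conf:
        return 0.0
    return sum(hyp.token_conf) / len(hyp.token_conf)


def select_adaptation_data(
    hyps: Iterable[Hypothesis], threshold: float = 0.8
) -> Dict[str, List[Hypothesis]]:
    """Group confident hypotheses by speaker; low-confidence ones are dropped
    so their likely transcription errors do not drive SD parameter updates."""
    selected: Dict[str, List[Hypothesis]] = defaultdict(list)
    for hyp in hyps:
        if utterance_confidence(hyp) >= threshold:
            selected[hyp.speaker].append(hyp)
    return dict(selected)


if __name__ == "__main__":
    hyps = [
        Hypothesis("spk1", "u1", ["a", "b"], [0.9, 0.95]),  # mean 0.925, kept
        Hypothesis("spk1", "u2", ["c", "d"], [0.4, 0.6]),   # mean 0.5, dropped
    ]
    print({spk: [h.utt_id for h in hs] for spk, hs in select_adaptation_data(hyps).items()})
```

A hard threshold is the simplest design choice; weighting each utterance by its confidence instead of discarding it is an equally plausible alternative.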

Cited by 5 publications (3 citation statements)
References 109 publications
“…These include, but [are] not limited to: 1) auxiliary speaker embedding based approaches [14]–[18], e.g. iVector [14] and xVector [15]; 2) feature transformation based methods, e.g., feature-space MLLR [19]; and 3) model-based methods [20]–[23] that estimate speaker dependent (SD) adapter parameters implemented as, e.g. learning hidden unit contributions (LHUC) [21], during speaker adaptive training (SAT) and test-time unsupervised adaptation [22], [23].…”
Section: Introduction (mentioning; confidence: 99%)
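LHUC, named in the statement above, adapts a model by rescaling each hidden unit with a speaker-dependent amplitude, so each speaker needs only one small vector of parameters. A minimal sketch, assuming the commonly used parameterisation from the original LHUC formulation, where the amplitude is 2·sigmoid(r) and therefore lies in (0, 2); this is not necessarily the exact form used in this paper. The function name `lhuc_scale` and the speaker IDs are hypothetical.

```python
import math
from typing import List


def lhuc_scale(hidden: List[float], r: List[float]) -> List[float]:
    """Rescale each hidden unit h_k by 2 * sigmoid(r_k) (assumed LHUC form).

    r holds one speaker's SD parameters, one value per hidden unit;
    r_k = 0 gives amplitude 1, i.e. the speaker-independent output.
    """
    if len(hidden) != len(r):
        raise ValueError("hidden and r must have the same length")
    return [h * 2.0 / (1.0 + math.exp(-rk)) for h, rk in zip(hidden, r)]


if __name__ == "__main__":
    # Hypothetical per-speaker LHUC vectors for a 3-unit layer.
    speaker_r = {"spk1": [0.0, 0.0, 0.0], "spk2": [1.0, -1.0, 0.0]}
    h = [0.5, -1.2, 2.0]
    print(lhuc_scale(h, speaker_r["spk1"]))  # → [0.5, -1.2, 2.0]
    print(lhuc_scale(h, speaker_r["spk2"]))  # unit 1 boosted, unit 2 damped
```

Initialising r at zero makes the adapted model start out identical to the speaker-independent one, which is one reason this parameterisation suits adaptation from scarce speaker-level data.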
“…3) This paper presents the first investigation of the complete incorporation of speaker features into all the components of a complete end-to-end audio-visual multichannel speech separation and recognition system. In contrast, prior researches consider speaker adaptation of either the speech separation front-end [12], [24]–[32] alone, or the speech recognition backend [14], [16]–[19], [21]–[23] only.…”
Section: Introduction (mentioning; confidence: 99%)