ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
DOI: 10.1109/icassp39728.2021.9413962

Personalization Strategies for End-to-End Speech Recognition Systems

Abstract: The recognition of personalized content, such as contact names, remains a challenging problem for end-to-end speech recognition systems. In this work, we demonstrate how first- and second-pass rescoring strategies can be leveraged together to improve the recognition of such words. Following previous work, we use a shallow fusion approach to bias towards recognition of personalized content in the first-pass decoding. We show that such an approach can improve personalized content recognition by up to 16% with min…
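As a rough illustration of the shallow fusion idea named in the abstract, the sketch below adds a weighted log-probability from a personalized bias model to each hypothesis's ASR log-probability. It is not the paper's implementation: the toy bias model, the function names (`bias_logprob`, `fused_score`, `pick_best`), the probabilities, and the weight are all assumptions made for the example, and the fusion is applied to whole hypotheses rather than at each step of beam search as a real first-pass decoder would do.

```python
import math

# Hypothetical toy bias model: words in the user's contact list receive a
# much higher probability than all other words. Values are illustrative only.
def bias_logprob(words, contacts, in_list_prob=0.5, other_prob=0.01):
    return sum(
        math.log(in_list_prob if w in contacts else other_prob) for w in words
    )

# Shallow fusion: log-linear combination of the ASR score and the bias score.
def fused_score(asr_logprob, words, contacts, weight):
    return asr_logprob + weight * bias_logprob(words, contacts)

# Return the word sequence with the highest fused score.
def pick_best(hypotheses, contacts, weight):
    best = max(
        hypotheses,
        key=lambda h: fused_score(h[1], h[0], contacts, weight),
    )
    return best[0]

# Assumed example data: (word sequence, ASR log-probability).
hypotheses = [
    (("call", "john"), -1.0),
    (("call", "joan"), -1.2),
]
contacts = {"joan"}

print(pick_best(hypotheses, contacts, weight=0.0))  # no biasing
print(pick_best(hypotheses, contacts, weight=0.3))  # biased towards contacts
```

With the weight set to zero the ASR score alone decides; a positive weight lets the contact list pull a slightly lower-scoring hypothesis ahead. In practice the weight would need tuning so that biasing does not degrade recognition of general content.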

Cited by 25 publications (7 citation statements)
References 20 publications
“…Speaker recognition (SR) systems are being used in a variety of different smart devices for identifying or authenticating users. Their uses include granting access to individuals [1] who intend to use the products/services provided by the smart devices or customize the provided services by personalizing the experience towards each user [2]. Recently, deep neural networks (DNN) have become the predominant mechanism used in SR systems [3,4,5,6,7,8].…”
Section: Introduction (mentioning)
confidence: 99%
“…In summary we make the following contributions: (1) We evaluate fairness of the current widely used architectures in SR and cross-examine them with different loss functions used in training. (2) Our study provides a new and comprehensive perspective on fairness in SR systems by incorporating several popular encoder architectures, comparing different methods of training, and evaluating the impact of different factors of bias (both gender and nationality). (3) We report the results of our experiments in the form of a comparative analysis that shows the impact of using each combination of architecture/loss function on fairness of SR systems.…”
Section: Introduction (mentioning)
confidence: 99%
“…End-to-end ASR models [38], as opposed to traditional Gaussian mixture models, have been increasingly gaining popularity since end-to-end models consist of fewer components, hence reducing maintenance costs. However, integration of external LMs into [5,21,29], and personalization of [11,33,34], end-to-end systems remains an active research area. With respect to LM, Neural Network LMs (NNLM) [1] have gained popularity within ASR [12,30,43].…”
Section: Beyond Ir (mentioning)
confidence: 99%
“…Rescoring performance has been improved by using contextual information in recent literature. This includes audio [5,11], lattice information from the first pass [6,12], or personalized context [13,14].…”
Section: Introduction (mentioning)
confidence: 99%