Personalization Strategies for End-to-End Speech Recognition Systems

Gourav, Aditya; Liu, Linda; Gandhe, Ankur; Gu, Yile; Lan, Guitang; Huang, Xiangyang; Kalmane, Shashank; Tiwari, Gautam; Filimonov, Denis; Rastrow, Ariya; Stolcke, Andreas; Bulyko, Ivan

doi:10.1109/icassp39728.2021.9413962

Cited by 25 publications

(7 citation statements)

References 20 publications


“…Speaker recognition (SR) systems are being used in a variety of different smart devices for identifying or authenticating users. Their uses include granting access to individuals [1] who intend to use the products/services provided by the smart devices or customize the provided services by personalizing the experience towards each user [2]. Recently, deep neural networks (DNN) have become the predominant mechanism used in SR systems [3,4,5,6,7,8].…”

Section: Introduction (mentioning)

confidence: 99%

“…In summary we make the following contributions: (1) We evaluate fairness of the current widely used architectures in SR and cross-examine them with different loss functions used in training. (2) Our study provides a new and comprehensive perspective on fairness in SR systems by incorporating several popular encoder architectures, comparing different methods of training, and evaluating the impact of different factors of bias (both gender and nationality). (3) We report the results of our experiments in the form of a comparative analysis that shows the impact of using each combination of architecture/loss function on fairness of SR systems.…”

Section: Introduction (mentioning)

confidence: 99%


A Deep Neural Network for Short-Segment Speaker Recognition

Hajavi, Etemad. 2019

Interspeech 2019


Today's interactive devices such as smart-phone assistants and smart speakers often deal with short-duration speech segments. As a result, speaker recognition systems integrated into such devices will be much better suited with models capable of performing the recognition task with short-duration utterances. In this paper, a new deep neural network, UtterIdNet, capable of performing speaker recognition with short speech segments is proposed. Our proposed model utilizes a novel architecture that makes it suitable for short-segment speaker recognition through an efficiently increased use of information in short speech segments. UtterIdNet has been trained and tested on the VoxCeleb datasets, the latest benchmarks in speaker recognition. Evaluations for different segment durations show consistent and stable performance for short segments, with significant improvement over the previous models for segments of 2 seconds, 1 second, and especially sub-second durations (250 ms and 500 ms).


“…End-to-end ASR models [38], as opposed to traditional Gaussian mixture models, have been increasingly gaining popularity since end-to-end models consist of less components, hence reducing maintenance costs. However, integration of external LMs into [5,21,29], and personalization of [11,33,34], end-to-end systems remains an active research area. With respect to LM, Neural Network LMs (NNLM) [1] have gained popularity within ASR [12,30,43].…”

Section: Beyond Ir (mentioning)

confidence: 99%

Predicting Entity Popularity to Improve Spoken Entity Recognition by Virtual Assistants

Gysel, Tsagkias, Pusateri, et al. 2020

Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval


We focus on improving the effectiveness of a Virtual Assistant (VA) in recognizing emerging entities in spoken queries. We introduce a method that uses historical user interactions to forecast which entities will gain in popularity and become trending, and it subsequently integrates the predictions within the Automated Speech Recognition (ASR) component of the VA. Experiments show that our proposed approach results in a 20% relative reduction in errors on emerging entity name utterances without degrading the overall recognition quality of the system.


“…Rescoring performance has been improved by using contextual information in recent literature. This includes audio [5,11], lattice information from the first pass [6,12], or personalized context [13,14].…”

Section: Introduction (mentioning)

confidence: 99%

Personalization for BERT-based Discriminative Speech Recognition Rescoring

Kolehmainen, Gu, Gourav, et al. 2023

Interspeech 2023


Recognition of personalized content remains a challenge in end-to-end speech recognition. We explore three novel approaches that use personalized content in a neural rescoring step to improve recognition: gazetteers, prompting, and a cross-attention based encoder-decoder model. We use internal de-identified en-US data from interactions with a virtual voice assistant supplemented with personalized named entities to compare these approaches. On a test set with personalized named entities, we show that each of these approaches improves word error rate by over 10%, against a neural rescoring baseline. We also show that on this test set, natural language prompts can improve word error rate by 7% without any training and with a marginal loss in generalization. Overall, gazetteers were found to perform the best with a 10% improvement in word error rate (WER), while also improving WER on a general test set by 1%.
