Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval 2020
DOI: 10.1145/3397271.3401298

Predicting Entity Popularity to Improve Spoken Entity Recognition by Virtual Assistants

Abstract: We focus on improving the effectiveness of a Virtual Assistant (VA) in recognizing emerging entities in spoken queries. We introduce a method that uses historical user interactions to forecast which entities will gain in popularity and become trending, and subsequently integrates the predictions within the Automated Speech Recognition (ASR) component of the VA. Experiments show that our proposed approach results in a 20% relative reduction in errors on emerging entity name utterances without degrading the o…

Citations: cited by 6 publications (3 citation statements)
References: 42 publications
“…System description. Our ASR system consists of an acoustic model that is a deep convolutional neural network [14], a 4-gram LM with Good-Turing smoothing in the first pass (see [15,16] for details), and the same LM interpolated with a Feed-Forward Neural Network (FFNN) LM [17] in the second pass. To build a scalable TTS-ASR loop, we use our previous generation speech synthesizer, a unit selection system described in [18].…”
Section: Methods (mentioning)
confidence: 99%
“…With the integration of language models in voice assistants, users can interact with systems using natural language. They can provide flexibility in user queries for different language use, such as synonyms, and alternative phrasings, and can compensate for inaccurate voice transcription due to the prerecorded priors [68]. This capacity is attributed to LLMs' ability to comprehend intentions and generate natural language in a contextualized manner.…”
Section: Large Language Models In Virtual Assistants (mentioning)
confidence: 99%
“…High-quality automatic speech recognition (ASR) is essential for these systems to work well. However, an important subset of requests, those containing named entities, still presents a significant challenge to ASR [2]. This is primarily because it is difficult to build a language model (LM) that accurately models entities that are less popular or only recently popular, as they show up rarely or not at all in training data.…”
Section: Introduction (mentioning)
confidence: 99%