Interspeech 2022
DOI: 10.21437/interspeech.2022-10885

Seq-2-Seq based Refinement of ASR Output for Spoken Name Capture

Abstract: This paper reimagines some aspects of speech processing using speech encoders, specifically about extracting entities directly from speech, with no intermediate textual representation. In human-computer conversations, extracting entities such as names, postal addresses and email addresses from speech is a challenging task. In this paper, we study the impact of fine-tuning pre-trained speech encoders on extracting spoken entities in human-readable form directly from speech without the need for text transcriptio…

Cited by 1 publication (3 citation statements)
References 34 publications

“…This model achieves a word accuracy of 93.1% on a 28k utterance test set consisting of user utterances that are in response to "How may I help you?" opening prompt from various enterprise virtual agent applications (Singla et al, 2022).…”
Section: Non-autoregressive Speech Based Extraction (mentioning)
confidence: 99%
“…In this 2-step approach, we first transcribe the speech provided by humans into text using same pretrained E2E ASR checkpoint used by (Singla et al, 2022). We then extract entities from the transcribed text by learning to translate using (transcription, entity) pairs.…”
Section: Baseline: Cascading ASR and NLU Systems (mentioning)
confidence: 99%