Interspeech 2020
DOI: 10.21437/interspeech.2020-2482

End-to-End Named Entity Recognition from English Speech

Abstract: Named entity recognition (NER) from text has been a widely studied problem and is typically used to extract semantic information from text. Until now, NER from speech has mostly been studied as a two-step pipeline: an automatic speech recognition (ASR) system is first applied to an audio sample, and the predicted transcript is then passed to a NER tagger. In such cases, the error signal does not propagate from one step to the other, as the two tasks are not optimized in an end-to-end (E2E) fashion. Recent studies confirm …
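The two-step pipeline the abstract contrasts with the E2E approach can be sketched as follows. Both components below are hypothetical toy stubs for illustration, not the paper's models; the stub ASR output is lowercase and unpunctuated, which is exactly the kind of lexical noise the citing works mention.

```python
# Sketch of the two-step pipeline (ASR -> NER tagger) described in the abstract.
# Both components are illustrative stand-ins, not the paper's actual systems.

def asr(audio: bytes) -> str:
    """Hypothetical ASR stub: real systems emit lowercase, unpunctuated text."""
    return "barack obama visited paris"

def ner_tag(transcript: str) -> list[tuple[str, str]]:
    """Toy gazetteer-based BIO tagger over a whitespace-tokenized transcript."""
    persons = {"barack": "B-PER", "obama": "I-PER"}
    locations = {"paris": "B-LOC"}
    tags = []
    for token in transcript.split():
        tags.append((token, persons.get(token, locations.get(token, "O"))))
    return tags

def pipeline(audio: bytes) -> list[tuple[str, str]]:
    # ASR errors flow forward into the tagger, but no training signal flows
    # back: the two models are optimized separately, not end-to-end.
    return ner_tag(asr(audio))

print(pipeline(b""))
# [('barack', 'B-PER'), ('obama', 'I-PER'), ('visited', 'O'), ('paris', 'B-LOC')]
```

An E2E model would instead map the audio directly to the tagged entity sequence, so that errors made on entity words can be penalized during training of the acoustic front-end as well.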

Cited by 29 publications (29 citation statements)
References 16 publications
“…Similar to the O2O model trained with the conditional chain mapping in Section 2.2.2, this framework does not assume conditional independence between output labels and has the flexibility to model the dependency between words/morphemes and linguistic annotations. Related works use the O2O model, e.g., (Yadav et al., 2020), but they are based on CTC and do not consider such an explicit output dependency. Also, the proposed method using Transformer can preserve the relationship between a word/morpheme and the corresponding linguistic annotations across the sequence, based on the aligned representation s_i in Eq.…”
Section: O2O Model Trained With Conditional Chain Mapping
confidence: 99%
“…We investigated three models: self-attention-based CTC (Pham et al., 2019), the Transformer (Dong et al., 2018), and a hybrid Transformer trained with an auxiliary CTC objective (Transformer+CTC) (Karita et al., 2019). The CTC model was used in prior studies based on O2O models, e.g., (Audhkhasi et al., 2018; Yadav et al., 2020). During training, the CTC model was regularized with the Transformer decoder in a multi-task learning fashion, similar to Transformer+CTC.…”
Section: E2E ASR
confidence: 99%
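The hybrid Transformer+CTC training mentioned in the statement above combines two objectives: the CTC loss on the encoder and the attention decoder's cross-entropy loss. A minimal sketch of that interpolated objective, assuming an interpolation weight w; the weight and loss values used here are illustrative placeholders, not values from the cited works:

```python
# Sketch of the multi-task objective used in hybrid CTC/attention training
# (Karita et al., 2019): an interpolation of the CTC loss and the attention
# decoder's cross-entropy loss. The weight and the loss values below are
# illustrative placeholders, not values from the cited work.

def hybrid_loss(loss_ctc: float, loss_att: float, weight: float = 0.3) -> float:
    """L = w * L_CTC + (1 - w) * L_attention."""
    return weight * loss_ctc + (1.0 - weight) * loss_att

print(round(hybrid_loss(2.0, 1.0), 6))  # 0.3 * 2.0 + 0.7 * 1.0 -> 1.3
```

Setting the weight to 1.0 recovers pure CTC training; setting it to 0.0 recovers pure attention-based training, so the same code path covers all three investigated configurations.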
“…While entity extraction from text is well researched in the literature, NER on speech is less studied. Most initial works on speech took a two-stage approach, ASR followed by NER (Cohn et al., 2019); recent works extract entities directly from speech (Ghannay et al., 2018; Yadav et al., 2020). While NER helps in e-commerce search on websites and apps, the specific nature of the order identification problem and the limited search space of active orders make NER unnecessary.…”
Section: Related Work
confidence: 99%
“…Additionally, NER systems' performance is comparatively poor on ASR transcripts owing to a high degree of recognition and lexical noise (e.g., missing capitalization) (Yadav et al., 2020).…”
Section: Introduction
confidence: 99%