ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
DOI: 10.1109/icassp43922.2022.9746137

SLUE: New Benchmark Tasks For Spoken Language Understanding Evaluation on Natural Speech

Cited by 30 publications (45 citation statements)
References 31 publications
“…On the one hand, in higher-level SLU tasks, satisfying performance is still hard to reach. Some researches demonstrate that the pre-trained speech models do not learn significant semantic information [16,14]. On the other hand, speech data is at a lower abundance and more difficult to obtain compared to text data.…”
Section: Pre-trained Speech Models (mentioning)
confidence: 99%
“…However, the different distribution and different lengths between audios and texts prevent NLP models from participating in SLU tasks directly. Instead, NLP models are applied in SLU in a more indirect and auxiliary way, the spoken language is recognized as texts by ASR, and then NLP models is fine-tuned for downstream SLU tasks [16]. Obviously, this method suffers from errors that occur in the ASR process and loses emotion information by dropping the feature of speech models.…”
Section: Pre-trained Neural Language Models (mentioning)
confidence: 99%
“…Due to that, JKPs are not able to capture enough information from multimedia news. Promising directions for extracting knowledge from multimedia sources are multimodal machine learning approaches [72] that combine different types of data such as visual and text representations [73,74] and spoken language understanding tasks that analyse and detect audio speech [75]. Another limitation for knowledge extraction is the dark entities (i.e., those entities that do not exist yet in the knowledge base) [76,77].…”
Section: Information (mentioning)
confidence: 99%