SLUE: New Benchmark Tasks For Spoken Language Understanding Evaluation on Natural Speech

Shon, Suwon; Pasad, Ankita; Wu, Felix; Brusco, Pablo; Artzi, Yoav; Livescu, Karen; Han, Ki Jin

doi:10.1109/icassp43922.2022.9746137

Cited by 30 publications

(45 citation statements)

References 31 publications

“…On the one hand, in higher-level SLU tasks, satisfying performance is still hard to reach. Some researches demonstrate that the pre-trained speech models do not learn significant semantic information [16,14]. On the other hand, speech data is at a lower abundance and more difficult to obtain compared to text data.…”

Section: Pre-trained Speech Models (mentioning)

confidence: 99%

“…However, the different distribution and different lengths between audios and texts prevent NLP models from participating in SLU tasks directly. Instead, NLP models are applied in SLU in a more indirect and auxiliary way, the spoken language is recognized as texts by ASR, and then NLP models is fine-tuned for downstream SLU tasks [16]. Obviously, this method suffers from errors that occur in the ASR process and loses emotion information by dropping the feature of speech models.…”

Section: Pre-trained Neural Language Models (mentioning)

confidence: 99%

“…Basically, two classic methods are proposed for SLU tasks, the two-stage method and the one-stage method [16]. For the two-stage method, a speech model is utilized to transfer speeches to texts, then a language model is applied to extract the results of downstream tasks from the text inputs.…”

Section: Introduction (mentioning)

confidence: 99%

“…And they do not even contain all the vocabulary and lack the variety of phrase combinations. Besides the shortage of corpora, compared to language models, some researches [16,14] demonstrate that the pre-trained speech models do not learn significant semantic information, as speech models are designed for lower-level tasks, like ASR. These two methods are proved effective, but still, have technical bottlenecks to break.…”

Section: Introduction (mentioning)

confidence: 99%

WaBERT: A Low-resource End-to-end Model for Spoken Language Understanding and Speech-to-BERT Alignment

Ye, Song, Yang et al. 2022

Preprint

Historically lower-level tasks such as automatic speech recognition (ASR) and speaker identification are the main focus in the speech field. Interest has been growing in higher-level spoken language understanding (SLU) tasks recently, like sentiment analysis (SA). However, improving performances on SLU tasks remains a big challenge. Basically, there are two main methods for SLU tasks: (1) Two-stage method, which uses a speech model to transfer speech to text, then uses a language model to get the results of downstream tasks; (2) One-stage method, which just fine-tunes a pre-trained speech model to fit in the downstream tasks. The first method loses emotional cues such as intonation, and causes recognition errors during ASR process, and the second one lacks necessary language knowledge. In this paper, we propose the Wave BERT (WaBERT), a novel end-to-end model combining the speech model and the language model for SLU tasks. WaBERT is based on the pre-trained speech and language model, hence training from scratch is not needed. We also set most parameters of WaBERT frozen during training. By introducing WaBERT, audio-specific information and language knowledge are integrated in the short-time and low-resource training process to improve results on the dev dataset of SLUE SA tasks by 1.15% of recall score and 0.82% of F1 score. Additionally, we modify the serial Continuous Integrate-and-Fire (CIF) mechanism to achieve the monotonic alignment between the speech and text modalities.
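The abstract above refers to the Continuous Integrate-and-Fire (CIF) mechanism without describing it. As a rough illustration only, here is a minimal sketch of the basic CIF accumulation as it is commonly described: per-frame weights are summed until they reach a threshold, at which point one integrated vector is emitted and the leftover weight starts the next one. The function name `cif`, its parameters, and the threshold of 1.0 are assumptions made for this example. The sketch does not reproduce WaBERT's modified version, which the abstract does not specify.

```python
from typing import List, Sequence


def cif(
    frames: Sequence[Sequence[float]],
    weights: Sequence[float],
    threshold: float = 1.0,  # assumed firing threshold, not taken from the abstract
) -> List[List[float]]:
    """Hypothetical sketch of basic CIF: integrate weighted frames, fire at threshold.

    frames  -- one feature vector per speech frame
    weights -- one non-negative weight per frame, each at most `threshold`
    Returns the fired (integrated) vectors in left-to-right order, which is
    what makes the resulting speech-to-token alignment monotonic.
    """
    if len(frames) != len(weights):
        raise ValueError("frames and weights must have the same length")

    dim = len(frames[0]) if frames else 0
    accumulated = 0.0          # weight integrated since the last firing
    state = [0.0] * dim        # weighted sum of frames since the last firing
    fired: List[List[float]] = []

    for frame, weight in zip(frames, weights):
        if not 0.0 <= weight <= threshold:
            raise ValueError("each weight must lie in [0, threshold]")

        if accumulated + weight < threshold:
            # Not enough weight yet: keep integrating.
            accumulated += weight
            state = [s + weight * x for s, x in zip(state, frame)]
        else:
            # Split the weight: one part completes the current output,
            # the remainder starts the next one.
            used = threshold - accumulated
            fired.append([s + used * x for s, x in zip(state, frame)])
            remainder = weight - used
            accumulated = remainder
            state = [remainder * x for x in frame]

    # Any weight left below the threshold at the end is dropped in this sketch.
    return fired


if __name__ == "__main__":
    # Four 1-dimensional frames with weight 0.5 each yield two fired vectors.
    print(cif([[1.0], [2.0], [3.0], [4.0]], [0.5, 0.5, 0.5, 0.5]))
```

Because each frame's weight is capped at the threshold, at most one vector fires per frame and the outputs come out in input order. In a trained model the weights would be predicted by the network rather than supplied by hand.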

“…Due to that, JKPs are not able to capture enough information from multimedia news. Promising directions for extracting knowledge from multimedia sources are multimodal machine learning approaches [72] that combine different types of data such as visual and text representations [73,74] and spoken language understanding tasks that analyse and detect audio speech [75]. Another limitation for knowledge extraction is the dark entities (i.e., those entities that do not exist yet in the knowledge base) [76,77].…”

Section: Information (mentioning)

confidence: 99%

Supporting Newsrooms with Journalistic Knowledge Graph Platforms: Current State and Future Directions

Ocaña, Opdahl 2022

Technologies

Increasing competition and loss of revenues force newsrooms to explore new digital solutions. The new solutions employ artificial intelligence and big data techniques such as machine learning and knowledge graphs to manage and support the knowledge work needed in all stages of news production. The result is an emerging type of intelligent information system we have called the Journalistic Knowledge Platform (JKP). In this paper, we analyse for the first time knowledge graph-based JKPs in research and practice. We focus on their current state, challenges, opportunities and future directions. Our analysis is based on 14 platforms reported in research carried out in collaboration with news organisations and industry partners and our experiences with developing knowledge graph-based JKPs along with an industry partner. We found that: (a) the most central contribution of JKPs so far is to automate metadata annotation and monitoring tasks; (b) they also increasingly contribute to improving background information and content analysis, speeding-up newsroom workflows and providing newsworthy insights; (c) future JKPs need better mechanisms to extract information from textual and multimedia news items; (d) JKPs can provide a digitalisation path towards reduced production costs and improved information quality while adapting the current workflows of newsrooms to new forms of journalism and readers’ demands.

HYPE: Hyperbolic Entailment Filtering for Underspecified Images and Texts

Kim, Chun, Kim et al. 2024

Lecture Notes in Computer Science
