Label-Dependency Coding in Simple Recurrent Networks for Spoken Language Understanding

Dinarelli, Marco; Vukotić, Vedran; Raymond, Christian

doi:10.21437/interspeech.2017-1480

Cited by 11 publications

(50 citation statements)

References 22 publications


“…For example, the missing words in the pre-trained French word embedding adversely affected the F1 scores for MEDIA. The approach can be easily adapted to a variety of different network architectures (e.g., (Dinarelli et al, 2017)) and word embeddings (e.g., (Reimers and Gurevych, 2017a)). Future studies will focus on how to choose a good set of concepts for the PC priming strategy.…”

Section: Discussion (mentioning)

confidence: 99%

Attention-based Semantic Priming for Slot-filling

Wu, Banchs, D’Haro et al. 2018

Proceedings of the Seventh Named Entities Workshop


The problem of sequence labelling in language understanding would benefit from approaches inspired by semantic priming phenomena. We propose that an attention-based RNN architecture can be used to simulate semantic priming for sequence labelling. Specifically, we employ pretrained word embeddings to characterize the semantic relationship between utterances and labels. We validate the approach using varying sizes of the ATIS and MEDIA datasets, and show up to 1.4-1.9% improvement in F1 score. The developed framework can enable more explainable and generalizable spoken language understanding systems.


“…For both ATIS and MEDIA, entities are used as the utterance input. In contrast to [3], no context windows were used as part of the inputs in our models. Instead, contextual information has been exploited at different stages by our models, as described in Section 2.…”

Section: Datasets (mentioning)

confidence: 99%

“…Note again that our word and label embeddings have 200 dimensions in both ATIS and MEDIA, while [3] used 100 and 200 dimensions for ATIS and MEDIA, respectively. Even with much fewer dimensions, the Jordan network based model in [3] still requires more than 1.7 million parameters, while, in comparison, our model needs only 682,000 parameters and can achieve comparable performance. There are at least two reasons why our approach requires far fewer parameters.…”

Section: Bl (mentioning)

confidence: 99%

“…In contrast, [11] simply uses all labels as input during test time to take advantage of label embeddings. Another approach, instead of modifying the input, develops special architectures to make use of label embeddings, similar to neural machine translation models [8,3], where a predicted label is used for the subsequent prediction.…”

Section: Introduction (mentioning)

confidence: 99%

“…Second, the proposed architecture may be too complex to be adapted for well-known paradigms, such as the straightforward RNN+CRF architecture for sequence-to-sequence learning. As an example, the proposed model in [3] relies on the previously predicted label as contextual information to predict the next label. To adopt the idea in [3], the simpler RNN+CRF model requires non-trivial modifications.…”

Section: Introduction (mentioning)

confidence: 99%


Joint Learning of Word and Label Embeddings for Sequence Labelling in Spoken Language Understanding

D’Haro, Chen et al. 2019

2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)


We propose an architecture to jointly learn word and label embeddings for slot filling in spoken language understanding. The proposed approach encodes labels using a combination of word embeddings and straightforward word-label association from the training data. Compared to the state-of-the-art methods, our approach does not require label embeddings as part of the input and therefore lends itself nicely to a wide range of model architectures. In addition, our architecture computes contextual distances between words and labels to avoid adding contextual windows, thus reducing memory footprint. We validate the approach on established spoken dialogue datasets and show that it can achieve state-of-the-art performance with much fewer trainable parameters.
