ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp43922.2022.9747047
Integration of Pre-Trained Networks with Continuous Token Interface for End-to-End Spoken Language Understanding

Cited by 12 publications (16 citation statements)
References 15 publications
“…We observed a slight but clear gain by increasing K, which improved both ASR and SLU performance thanks to BERT. We note that our result outperforms the state-of-the-art 86.9% reported in (Seo et al, 2022).…”
Section: Results (contrasting)
confidence: 40%
“…Trained on text NLU* (Bastianelli et al, 2020) 84.84 -NLU+ (Seo et al, 2021) 87.73 84.34 BART (Lewis et al, 2020) 88 Trained on text Attention BiRNN (Liu and Lane, 2016) 91.10 94.20 Capsule-NLU (Zhang et al, 2019) 95.00 95.20 LIDSNet (Agarwal et al, 2021) 95.97 -SF-ID Network (E et al, 2019) 96.60 95.60 SyntacticTF 97.31 96.01 BERT SLU 97.50 96.10 Stack-Prop. (Qin et al, 2019) 96.90 95.90 Stack-Prop.…”
Section: Models (mentioning)
confidence: 99%
“…Recently, there has been growing interest in building E2E SLU models in which the acoustic and textual models are jointly optimized, leading to more robust SLU models [5][6][7][8][9][10][11][12][13][14][15][16]. Early works [5,6] learn an utterance-level semantic representation directly from audio signals without performing speech recognition.…”
Section: Introduction (mentioning)
confidence: 99%
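The "early works" style mentioned in the statement above maps audio straight to an utterance-level label with no transcript in the loop. The following is a minimal, hedged sketch of that idea; the module names, dimensions, and architecture are illustrative assumptions and do not reproduce the cited systems [5,6].

```python
# Minimal sketch (assumption, not the cited systems): an E2E SLU model that
# predicts an intent directly from audio features, without speech recognition.
import torch
import torch.nn as nn

class DirectAudioIntentClassifier(nn.Module):
    def __init__(self, n_mels=80, hidden=256, n_intents=31):
        super().__init__()
        # Acoustic encoder: a small conv + BiGRU stack over log-Mel frames.
        self.conv = nn.Conv1d(n_mels, hidden, kernel_size=5, stride=2, padding=2)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True, bidirectional=True)
        # Intent head on the mean-pooled utterance representation.
        self.classifier = nn.Linear(2 * hidden, n_intents)

    def forward(self, mels):               # mels: (batch, time, n_mels)
        x = self.conv(mels.transpose(1, 2)).transpose(1, 2)
        x, _ = self.rnn(x)
        utt = x.mean(dim=1)                # utterance-level semantic vector
        return self.classifier(utt)        # intent logits

# Toy usage with random log-Mel features.
logits = DirectAudioIntentClassifier()(torch.randn(4, 200, 80))
print(logits.shape)  # torch.Size([4, 31])
```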
“…One way to achieve E2E training is to re-frame SLU as a sequence-to-sequence task, where semantic labels are treated as another sequence of output labels besides the transcript [9][10][11][12]. Another way is to unify ASR and NLU models and train them together via differentiable neural interfaces [13][14][15][16]. One commonly used neural interface is to feed the token level hidden representations from ASR as input to the NLU model [13][14][15][16].…”
Section: Introduction (mentioning)
confidence: 99%
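The second approach in the statement above, a differentiable neural interface between ASR and NLU, is the setting of the cited paper (its Continuous Token Interface feeds token-level ASR representations into a pre-trained NLU model). Below is a hedged sketch of that general idea only; the bridge layer, the tiny Transformer standing in for BERT, and all sizes are my own stand-ins, not the paper's implementation.

```python
# Hedged sketch of a continuous token interface: instead of discrete ASR
# hypotheses, the token-level hidden vectors from the ASR decoder are fed
# directly into the NLU encoder, keeping the whole ASR+NLU stack differentiable.
# All module names and sizes are illustrative assumptions; the paper itself
# plugs in pre-trained ASR and BERT-style networks.
import torch
import torch.nn as nn

class ContinuousTokenInterfaceSLU(nn.Module):
    def __init__(self, asr_dim=512, nlu_dim=768, n_intents=31):
        super().__init__()
        # Bridge: maps ASR token-level hidden states into the NLU input space.
        self.bridge = nn.Linear(asr_dim, nlu_dim)
        # Stand-in NLU encoder (a pre-trained BERT would be used in practice).
        layer = nn.TransformerEncoderLayer(d_model=nlu_dim, nhead=8, batch_first=True)
        self.nlu = nn.TransformerEncoder(layer, num_layers=2)
        self.intent_head = nn.Linear(nlu_dim, n_intents)

    def forward(self, asr_token_states):   # (batch, n_tokens, asr_dim)
        x = self.bridge(asr_token_states)  # continuous "token embeddings"
        x = self.nlu(x)
        return self.intent_head(x.mean(dim=1))  # intent logits

# Toy usage: token-level hidden states such as an ASR decoder would emit.
states = torch.randn(2, 12, 512)
print(ContinuousTokenInterfaceSLU()(states).shape)  # torch.Size([2, 31])
```

Because the interface passes continuous vectors rather than argmax-decoded tokens, gradients from the SLU loss can flow back into the ASR network, which is what makes joint fine-tuning of both pre-trained components possible.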