Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021
DOI: 10.18653/v1/2021.naacl-main.151

Searchable Hidden Intermediates for End-to-End Models of Decomposable Sequence Tasks

Abstract: End-to-end approaches for sequence tasks are becoming increasingly popular. Yet for complex sequence tasks, like speech translation, systems that cascade several models trained on sub-tasks have shown to be superior, suggesting that the compositionality of cascaded systems simplifies learning and enables sophisticated search capabilities. In this work, we present an end-to-end framework that exploits compositionality to learn searchable hidden representations at intermediate stages of a sequence model using de…
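
As a concrete illustration of the framework described in the abstract, below is a minimal PyTorch sketch of a multi-decoder for speech translation, where a second decoder attends to the hidden states of a first (ASR) decoder rather than to its discrete transcript, keeping the sub-task interface differentiable and searchable. Module names, sizes, and the omission of attention masks are simplifying assumptions, not the paper's exact implementation.

```python
import torch.nn as nn

class MultiDecoderST(nn.Module):
    """Two-pass speech translation sketch: an ASR decoder produces hidden
    intermediates that a downstream MT decoder attends to, instead of
    consuming discrete transcripts. Causal masks omitted for brevity."""

    def __init__(self, vocab_asr, vocab_mt, d_model=256, nhead=4, layers=2):
        super().__init__()
        self.speech_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), layers)
        self.asr_decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True), layers)
        self.mt_decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True), layers)
        self.asr_embed = nn.Embedding(vocab_asr, d_model)
        self.mt_embed = nn.Embedding(vocab_mt, d_model)
        self.asr_out = nn.Linear(d_model, vocab_asr)
        self.mt_out = nn.Linear(d_model, vocab_mt)

    def forward(self, speech_feats, asr_tokens, mt_tokens):
        enc = self.speech_encoder(speech_feats)                  # (B, T, d)
        # First pass: ASR decoder over speech encoder states.
        asr_hidden = self.asr_decoder(self.asr_embed(asr_tokens), enc)
        # Second pass: MT decoder attends to the *hidden* ASR states,
        # so the interface between sub-tasks stays differentiable.
        mt_hidden = self.mt_decoder(self.mt_embed(mt_tokens), asr_hidden)
        return self.asr_out(asr_hidden), self.mt_out(mt_hidden)
```

Because the second decoder consumes hidden states rather than text, beam search over the first decoder's hypotheses can improve the intermediates that the second pass conditions on, which is the "searchable" property the abstract refers to.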

Cited by 19 publications (9 citation statements: 0 supporting, 9 mentioning, 0 contrasting) | References 58 publications

“…On the NAR side, they use CMLM and a CTC-based model as NAR decoders, denoted as Orthros-CMLM and Orthros-CTC, respectively. Such a multi-decoder is also widely used for speech translation [141], [142]; it is a two-pass decoding method that decomposes the overall task into two sub-tasks, i.e., ASR and machine translation. Inaguma et al. [143] propose Fast-MD, where the hidden intermediates are generated in a non-autoregressive manner by a Mask-CTC model.…”
Section: Speech Translation
confidence: 99%
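
To illustrate the Mask-CTC-style generation of intermediates mentioned above, here is a simplified Python sketch: the greedy CTC output is kept where the model is confident, low-confidence positions are masked, and a masked LM iteratively fills them in easy-first. The `mlm` callable, `mask_id`, and `threshold` are illustrative assumptions, and the CTC blank/repeat collapse is omitted for brevity; Fast-MD's actual procedure differs in detail.

```python
import torch

def mask_ctc_refine(ctc_probs, mlm, mask_id, threshold=0.9, iterations=2):
    # ctc_probs: (B, L, V) CTC posteriors (blanks assumed already collapsed)
    conf, tokens = ctc_probs.max(dim=-1)      # greedy path and its confidence
    tokens = tokens.clone()
    tokens[conf < threshold] = mask_id        # mask uncertain positions
    for _ in range(iterations):
        masked = tokens.eq(mask_id)
        if not masked.any():
            break
        probs, filled = mlm(tokens).softmax(-1).max(-1)  # (B, L) predictions
        cutoff = probs[masked].median()       # easy-first: unmask confident half
        unmask = masked & (probs >= cutoff)
        tokens[unmask] = filled[unmask]
    remaining = tokens.eq(mask_id)            # fill leftovers in one final pass
    if remaining.any():
        tokens[remaining] = mlm(tokens).argmax(-1)[remaining]
    return tokens
```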
“…The performance of the downstream NLP task is directly affected by the recognition errors [31]. Previous studies improved the robustness of the NLP back-end to ASR errors using various auxiliary information sources from the ASR system, e.g., probabilities, recognition hypotheses, and hidden states [4, 6, 8, 10, 11, 32–37].…”
Section: Conventional Interconnection of ASR and TS Systems
confidence: 99%
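
As a sketch of one such interconnection, the toy PyTorch module below feeds the ASR 1-best hypothesis together with per-token posterior confidences into the NLP back-end's encoder, so the downstream task can discount likely misrecognitions. The module and feature choice are illustrative assumptions, not a specific system from the cited studies.

```python
import torch.nn as nn

class ConfidenceAwareEncoder(nn.Module):
    """Toy NLP back-end encoder that consumes ASR tokens plus their
    posterior confidences as auxiliary information (hypothetical)."""

    def __init__(self, vocab_size, d_model=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.conf_proj = nn.Linear(1, d_model)   # embed scalar confidence
        self.encoder = nn.GRU(d_model, d_model, batch_first=True)

    def forward(self, hyp_tokens, token_confidences):
        # hyp_tokens: (B, L) ASR 1-best hypothesis
        # token_confidences: (B, L) posterior probability of each token
        x = self.embed(hyp_tokens) + self.conf_proj(token_confidences.unsqueeze(-1))
        out, _ = self.encoder(x)
        return out   # contextual states for the downstream task head
```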
“…This year we focused on (1) sequence-level knowledge distillation (SeqKD) (Kim and Rush, 2016), (2) the Conformer encoder (Gulati et al., 2020), (3) the Multi-Decoder architecture (Dalmia et al., 2021), (4) model ensembling, and (5) better segmentation with a neural network-based voice activity detection (VAD) system (Bredin et al., 2020) and a novel algorithm to merge multiple short segments for long context modeling. Our primary focus was E2E models, although we also compared them with cascade systems with our best effort.…”
Section: Introduction
confidence: 99%
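
The segment-merging idea in point (5) can be sketched as a simple greedy pass over VAD segments: adjacent segments are concatenated while the gap between them and the resulting duration stay below thresholds. The thresholds and the greedy policy here are illustrative assumptions, not the submission's actual algorithm.

```python
def merge_short_segments(segments, max_dur=20.0, max_gap=0.5):
    """Greedily merge adjacent (start, end) VAD segments, in seconds,
    into longer ones for long-context modeling (illustrative sketch)."""
    merged = []
    for start, end in sorted(segments):
        if (merged
                and start - merged[-1][1] <= max_gap      # small enough gap
                and end - merged[-1][0] <= max_dur):      # stays under cap
            merged[-1] = (merged[-1][0], end)             # extend previous
        else:
            merged.append((start, end))
    return merged

# Example: three short utterances collapse into one longer segment.
print(merge_short_segments([(0.0, 2.1), (2.3, 4.0), (4.2, 5.5), (30.0, 31.0)]))
# -> [(0.0, 5.5), (30.0, 31.0)]
```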