Interspeech 2018
DOI: 10.21437/interspeech.2018-1392

Cold Fusion: Training Seq2Seq Models Together with Language Models

Abstract: Sequence-to-sequence (Seq2Seq) models with attention have excelled at tasks which involve generating natural language sentences such as machine translation, image captioning and speech recognition. Performance has further been improved by leveraging unlabeled data, often in the form of a language model. In this work, we present the Cold Fusion method, which leverages a pre-trained language model during training, and show its effectiveness on the speech recognition task. We show that Seq2Seq models with Cold Fusion…

Cited by 220 publications (170 citation statements). References: 17 publications.
Citation statements (ordered by relevance):
“…These models are usually trained with character-based units and decoded with a basic beam search. There have been extensive efforts to develop decoding algorithms that can use external LMs, so-called fusion methods [27,28,29,30,31]. However, these methods have shown relatively small gains on large-scale ASR tasks [32].…”
Section: Introduction (mentioning, confidence: 99%)
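For context, the most widely used of these fusion methods, shallow fusion, interpolates the Seq2Seq and external LM scores log-linearly during beam search. A sketch of the scoring rule, with an assumed, tuned interpolation weight lambda, is:

    \hat{y} = \arg\max_{y} \left( \log P_{\mathrm{S2S}}(y \mid x) + \lambda \, \log P_{\mathrm{LM}}(y) \right)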
“…For this specific task on a medium-sized corpus, the hybrid approach yields significantly better results. To achieve better performance with the bLSTMs, their output needs to be combined with LM-based prefix beam search, or the syllable network needs to be trained along with an LM as proposed in [23].…”
Section: Discussion (mentioning, confidence: 99%)
“…Cold Fusion (Sriram et al, 2017) deals with this problem by training the sequence-to-sequence model along with the gating mechanism, thus making the model aware of the pre-trained language model throughout the training process. The decoder does not need to learn a language model from scratch, and can thus learn more task-specific language characteristics which are not captured by the pre-trained language model (which has been trained on a much larger, domain-agnostic corpus).…”
Section: Fusion Methods (mentioning, confidence: 99%)
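As a concrete illustration of the gating mechanism described in the excerpt above, here is a minimal sketch of a Cold Fusion layer in PyTorch. The class name, the layer sizes, and the choice of feeding the frozen language model's logits directly are illustrative assumptions, not the authors' exact implementation.

    import torch
    import torch.nn as nn

    class ColdFusionLayer(nn.Module):
        """Sketch of Cold Fusion gating (after Sriram et al., 2017).

        Fuses the decoder state s_t with the logits of a frozen,
        pre-trained LM through a fine-grained (elementwise) gate.
        All dimensions are illustrative assumptions.
        """

        def __init__(self, dec_dim, lm_vocab, fused_dim, out_vocab):
            super().__init__()
            # Project the LM logits into a hidden representation h_t^LM.
            self.lm_proj = nn.Sequential(nn.Linear(lm_vocab, fused_dim), nn.ReLU())
            # Fine-grained gate computed from [s_t; h_t^LM].
            self.gate = nn.Linear(dec_dim + fused_dim, fused_dim)
            # Project the fused state to output-vocabulary logits.
            self.out = nn.Sequential(
                nn.Linear(dec_dim + fused_dim, fused_dim), nn.ReLU(),
                nn.Linear(fused_dim, out_vocab),
            )

        def forward(self, s_t, lm_logits):
            h_lm = self.lm_proj(lm_logits)                       # h_t^LM
            g_t = torch.sigmoid(self.gate(torch.cat([s_t, h_lm], dim=-1)))
            fused = torch.cat([s_t, g_t * h_lm], dim=-1)         # gated fused state
            return self.out(fused)                               # output logits

A decoder would call this once per step, e.g. logits = cold_fusion(s_t, lm(y_prev)), training the gate and projections jointly with the Seq2Seq model while keeping the language model frozen.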
“…The output of the DM is similarly concatenated to the input of the linear layer between the encoder and the decoder of the higher-level model. The output of the NLG, in the form of logits at a decoding time-step, is combined with the hidden state of the decoder via cold-fusion (Sriram et al., 2017). Given the NLG output as l_t^NLG and the higher-level decoder hidden state as s_t, the cold-fusion method is described as follows:…”
Section: Structured Fusion Network (mentioning, confidence: 99%)
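The equations themselves are cut off in the extracted statement. As a hedged reconstruction following the original Cold Fusion formulation, with the excerpt's l_t^NLG standing in for the language-model logits, they take roughly this form:

    h_t^{LM} = \mathrm{DNN}\big(l_t^{NLG}\big)
    g_t = \sigma\big(W\,[s_t ; h_t^{LM}] + b\big)
    s_t^{CF} = \big[s_t ;\; g_t \circ h_t^{LM}\big]
    r_t^{CF} = \mathrm{DNN}\big(s_t^{CF}\big)
    \hat{P}(y_t \mid y_{<t}) = \mathrm{softmax}\big(r_t^{CF}\big)

Here DNN denotes a small feed-forward transformation, sigma the elementwise sigmoid producing a fine-grained gate, and the circle elementwise multiplication.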