Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021
DOI: 10.18653/v1/2021.naacl-main.210

Smoothing and Shrinking the Sparse Seq2Seq Search Space

Abstract: Current sequence-to-sequence models are trained to minimize cross-entropy and use softmax to compute the locally normalized probabilities over target sequences. While this setup has led to strong results in a variety of tasks, one unsatisfying aspect is its length bias: models give high scores to short, inadequate hypotheses and often make the empty string the argmax, the so-called cat got your tongue problem. Recently proposed entmax-based sparse sequence-to-sequence models present a possible solution, since t…
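The length bias follows from scoring a hypothesis by the product of locally normalized token probabilities: every extra token multiplies in another factor below 1. The following toy sketch uses made-up logits over a hypothetical four-token vocabulary (EOS, a, b, c) and, for simplicity, fixed per-step logits that do not depend on the prefix (a real decoder's would). Under these assumptions, softmax gives the empty string a higher score than the three-token hypothesis "a b EOS" even though EOS is not the most probable first token, while sparsemax (the α = 2 member of the entmax family) assigns EOS exactly zero probability at the first step, which removes the empty string from the search space. The names `STEP_LOGITS` and `sequence_log_prob` are illustrative only.

```python
import math

def softmax(z):
    """Dense softmax: every entry is strictly positive."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def sparsemax(z):
    """Sparsemax (alpha = 2 entmax): can assign exactly zero probability."""
    zs = sorted(z, reverse=True)
    cumsum, k_star, cum_star = 0.0, 0, 0.0
    for k, v in enumerate(zs, start=1):
        cumsum += v
        if 1 + k * v > cumsum:  # k is still a valid support size
            k_star, cum_star = k, cumsum
    tau = (cum_star - 1) / k_star
    return [max(v - tau, 0.0) for v in z]

def sequence_log_prob(step_logits, tokens, normalize=softmax):
    """Locally normalized score: sum of per-step log-probabilities (hypothetical helper)."""
    total = 0.0
    for logits, tok in zip(step_logits, tokens):
        p = normalize(logits)[tok]
        if p == 0.0:
            return -math.inf  # a zero-probability token prunes the whole sequence
        total += math.log(p)
    return total

# Hypothetical toy vocabulary and logits (fixed per step, not prefix-dependent).
EOS, A, B, C = 0, 1, 2, 3
STEP_LOGITS = [
    [0.0, 1.5, 1.0, 1.0],  # step 1: "a" is more likely than EOS
    [0.0, 0.0, 1.0, 0.5],
    [1.0, 0.5, 0.5, 0.0],
]

empty_lp = sequence_log_prob(STEP_LOGITS[:1], [EOS])  # the empty string
hyp_lp = sequence_log_prob(STEP_LOGITS, [A, B, EOS])  # "a b"
sparse_empty_lp = sequence_log_prob(STEP_LOGITS[:1], [EOS], normalize=sparsemax)

print(math.exp(empty_lp), math.exp(hyp_lp))  # roughly 0.092 vs 0.068
print(sparse_empty_lp)                       # -inf
```

Greedy decoding would pick "a" at step 1, yet an exhaustive search over these scores would still prefer the empty string; under the sparse mapping the empty string cannot win at all.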

Cited by 11 publications (7 citation statements). References 38 publications.
“…Hyperparameters α and τ. In all experiments we set α = 1.5, because this value was recommended by Peters et al. (2019) and Peters and Martins (2021) as the middle ground between α = 1 (softmax) and α = 2 (sparsemax).…”
Section: Setup (mentioning, confidence: 99%)
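To make the "middle ground" concrete: α-entmax maps scores z to p_i = [(α − 1)z_i − τ]_+^{1/(α − 1)}, with the threshold τ chosen so that the probabilities sum to 1; α → 1 recovers softmax and α = 2 gives sparsemax. The following is a minimal pure-Python sketch that finds this normalizing τ by bisection. The function name `entmax_bisect` and the iteration count are illustrative choices, not a library API, and the τ listed as a hyperparameter in the quoted setup may not be the same quantity as this normalizing threshold.

```python
import math

def softmax(z):
    """The alpha -> 1 limit of entmax."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def entmax_bisect(z, alpha, n_iter=100):
    """alpha-entmax via bisection on tau (illustrative sketch).

    p_i = [(alpha - 1) * z_i - tau]_+ ** (1 / (alpha - 1)), tau chosen so sum(p) == 1.
    """
    if alpha < 1.0:
        raise ValueError("alpha must be >= 1")
    if alpha == 1.0:
        return softmax(z)
    d = len(z)
    x = [(alpha - 1) * v for v in z]
    x_max = max(x)
    lo = x_max - 1.0               # the largest entry alone gives 1, so sum(p) >= 1
    hi = x_max - d ** (1 - alpha)  # every entry <= 1/d, so sum(p) <= 1

    def p_of(tau):
        return [max(v - tau, 0.0) ** (1 / (alpha - 1)) for v in x]

    for _ in range(n_iter):
        mid = (lo + hi) / 2
        if sum(p_of(mid)) >= 1.0:
            lo = mid  # sum still too large (or exact): move tau up
        else:
            hi = mid
    p = p_of(lo)
    s = sum(p)
    return [v / s for v in p]  # remove the tiny residual from finite bisection

z = [1.0, 0.5, 0.0, -1.0]
for alpha in (1.0, 1.5, 2.0):
    print(alpha, [round(v, 4) for v in entmax_bisect(z, alpha)])
```

On this input softmax keeps all four entries positive, 1.5-entmax zeroes one entry, and sparsemax zeroes two (returning [0.75, 0.25, 0, 0] up to rounding), which is the sense in which α = 1.5 sits between the two.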
“…We remind the reader that the cat got your tongue problem (Stahlberg and Byrne, 2019) is one of the main motivations for using sparse transformations when generating text. As Peters and Martins (2021) have shown, 1.5-entmax successfully tackles this problem by significantly lowering the proportion of cases where an empty string is more likely than the beam search hypothesis. For 1.5-ReLU, we also calculated this proportion, and compared it with the proportions for softmax and sparsemax (Table 2).…”
Section: Empty Translations (mentioning, confidence: 99%)
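The proportion described in this quote can be computed per input by comparing log p(empty string), which is simply the log-probability of emitting EOS at the first step, against the log-probability of the beam search hypothesis. A minimal sketch, assuming both quantities are already available from the model; the function name `empty_string_rate` and the sample values are hypothetical. A first-step EOS probability of exactly 0, which sparse mappings such as entmax can produce, gives -inf and therefore never counts.

```python
import math

def empty_string_rate(first_step_eos_probs, beam_log_probs):
    """Share of inputs whose empty string outscores the beam search hypothesis.

    log p(empty) = log p(EOS at step 1). A probability of exactly 0 (possible
    with sparse mappings such as entmax) gives -inf, so that input never counts.
    """
    if len(first_step_eos_probs) != len(beam_log_probs) or not beam_log_probs:
        raise ValueError("need equally sized, non-empty inputs")
    count = 0
    for p_eos, beam_lp in zip(first_step_eos_probs, beam_log_probs):
        empty_lp = math.log(p_eos) if p_eos > 0.0 else -math.inf
        if empty_lp > beam_lp:  # ties do not count as "more likely"
            count += 1
    return count / len(beam_log_probs)

# Hypothetical model outputs for four source sentences.
eos_probs = [0.3, 0.0, 0.001, 0.05]
beam_lps = [math.log(0.1), math.log(0.2), math.log(0.5), math.log(0.05)]
print(empty_string_rate(eos_probs, beam_lps))  # 0.25
```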
“…When α = 1, this recovers cross entropy. Entmax-based sparse sequence-to-sequence models have been shown to work well on machine translation (Peters and Martins, 2021) as well as on morphological and phonological (Peters and Martins, 2020) tasks. Beyond the topline results, they have also been shown to be better calibrated than models trained with cross entropy loss (Peters and Martins, 2021).…”
Section: Model (mentioning, confidence: 99%)
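The remark that α = 1 recovers cross entropy can be checked numerically. Assuming the entmax loss takes the form L_α(z, y) = (p − e_y)·z + H_α(p), with p = α-entmax(z) and Tsallis entropy H_α(p) = Σ_j (p_j − p_j^α) / (α(α − 1)) (Shannon entropy at α = 1), the sketch below evaluates it at α = 1 with p = softmax(z) and compares it with cross entropy. It also evaluates the α = 2 case on z = [3, 0], where sparsemax returns p = [1, 0] (hard-coded here rather than computed), so the loss is exactly 0, a value cross entropy never reaches for finite scores. All function names are illustrative.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def tsallis_entropy(p, alpha):
    """Tsallis alpha-entropy; alpha = 1 is the Shannon limit."""
    if alpha == 1.0:
        return -sum(v * math.log(v) for v in p if v > 0.0)
    return sum(v - v ** alpha for v in p) / (alpha * (alpha - 1))

def entmax_loss(z, y, p, alpha):
    """(p - e_y) . z + H_alpha(p); p must equal alpha-entmax(z) for this to be the entmax loss."""
    return sum(pj * zj for pj, zj in zip(p, z)) - z[y] + tsallis_entropy(p, alpha)

def cross_entropy(z, y):
    """-log softmax(z)[y], computed stably with log-sum-exp."""
    m = max(z)
    return m + math.log(sum(math.exp(v - m) for v in z)) - z[y]

z, y = [2.0, -1.0, 0.5], 2
print(entmax_loss(z, y, softmax(z), 1.0), cross_entropy(z, y))  # equal up to rounding

# alpha = 2: sparsemax([3.0, 0.0]) == [1.0, 0.0] (hard-coded), so the loss is exactly 0.
print(entmax_loss([3.0, 0.0], 0, [1.0, 0.0], 2.0), cross_entropy([3.0, 0.0], 0))
```

The α = 1 identity holds because log softmax(z)_j = z_j − logsumexp(z), so the p·z terms cancel and only logsumexp(z) − z_y remains.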
“…• IWSLT'14 De→En (Cettolo et al.). Hyperparameters α and τ. In all experiments we set α = 1.5, because this value was recommended by Peters et al. (2019) and Peters and Martins (2021a) as the middle ground between α = 1 (softmax) and α = 2 (sparsemax).…”
Section: Setup (mentioning, confidence: 99%)
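Because these setups fix α = 1.5, the normalizing threshold can be found exactly instead of by iterative search. With x = z/2, a candidate support made of the k largest entries requires Σ_{j≤k}(x_j − τ)² = 1, a quadratic in τ whose relevant root is τ_k = mean_k − sqrt((1 − k(meansq_k − mean_k²))/k). The following is a sketch of a sorting-based routine written from that derivation; the name `entmax15` is illustrative and it is not claimed to match any particular library's implementation.

```python
import math

def entmax15(z):
    """Exact 1.5-entmax by sorting: p_i = [z_i / 2 - tau]_+ ** 2, with sum(p) == 1."""
    x = [v / 2 for v in z]  # (alpha - 1) * z with alpha = 1.5
    xs = sorted(x, reverse=True)
    taus = []
    s1 = s2 = 0.0
    for k, v in enumerate(xs, start=1):
        s1 += v
        s2 += v * v
        mean, mean_sq = s1 / k, s2 / k
        # Root of sum_{j<=k} (x_j - tau)^2 = 1 that lies below the mean.
        delta = (1 - k * (mean_sq - mean * mean)) / k
        taus.append(mean - math.sqrt(max(delta, 0.0)))
    # Support size: number of sorted entries that stay above their candidate tau.
    support = sum(1 for t, v in zip(taus, xs) if t <= v)
    tau_star = taus[support - 1]
    return [max(v - tau_star, 0.0) ** 2 for v in x]

print(entmax15([3.0, 0.0]))               # [1.0, 0.0]
print(entmax15([1.0, 0.5, 0.0, -1.0]))    # last entry is exactly 0.0
```

Sorting costs O(d log d), which avoids the repeated passes over the vocabulary that bisection needs at every decoding step.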