Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology 2019
DOI: 10.18653/v1/w19-4207

IT–IST at the SIGMORPHON 2019 Shared Task: Sparse Two-headed Models for Inflection

Abstract: This paper presents the Instituto de Telecomunicações-Instituto Superior Técnico submission to Task 1 of the SIGMORPHON 2019 Shared Task. Our models combine sparse sequence-to-sequence models with a two-headed attention mechanism that learns separate attention distributions for the lemma and inflectional tags. Among submissions to Task 1, our models rank second and third. Despite the low-data setting of the task (only 100 in-language training examples), they learn plausible inflection patterns and often concen…

Cited by 13 publications (16 citation statements); references 12 publications. Citing statements, ordered by relevance:
“…We in fact experimented with this architecture, but preliminary results on the development sets showed that our two-step architecture achieved better performance. Interestingly, the second-best performing system (Peters and Martins, 2019) at SIGMORPHON 2019, which also ranked first in terms of Levenshtein distance, also uses decoupled encoders to separately encode the lemma and the tags; this further consolidates our belief that such an approach is superior to using a single encoder for the concatenated sequence of the tags and lemma. The main difference to our model is that they do not use our two-step decoder process, and they substitute all softmax operations with sparsemax (Martins and Astudillo, 2016), yielding interpretable attention matrices very similar to ours.…”
Section: Related Work
confidence: 94%
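The decoupled-encoder design described in the statement above can be pictured with a short sketch. This is an illustration only, assuming dot-product attention scores, plain softmax, and simple concatenation of the two context vectors; the submitted system learns its own scoring functions and replaces softmax with sparsemax. Names such as two_headed_attention, H_lemma, and H_tags are illustrative, not taken from the paper.

import numpy as np

def softmax(z):
    """Ordinary softmax; the actual submission swaps this for sparsemax."""
    e = np.exp(z - z.max())
    return e / e.sum()

def two_headed_attention(s, H_lemma, H_tags):
    """Illustrative two-headed attention: one distribution over lemma
    characters and a separate one over inflectional tags.

    s        : decoder state, shape (d,)
    H_lemma  : lemma-encoder states, shape (n_lemma_chars, d)
    H_tags   : tag-encoder states,   shape (n_tags, d)
    """
    alpha_lemma = softmax(H_lemma @ s)   # attention over lemma characters
    alpha_tags = softmax(H_tags @ s)     # attention over inflectional tags
    c_lemma = alpha_lemma @ H_lemma      # lemma context vector
    c_tags = alpha_tags @ H_tags         # tag context vector
    return np.concatenate([c_lemma, c_tags])  # passed on to the output layer

# Toy usage with random states
rng = np.random.default_rng(0)
d = 8
context = two_headed_attention(rng.normal(size=d),
                               rng.normal(size=(5, d)),   # 5 lemma characters
                               rng.normal(size=(3, d)))   # 3 tags
print(context.shape)  # (16,)

Concatenating the two context vectors is just one simple way to combine the heads; the paper's exact combination mechanism is not reproduced here.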
“…We primarily utilize LSTMs and Transformers (Vaswani et al., 2017) to construct our models. Additionally, we experimented with four techniques: Hallucination (Anastasopoulos and Neubig, 2019), Sparsemax loss (Peters and Martins, 2019), Language Adversarial Networks (Anastasopoulos and Neubig, 2019; Chen et al., 2019), and Language Vector Injection (Littell et al., 2017).…”
Section: Methods
confidence: 99%
“…In recent years, attention-based models have gained huge popularity in Natural Language Processing tasks. Peters and Martins (2019) introduce a model inspired by sparse sequence-to-sequence models with a two-headed attention mechanism. The attention and output distributions are computed with the sparsemax function, and the sparsemax loss is optimized.…”
Section: Related Work
confidence: 99%
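For readers unfamiliar with sparsemax, the following is a small self-contained sketch of the sorting-based computation from Martins and Astudillo (2016), written for a single score vector; it is meant only to show why the resulting attention weights can contain exact zeros.

import numpy as np

def sparsemax(z):
    """Euclidean projection of a score vector z onto the probability simplex
    (Martins and Astudillo, 2016). Unlike softmax, the output can contain
    exact zeros, which is what makes the attention distributions sparse."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]              # scores in decreasing order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    support = 1 + k * z_sorted > cumsum      # coordinates kept in the support
    k_z = k[support][-1]                     # support size
    tau = (cumsum[support][-1] - 1) / k_z    # threshold
    return np.maximum(z - tau, 0.0)

print(sparsemax([2.0, 1.0, 0.1]))  # [1.   0.   0.  ]  -> fully concentrated
print(sparsemax([1.0, 0.9, 0.1]))  # [0.55 0.45 0.  ]  -> third weight is exactly zero

The sparsemax loss that the paper optimizes is the loss associated with this mapping in Martins and Astudillo (2016); it is not reproduced here.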
“…Nevertheless, this class of models is typically not interpreted, and when it is, the interpretation is limited to visualizing attention heatmaps on selected examples (see e.g. Aharoni and Goldberg 2017; Peters and Martins 2019). Peters and Martins (2019).…”
Section: Introduction
confidence: 99%
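Since the statement above notes that interpretation is usually limited to attention heatmaps, a minimal plotting sketch is included below; the attention matrix here is hypothetical (rows are generated characters, columns are lemma characters), standing in for the sparse matrices produced by the model.

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical attention matrix; sparse rows contain exact zeros.
attention = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.7, 0.3, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])

plt.imshow(attention, cmap="Blues", aspect="auto")
plt.xlabel("lemma characters")
plt.ylabel("generated characters")
plt.colorbar(label="attention weight")
plt.title("Attention heatmap (illustrative)")
plt.show()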