Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology 2019
DOI: 10.18653/v1/w19-4207

IT–IST at the SIGMORPHON 2019 Shared Task: Sparse Two-headed Models for Inflection

Abstract: This paper presents the Instituto de Telecomunicações-Instituto Superior Técnico submission to Task 1 of the SIGMORPHON 2019 Shared Task. Our models combine sparse sequence-to-sequence models with a two-headed attention mechanism that learns separate attention distributions for the lemma and inflectional tags. Among submissions to Task 1, our models rank second and third. Despite the low-data setting of the task (only 100 in-language training examples), they learn plausible inflection patterns and often concen…

Cited by 13 publications (16 citation statements); references 12 publications. Citing statements, ordered by relevance:
“…We in fact experimented with this architecture, but preliminary results on the development sets showed that our two-step architecture achieved better performance. Interestingly, the second-best performing system (Peters and Martins, 2019) at SIGMORPHON 2019, which also ranked first in terms of Levenshtein distance, also uses decoupled encoders to separately encode the lemma and the tags; this further consolidates our belief that such an approach is superior to using a single encoder for the concatenated sequence of the tags and lemma. The main difference to our model is that they do not use our two-step decoder process, and they substitute all softmax operations with sparsemax (Martins and Astudillo, 2016), yielding interpretable attention matrices very similar to ours.…”
Section: Related Work
confidence: 94%
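The decoupled-encoder design described in the statement above can be pictured with a short sketch. This is an illustration only, assuming dot-product attention scores, plain softmax, and simple concatenation of the two context vectors; the submitted system learns its own scoring functions and replaces softmax with sparsemax. Names such as two_headed_attention, H_lemma, and H_tags are illustrative, not taken from the paper.

import numpy as np

def softmax(z):
    """Ordinary softmax; the actual submission swaps this for sparsemax."""
    e = np.exp(z - z.max())
    return e / e.sum()

def two_headed_attention(s, H_lemma, H_tags):
    """Illustrative two-headed attention: one distribution over lemma
    characters and a separate one over inflectional tags.

    s        : decoder state, shape (d,)
    H_lemma  : lemma-encoder states, shape (n_lemma_chars, d)
    H_tags   : tag-encoder states,   shape (n_tags, d)
    """
    alpha_lemma = softmax(H_lemma @ s)   # attention over lemma characters
    alpha_tags = softmax(H_tags @ s)     # attention over inflectional tags
    c_lemma = alpha_lemma @ H_lemma      # lemma context vector
    c_tags = alpha_tags @ H_tags         # tag context vector
    return np.concatenate([c_lemma, c_tags])  # passed on to the output layer

# Toy usage with random states
rng = np.random.default_rng(0)
d = 8
context = two_headed_attention(rng.normal(size=d),
                               rng.normal(size=(5, d)),   # 5 lemma characters
                               rng.normal(size=(3, d)))   # 3 tags
print(context.shape)  # (16,)

Concatenating the two context vectors is just one simple way to combine the heads; the paper's exact combination mechanism is not reproduced here.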
“…We primarily utilize LSTMs and Transformers (Vaswani et al., 2017) to construct our models. Additionally, we experimented with four techniques: Hallucination (Anastasopoulos and Neubig, 2019), Sparsemax loss (Peters and Martins, 2019), Language Adversarial Networks (Anastasopoulos and Neubig, 2019; Chen et al., 2019), and Language Vector Injection (Littell et al., 2017).…”
Section: Methods
confidence: 99%
“…In recent years, attention-based models have gained huge popularity in Natural Language Processing tasks. Peters and Martins (2019) introduce a model inspired by sparse sequence-to-sequence models with a two-headed attention mechanism. The attention and output distributions are computed with the sparsemax function, and the sparsemax loss is optimized.…”
Section: Related Work
confidence: 99%
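For readers unfamiliar with sparsemax, the following is a small self-contained sketch of the sorting-based computation from Martins and Astudillo (2016), written for a single score vector; it is meant only to show why the resulting attention weights can contain exact zeros.

import numpy as np

def sparsemax(z):
    """Euclidean projection of a score vector z onto the probability simplex
    (Martins and Astudillo, 2016). Unlike softmax, the output can contain
    exact zeros, which is what makes the attention distributions sparse."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]              # scores in decreasing order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    support = 1 + k * z_sorted > cumsum      # coordinates kept in the support
    k_z = k[support][-1]                     # support size
    tau = (cumsum[support][-1] - 1) / k_z    # threshold
    return np.maximum(z - tau, 0.0)

print(sparsemax([2.0, 1.0, 0.1]))  # [1.   0.   0.  ]  -> fully concentrated
print(sparsemax([1.0, 0.9, 0.1]))  # [0.55 0.45 0.  ]  -> third weight is exactly zero

The sparsemax loss that the paper optimizes is the loss associated with this mapping in Martins and Astudillo (2016); it is not reproduced here.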
“…Nevertheless, this class of models is typically not interpreted, and when it is, the interpretation is limited to visualizing attention heatmaps on selected examples (see e.g. Aharoni and Goldberg 2017; Peters and Martins 2019). Peters and Martins (2019).…”
Section: Introduction
confidence: 99%
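Since the statement above notes that interpretation is usually limited to attention heatmaps, a minimal plotting sketch is included below; the attention matrix here is hypothetical (rows are generated characters, columns are lemma characters), standing in for the sparse matrices produced by the model.

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical attention matrix; sparse rows contain exact zeros.
attention = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.7, 0.3, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])

plt.imshow(attention, cmap="Blues", aspect="auto")
plt.xlabel("lemma characters")
plt.ylabel("generated characters")
plt.colorbar(label="attention weight")
plt.title("Attention heatmap (illustrative)")
plt.show()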