Don't Throw Those Morphological Analyzers Away Just Yet: Neural Morphological Disambiguation for Arabic

Zalmout, Nasser; Habash, Nizar

doi:10.18653/v1/d17-1073

Cited by 46 publications

(76 citation statements)

References 22 publications

“…Other research, however, argue that enriching embeddings with additional morphological information boosts performance. [13] demonstrates this by using the results of a morphological analyzer to further improve candidate ranking in a morphological disambiguation task for Arabic. In a research for Burmese word segmentation, [14] address the problem by employing binary classification with classifiers such as CRFs.…”

Section: Related Work (mentioning)

confidence: 98%

Morphological Segmentation with LSTM Neural Networks for Tigrinya

Tedla

Yamamoto

2018

IJNLC

“…We also use weighted matching; where instead of assigning ones and zeros for the matched/mismatched features, we use a feature-specific matching weight. We replicate the morphological disambiguation pipeline presented in earlier contributions (Zalmout and Habash, 2017;, and use the same parameter values and feature weights.…”

Section: Full Morphological Disambiguation (mentioning)

confidence: 99%

“…[Results table flattened in extraction; columns FULL, FEATS, DIAC, LEX, POS for Dev Test and Blind Test; rows MADAMIRA MSA (Pasha et al, 2014) and (Zalmout and Habash, 2017); surviving values: 85., 90, 77] Embedding Models: Joint embedding spaces between the dialects, whether through embedding space mapping or through learning the embeddings on the combined corpus, did not perform well. Using separate embedding models (whether for word or character embeddings) for each dialect shows better accuracy.…”

Section: Dev Test (mentioning)

confidence: 99%

Adversarial Multitask Learning for Joint Multi-Feature and Multi-Dialect Morphological Modeling

Zalmout

Habash

2019

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Self Cite

Morphological tagging is challenging for morphologically rich languages due to the large target space and the need for more training data to minimize model sparsity. Dialectal variants of morphologically rich languages suffer more as they tend to be more noisy and have less resources. In this paper we explore the use of multitask learning and adversarial training to address morphological richness and dialectal variations in the context of full morphological tagging. We use multitask learning for joint morphological modeling for the features within two dialects, and as a knowledge-transfer scheme for cross-dialectal modeling. We use adversarial training to learn dialect invariant features that can help the knowledge-transfer scheme from the high to low-resource variants. We work with two dialectal variants: Modern Standard Arabic (high-resource "dialect") and Egyptian Arabic (low-resource dialect) as a case study. Our models achieve state-of-the-art results for both. Furthermore, adversarial training provides more significant improvement when using smaller training datasets in particular.

“…Arabic diacritization, which can be considered forms of text normalization, has received a number of neural efforts (Belinkov and Glass, 2015; Abandah et al, 2015). However, state-of-the-art approaches for end-to-end text normalization rely on several additional models and rule-based approaches as hybrid models (Pasha et al, 2014; Nawar, 2015; Zalmout and Habash, 2017), which introduce direct human knowledge into the system, but are limited to correcting specific mistakes and rely on expert knowledge to be developed.…”

Section: Related Work (mentioning)

confidence: 99%

Utilizing Character and Word Embeddings for Text Normalization with Sequence-to-Sequence Models

Watson

Zalmout

Habash

2018

Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Self Cite

Text normalization is an important enabling technology for several NLP tasks. Recently, neural-network-based approaches have outperformed well-established models in this task. However, in languages other than English, there has been little exploration in this direction. Both the scarcity of annotated data and the complexity of the language increase the difficulty of the problem. To address these challenges, we use a sequence-to-sequence model with character-based attention, which in addition to its self-learned character embeddings, uses word embeddings pre-trained with an approach that also models subword information. This provides the neural model with access to more linguistic information especially suitable for text normalization, without large parallel corpora. We show that providing the model with word-level features bridges the gap for the neural network approach to achieve a state-of-the-art F1 score on a standard Arabic language correction shared task dataset.
