Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) 2018
DOI: 10.18653/v1/p18-2089
Addressing Noise in Multidialectal Word Embeddings

Abstract: Word embeddings are crucial to many natural language processing tasks. The quality of embeddings relies on large non-noisy corpora. Arabic dialects lack large corpora and are noisy, being linguistically disparate with no standardized spelling. We make three contributions to address this noise. First, we describe simple but effective adaptations to word embedding tools to maximize the informative content leveraged in each training sentence. Second, we analyze methods for representing disparate dialects in one em…

Cited by 14 publications (9 citation statements) · References 25 publications
“…(1) Four Arabic dialects are considered: MAG, EGY, GLF, and LEV. We evaluate our method using off-the-shelf word embeddings pretrained by [3]. The word vectors are the concatenation of separately trained wide- and narrow-windowed FastText embedding models of dimension 200 [10].…”
Section: Methods (mentioning)
confidence: 99%
“…The wide context window is set to 5, while the narrow context window is fixed to 1. The aim is to capture both the syntactic and semantic properties of words [3]. The resulting embeddings are 400-dimensional vectors.…”
Section: Methods (mentioning)
confidence: 99%