Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021) 2021
DOI: 10.18653/v1/2021.semeval-1.99

LU-BZU at SemEval-2021 Task 2: Word2Vec and Lemma2Vec performance in Arabic Word-in-Context disambiguation

Abstract: This paper presents a set of experiments that evaluate and compare the performance of CBOW Word2Vec and Lemma2Vec models for Arabic Word-in-Context (WiC) disambiguation without using sense inventories or sense embeddings. As part of SemEval-2021 Shared Task 2 on WiC disambiguation, we used the dev.ar-ar dataset (2k sentence pairs) to decide whether two words in a given sentence pair carry the same meaning. We used two Word2Vec models: Wiki-CBOW, a model pre-trained on Arabic Wikipedia, and anot…
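The abstract is truncated before it says how the two occurrences are compared, so what follows is only a minimal sketch of one common inventory-free baseline, not the authors' method: average the embeddings of the context words around each target occurrence and predict "same meaning" when the cosine similarity of the two context vectors reaches a threshold. The toy `EMBEDDINGS` table, the 0.5 threshold, and all function names are hypothetical stand-ins for a pre-trained CBOW model such as Wiki-CBOW.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]

# Hypothetical 2-D toy vectors standing in for a pre-trained CBOW model.
EMBEDDINGS: Dict[str, Vector] = {
    "bank": [1.0, 1.0],
    "river": [1.0, 0.0],
    "water": [0.9, 0.1],
    "money": [0.0, 1.0],
    "loan": [0.1, 0.9],
}


def context_vector(tokens: Sequence[str], target_index: int,
                   embeddings: Dict[str, Vector]) -> Vector:
    """Average the vectors of all context tokens, skipping the target and OOV tokens."""
    dims = len(next(iter(embeddings.values())))
    total = [0.0] * dims
    count = 0
    for i, token in enumerate(tokens):
        if i == target_index or token not in embeddings:
            continue
        for d, value in enumerate(embeddings[token]):
            total[d] += value
        count += 1
    # With no known context words, return the zero vector (cosine treats it as 0).
    return [v / count for v in total] if count else total


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def same_meaning(sent1: Sequence[str], idx1: int,
                 sent2: Sequence[str], idx2: int,
                 embeddings: Dict[str, Vector],
                 threshold: float = 0.5) -> bool:
    """Predict True ("same meaning") when the two context vectors are similar enough."""
    v1 = context_vector(sent1, idx1, embeddings)
    v2 = context_vector(sent2, idx2, embeddings)
    return cosine(v1, v2) >= threshold


# Usage: "bank" in two river contexts, then in a river context vs. a money context.
print(same_meaning(["river", "bank", "water"], 1, ["water", "near", "bank"], 2, EMBEDDINGS))  # True
print(same_meaning(["river", "bank", "water"], 1, ["money", "bank", "loan"], 1, EMBEDDINGS))  # False
```

A Lemma2Vec-style variant would, under the same assumptions, map each token to its lemma before the lookup; for a morphologically rich language such as Arabic this would likely reduce out-of-vocabulary misses. The threshold would normally be tuned on labelled development pairs rather than fixed at 0.5.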

Cited by 7 publications (7 citation statements). References 21 publications.
“…We plan to increase the size of our corpus to cover additional Levantine sub-dialects, especially those of other Levantine areas, most notably some of Syria's dialectal varieties. We also plan to use this corpus to develop morphological analyzers and word-sense disambiguation system for Levantine Arabic as we did for MSA (see (Al-Hajj and Jarrar, 2021a; Al-Hajj and Jarrar, 2021b)). Additionally, we plan to build on the Palestinian and Lebanese dialect lemmas to develop a Levantine-MSA-English Lexicon and extend it with synonyms (Jarrar et al, 2021).…”
Section: Discussion (mentioning)
confidence: 99%
“…A RESTful web service for Arabic NER is developed and deployed online⁴ as part of our language understanding resources (…; Al-Hajj and Jarrar, 2021b; Jarrar et al, 2021). The web service takes a text as input and returns the output in three different formats: (i) JSON IOB2, a JSON in which each token in the input text is returned with its corresponding tag similar to the IOB2 scheme, (ii) JSON entities, only the recognized named entities and their positions are returned, and (iii) XML, which is similar to the format (ii), but the named entities are marked up using XML.…”
Section: Methods (mentioning)
confidence: 99%
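To make output formats (i) and (ii) in the statement above concrete, here is a minimal sketch of how per-token IOB2 tags can be collapsed into entity spans with token positions. The JSON field names (`token`, `tag`, `type`, `start`, `end`, `text`), the `PERS`/`ORG` labels, and the function names are illustrative assumptions, not the web service's actual schema or API.

```python
import json
from typing import Dict, List, Optional, Sequence

Entity = Dict[str, object]


def iob2_to_entities(tokens: Sequence[str], tags: Sequence[str]) -> List[Entity]:
    """Collapse per-token IOB2 tags into entity spans; `end` is an exclusive token index."""
    entities: List[Entity] = []
    current: Optional[Entity] = None
    for i, (token, tag) in enumerate(zip(tokens, tags)):
        label = tag[2:]
        # B- always opens a new entity; an I- that does not continue an open entity
        # of the same type is leniently treated as a start as well.
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and (current is None or current["type"] != label))
        if starts_new:
            if current is not None:
                entities.append(current)
            current = {"type": label, "start": i, "end": i + 1, "text": token}
        elif tag.startswith("I-"):
            current["end"] = i + 1
            current["text"] = f"{current['text']} {token}"
        else:  # "O" closes any open entity
            if current is not None:
                entities.append(current)
            current = None
    if current is not None:
        entities.append(current)
    return entities


def to_json_iob2(tokens: Sequence[str], tags: Sequence[str]) -> str:
    """Format (i): every token paired with its tag."""
    return json.dumps([{"token": t, "tag": g} for t, g in zip(tokens, tags)],
                      ensure_ascii=False)


def to_json_entities(tokens: Sequence[str], tags: Sequence[str]) -> str:
    """Format (ii): only the recognized entities and their positions."""
    return json.dumps(iob2_to_entities(tokens, tags), ensure_ascii=False)


tokens = ["Ahmad", "visited", "Birzeit", "University", "yesterday"]
tags = ["B-PERS", "O", "B-ORG", "I-ORG", "O"]
print(to_json_entities(tokens, tags))
# [{"type": "PERS", "start": 0, "end": 1, "text": "Ahmad"}, {"type": "ORG", "start": 2, "end": 4, "text": "Birzeit University"}]
```

Passing `ensure_ascii=False` keeps Arabic tokens readable in the JSON output instead of escaping them as `\uXXXX` sequences. Treating a stray `I-` tag as the start of an entity is a lenient design choice; a strict IOB2 validator would reject such input instead.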
“…We train the Word2Vec model using Indonesian Wikipedia dataset because it is a large dataset that is publicly available. Using the same reason, there are also many previous researchers who also learned their Word2Vec models using Wikipedia dataset [39], [40].…”
Section: Word Embedding (mentioning)
confidence: 99%
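Training a CBOW Word2Vec model, the variant named in the abstract, on a corpus such as Wikipedia starts from (context, target) pairs: the model averages the context-word vectors and scores every vocabulary word as the possible target. The following is a minimal plain-Python sketch with hypothetical toy vectors and function names, showing pair extraction and a single full-softmax forward pass; it assumes nothing about the window size or other settings used in the cited works.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def cbow_pairs(tokens: Sequence[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """Build (context, target) pairs: up to `window` words on each side of each target."""
    pairs = []
    for i, target in enumerate(tokens):
        context = list(tokens[max(0, i - window):i]) + list(tokens[i + 1:i + 1 + window])
        if context:
            pairs.append((context, target))
    return pairs


def cbow_forward(context: Sequence[str], input_vecs: Dict[str, Vector],
                 output_vecs: Dict[str, Vector]) -> Dict[str, float]:
    """Average the context input vectors, then softmax the dot-product scores over the vocabulary."""
    dims = len(next(iter(input_vecs.values())))
    hidden = [0.0] * dims
    for word in context:
        for d, value in enumerate(input_vecs[word]):
            hidden[d] += value / len(context)
    scores = {w: sum(h * o for h, o in zip(hidden, vec)) for w, vec in output_vecs.items()}
    max_score = max(scores.values())  # subtract the max for numerical stability
    exps = {w: math.exp(s - max_score) for w, s in scores.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}


print(cbow_pairs(["the", "cat", "sat", "on", "mat"], window=1)[:2])
# [(['cat'], 'the'), (['the', 'sat'], 'cat')]
```

Production implementations such as the original word2vec tool or gensim avoid the full softmax over the vocabulary by using negative sampling or hierarchical softmax, and they update both vector tables by gradient descent; that training loop is omitted here.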