2018
DOI: 10.1145/3178459

Morphological Segmentation and Part-of-Speech Tagging for the Arabic Heritage

Abstract: We annotate 60,000 words of Classical Arabic (CA) with topics in philosophy, religion, literature, and law with fine-grain segment-based morphological descriptions. We use these annotations for building a morphological segmenter and part-of-speech (POS) tagger for CA. With character-level classification and features from the word and its lexical context, the segmenter achieves a word accuracy of 96.8% with the main issue being a high rate of out-of-vocabulary words. A token-based POS tagger achieves an accurac…

citations
Cited by 7 publications
(10 citation statements)
references
References 7 publications
“…Our segmentation convention matches with (Aliwy, 2012; Mohamed, 2018; Habash et al., 2012) where clitics are split from words and the notion of clitics is aligned as the syntactic units that can be assigned a POS tag and can occupy a node on the syntactic tree. It is also similar to the Penn Arabic Treebank (ATB) (Maamouri et al., 2004) with the exception of the definite article where we consider it as a clitic while in the ATB it is taken as a definiteness marker.…”
Section: Comparison Of Segmentation Conventions (mentioning)
confidence: 97%
“…Abdelali et al. (2016) developed a segmenter for their tool, Farasa, using SVM and trained on the ATB data with reported accuracy of 98.94%. Moreover, Mohamed (2018) developed a memory-based learning segmenter for Arabic religious texts trained on a manually annotated in-domain corpus of 27k words combined with the ATB data with reported accuracy of 95.70%.…”
Section: Related Work (mentioning)
confidence: 99%
“…The recent studies of part of speech tagging include different aspects. For instance, [34] developed a part of speech tagger for the Arabic heritage. They scored an accuracy of 96.22%.…”
Section: Literature Review (mentioning)
confidence: 99%
“…However, when we use these tools to segment pre-MSA texts, the quality drops significantly. MADAMIRA [10], the best known Arabic NLP system, which has an accuracy of over 98% on MSA, has an accuracy of 94.7% on a Classical Arabic test set [7]. The main difference between Classical Arabic and MSA lies in their different vocabularies.…”
Section: Introduction (mentioning)
confidence: 99%