2015 5th International Conference on Information & Communication Technology and Accessibility (ICTA), 2015
DOI: 10.1109/icta.2015.7426904

Evaluation of the ambiguity caused by the absence of diacritical marks in Arabic texts: Statistical study

Citations: cited by 12 publications (4 citation statements)
References: 8 publications
“…Habash [5] argues that on average a non-diacritized word can have 12 different morphological analyses. Similarly, in the statistical study conducted in [4] on a corpus of more than 82 million non-diacritized words, the authors showed that a word analysed out of context has on average 2.63 roots, 5.11 lemmas, 4.79 stems and 3.86 POS tags.…”
Section: Introduction (mentioning)
confidence: 88%
“…Ambiguity is very present in the Arabic language because of its agglutinating and derivational characteristics [3]. Moreover, the absence of diacritical marks in the vast majority of Arabic texts greatly amplifies ambiguity [4]. For example, the non-diacritized word "فرمت" /frmt/ may be the verb "فَرَمَتْ" /faramato/ that has two meanings depending on its context: (and she threw) whose root is "ر م ي" /r m y/, or (she cut) whose root is "ف ر م" /frm/.…”
Section: Introduction (mentioning)
confidence: 99%
“…Apart of this, the absence of diacritical marks is a foundation of intricacy for processing Arabic automatic systems language that the sentence meaning cannot be determined easily [15]. Moreover, the shortage of Arabic diacritical marks in the sentences reflects the major reason of the muddle encounter throughout analysis stage [16]. Over and above, the study of [17] demonstrated that the diacritization automatic content increments manual quality labelling of the corpus.…”
Section: Introduction (mentioning)
confidence: 99%
“…According to [2], in over 77% of cases, a non-vocalized word can have several possible diacritizations and consequently different possible meanings. Table 1 gives an example of this aspect and lists some of the possible diacritization forms of the string "صدق" and the inferred meaning.…”
Section: Introduction (mentioning)
confidence: 99%
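
The ambiguity these statements describe can be illustrated with a minimal Python sketch. It is not the method of the cited study: it only shows how several diacritized forms collapse onto one non-diacritized string once the marks are removed. The helper name `strip_diacritics` and the small word list are assumptions chosen for illustration. The three forms of "صدق" are common readings picked by hand, not the contents of the Table 1 mentioned above.

```python
import unicodedata
from collections import defaultdict


def strip_diacritics(word: str) -> str:
    """Drop combining marks (Unicode category Mn), which covers the
    Arabic short vowels, tanwin, shadda and sukun."""
    return "".join(ch for ch in word if unicodedata.category(ch) != "Mn")


# Hand-picked illustrative forms (an assumption, not corpus data),
# written as escapes so the exact code points are unambiguous.
forms = [
    "\u0635\u064e\u062f\u064e\u0642\u064e",        # صَدَقَ  (he told the truth)
    "\u0635\u064e\u062f\u0651\u064e\u0642\u064e",  # صَدَّقَ  (he believed)
    "\u0635\u0650\u062f\u0652\u0642",              # صِدْق   (truth)
    "\u0641\u064e\u0631\u064e\u0645\u064e\u062a\u0652",  # فَرَمَتْ (from the example above)
]

# Group the diacritized forms by their non-diacritized spelling.
groups = defaultdict(list)
for form in forms:
    groups[strip_diacritics(form)].append(form)

# Average number of diacritized forms per non-diacritized string
# in this toy list; a real study would compute this over a corpus.
average_forms = sum(len(v) for v in groups.values()) / len(groups)

for bare, variants in groups.items():
    print(bare, len(variants), variants)
print("average forms per bare word:", average_forms)
```

On a real corpus the grouping key would be the same, but the candidate forms would come from a morphological analyser rather than a hand-written list.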