A hybrid approach for Arabic lemmatization

Boudchiche, Mohamed; Mazroui, Azzeddine

doi:10.1007/s10772-018-9528-3

Cited by 15 publications

(4 citation statements)

References 15 publications


“…It remains to be noted that we did not consider it necessary to make comparisons with other disambiguation systems such as Madamira or Farasa since in [30] the authors compared Madamira with their lemmatization system based on HMMs, which is equivalent to the one used in Table 4, and they showed the superiority of the performances of their system.…”

Section: Results (mentioning)

confidence: 99%

Spline functions for Arabic morphological disambiguation

Boudchiche

Mazroui

2020

ACI

Self Cite


In this paper, we develop a hybrid morphological disambiguation system for the Arabic language that identifies the stem, lemma and root of each word in a given sentence. Following an out-of-context analysis performed by the morphological analyser Alkhalil Morpho Sys, the system first identifies all the potential tags of each word of the sentence. A disambiguation phase then chooses, for each word, the right solution among those obtained during the first phase. The disambiguation problem is solved by recasting it as a surface optimization problem over spline functions. Tests show the value of this approach and that its performance exceeds that of the state of the art.

“…LANS dataset does not store the data in the lemmatized format, because lemmatization is usually used in the training or testing on the original data. Many lemmatizers are considered such as Alkhalil (Boudchiche and Mazroui, 2019), ISRI (Khoja) (El-Defrawy et al, 2015), Madamira (Pasha et al, 2014), CAMeL (Obeid et al, 2020), but only Farasa (Mubarak, 2017;Abdelali et al, 2016) is applied because it outperforms the state-of-the-art CAMel by a slight margin and its fast performance on large-scale datasets. Following all the mentioned steps, the dataset is passed for automatic evaluation (see sec 6).…”

Section: Preprocessing (mentioning)

confidence: 99%

LANS: Large-scale Arabic News Summarization Corpus

Alhamadani

Zhang

et al.

2023

Proceedings of ArabicNLP 2023


Text summarization has been intensively studied in many languages, and some languages have reached advanced stages. Yet, Arabic Text Summarization (ATS) is still in its developing stages. Existing ATS datasets are either small or lack diversity. We build LANS, a large-scale and diverse dataset for the Arabic Text Summarization task. LANS offers 8.4 million articles and their summaries extracted from newspaper websites' metadata between 1999 and 2019. The high-quality and diverse summaries are written by journalists from 22 major Arab newspapers, and include an eclectic mix of more than 7 topics from each source. We conduct an intrinsic evaluation of LANS using both automatic and human evaluations. Human evaluation of 1,000 random samples reports 95.4% accuracy for our collected summaries, and automatic evaluation quantifies the diversity and abstractness of the summaries.


“…Yang and Mao (2016) used word embedding to integrate knowledge. Boudchiche and Mazroui (2018) created an lemmatization, including two modules. They adopted hidden Markov models and validated this approach using a labeled corpus consisting of about 500,000 words.…”

Section: Examples Of How To Pick a Good American President From 2016 ... (mentioning)

confidence: 99%

Longitudinal Study of a Website for Assessing American Presidential Candidates and Decision Making of Potential Election Irregularities Detection

Piper

Rodger

2022

International Journal on Semantic Web and Information Systems


We employ the concept of word sense disambiguation to determine the inherent meaning of voter intentions regarding possible political candidates from the 2016 Presidential election. We present our findings based on a website (www.presidentselect.com) that we developed, where candidates can be examined and their true assets and competencies in three major areas of eligibility, education, and experience inputs can be deciphered. Data envelope analysis is used to determine underlying word instances for elected and successful outputs. We also use our website results to longitudinally extend these findings to decision making for potential election fraud detection in the 2020 Presidential election, using Benford’s Law. Our results shed light on these phenomena and provide new insights into the word sense disambiguation literature.

