Rahma Sellami scite author profile

Rahma Sellami

5 Publications

21 Citation Statements Received

8 Citation Statements Given

Affiliations

University of Sfax, University of Gabès

Publications

Collaboratively Constructed Linguistic Resources for Language Variants and their Exploitation in NLP Application – the case of Tunisian Arabic and the Social Media

Sadat, Mallek, Boudabous et al. 2014

Modern Standard Arabic (MSA) is the formal language in most Arabic countries. Arabic Dialects (AD), the languages of daily use, differ from MSA, especially in social media communication. However, most Arabic social media texts have mixed forms and many variations, especially between MSA and AD. This paper aims to bridge the gap between MSA and AD by providing a framework for the translation of social media texts. More precisely, this paper focuses on the Tunisian Dialect of Arabic (TAD), with an application to automatic machine translation of social media text into MSA and any other target language. Linguistic tools such as a bilingual TAD-MSA lexicon and a set of grammatical mapping rules are collaboratively constructed and exploited, in addition to a language model, to produce MSA sentences from Tunisian dialectal sentences. This work is a first step towards collaboratively constructed semantic and lexical resources for Arabic Social Media within the ASMAT (Arabic Social Media Analysis Tools) project.

Automatic Diacritics Restoration for Tunisian Dialect

Masmoudi, Mdhaffar, Sellami et al. 2019

ACM Trans. Asian Low-Resour. Lang. Inf. Process.

Modern Standard Arabic, as well as the Arabic dialects, is usually written without diacritics. The absence of these marks constitutes a real problem for the automatic processing of these data by NLP tools. Indeed, writing Arabic without diacritics introduces several types of ambiguity. First, a word without diacritics could have many possible meanings depending on its diacritization. Second, the undiacritized surface form of an Arabic word might have as many as 200 readings depending on the complexity of its morphology [12]. In fact, the agglutination property of Arabic might produce a problem that can only be resolved using diacritics. Third, without diacritics a word could have many possible parts of speech (POS) instead of one. This is the case with words that have the same spelling and POS tag but a different lexical sense, or words that have the same spelling but different POS tags and lexical senses [8]. Finally, there is ambiguity at the grammatical level (syntactic ambiguity). In this article, we propose the first work that investigates the automatic diacritization of Tunisian Dialect texts. We first describe our annotation guidelines and procedure. Then, we propose two major models, namely a statistical machine translation (SMT) model and a discriminative model that treats diacritization as a sequence classification task based on Conditional Random Fields (CRF). In the second approach, we integrate POS features to influence the generation of diacritics. Diacritics restoration was performed at both the word and the character levels. The CRF system gave the better results (Word Error Rate (WER) of 21.44% for CRF versus 34.6% for SMT).

Improved Statistical Machine Translation by Cross-Linguistic Projection of Named Entities Recognition and Translation

Sellami, Deffaf, Sadat et al. 2015

CyS

One of the existing difficulties in natural language processing applications is the lack of appropriate tools for the recognition, translation, and/or transliteration of named entities (NEs), specifically for less-resourced languages. In this paper, we propose a new method to automatically label multilingual parallel data for the Arabic-French language pair with named entity tags and to build lexicons of those named entities with their transliteration and/or translation in the target language. For this purpose, we bring in a third, well-resourced language, English, to serve as a pivot in building an Arabic-French NE translation lexicon. Evaluations on the Arabic-French language pair using English as the pivot in the transitive model showed the effectiveness of the proposed method for mining Arabic-French named entities and their translations. Moreover, integrating this component into a statistical machine translation system outperformed the baseline system.

Building and Exploiting Domain-Specific Comparable Corpora for Statistical Machine Translation

Sellami, Sadat, Belguith 2017

Towards A Conceptual Model for Citizen's Adoption of E-Government Services in Developing Countries

Almufti, Sellami, Belguith 2023
