Interspeech 2013
DOI: 10.21437/interspeech.2013-373

Diacritics restoration for Arabic dialect texts

Cited by 13 publications (9 citation statements)
References 7 publications
“…The phrase-based Statistical Machine Translation (SMT) system has been successfully applied to restore diacritics in the Algiers dialectal texts of the Arabic language [22]. This system uses the Moses (Open Source Toolkit for SMT) engine with the default settings, such as the bidirectional phrase and lexical translation probabilities, the distortion model with seven features, a word and phrase penalty, and a language model.…”
Section: Translation-based Approaches (mentioning)
confidence: 99%
“…The SMT-based method was also applied to Hungarian texts [23]. Similar to [22], Moses was used with the default configuration settings (except for the translation model that contained only unigrams, and the language model with n up to 5), monotone decoding, and without the alignment step. However, SMT alone was not enough to solve their task: the agglutinative morphology of the Hungarian language results in plenty of word forms that are unseen by the system with the restricted vocabulary.…”
Section: Translation-based Approaches (mentioning)
confidence: 99%
“…The phrase-based Statistical Machine Translation (SMT) system is successfully applied to restore diacritics in the Algiers dialectal texts of the Arabic language [22]. This system uses the Moses (Open Source Toolkit for SMT) engine with the default settings: the bidirectional phrase and lexical translation probabilities; the distortion model with seven features; a word and phrase penalty; and a language model.…”
Section: Translation-based Approaches (mentioning)
confidence: 99%
“…The SMT-based method was also applied to Hungarian texts [23]. Similar to [22], Moses was used with the default configuration settings (except for the translation model that contained only unigrams, and the language model with n up to 5), monotone decoding, and without the alignment step. However, SMT alone is not enough for their solving task: agglutinative morphology of the Hungarian language results in plenty of word forms unseen for the system with the restricted vocabulary.…”
Section: Translation-based Approaches (mentioning)
confidence: 99%

Correcting diacritics and typos with a ByT5 transformer model

Stankevičius, Lukoševičius, Kapočiūtė-Dzikienė et al. 2022. Preprint.
“…Thus, a consideration of context is required for proper disambiguation. Due to the inter-word dependence of CEs, they are typically harder to predict compared to core-word diacritics (Habash and Rambow 2007, Roth et al 2008, Harrat et al 2013, Ameur et al 2015), with CEER of state-of-the-art systems being in double digits compared to nearly 3% for word-cores. Since recovering CEs is akin to shallow parsing (Marton et al 2010) and requires morphological and syntactic processing, it is a difficult problem in Arabic NLP.…”
Section: Introduction (mentioning)
confidence: 99%