2019 IEEE 15th International Conference on Intelligent Computer Communication and Processing (ICCP) 2019
DOI: 10.1109/iccp48234.2019.8959557

Deep Learning for Automatic Diacritics Restoration in Romanian

Citations: cited by 8 publications (7 citation statements)
References: 3 publications
“…Such a seq2seq approach, with the RNN-based core, was successfully applied to the Turkish language [43], and, with the LSTM-based core, to Vietnamese texts [5,44]. In [45], Romanian authors investigated four different encoder-decoder architectures operating on the character level: one-layer LSTMs, two types of stacked LSTMs, and the CNN-based method (three-layer CNN with the concatenated output of the encoder and decoder, processed with another two-layer CNN), and determined that the CNN-based approach was the most accurate. Moreover, they compared their seq2seq approaches with the classification-based approach.…”
Section: Deep-learning-based Approaches
Citation type: mentioning
confidence: 99%
“…However, when we used random splits for training, validation, and testing (see Section VI), the proposed method produced a better WER (8.43%) and an even better DER (2.13%). In the second comparison approach, we compared the resulting models (see Section VI) with three well-known and publicly available Arabic diacritization systems, namely, MADAMIRA (morphological analysis-based), Farasa (feature engineering-based), and the Belinkov and Glass model (data-based). We based our comparison on three manually diacritized texts from three different genres with various writing styles and sentence structures.…”
Section: Comparison
Citation type: mentioning
confidence: 99%
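The statement above reports a word error rate (WER) and a diacritic error rate (DER) without defining them. The sketch below shows one simplified way such metrics could be computed for Romanian text. It is an illustration under stated assumptions, not the cited authors' evaluation code: the function names, the candidate letter set `CANDIDATE_BASES`, and the requirement that reference and hypothesis align position by position are all assumptions made here.

```python
import unicodedata

# Assumption: base letters that can carry a diacritic in Romanian
# (a -> ă/â, i -> î, s -> ș, t -> ț).
CANDIDATE_BASES = frozenset("aist")


def base_letter(ch: str) -> str:
    """Return the lowercase base letter of ch, without combining marks."""
    return unicodedata.normalize("NFD", ch)[0].lower()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Fraction of whitespace-separated words that differ from the reference."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if len(ref_words) != len(hyp_words):
        raise ValueError("reference and hypothesis must have the same word count")
    if not ref_words:
        return 0.0
    errors = sum(r != h for r, h in zip(ref_words, hyp_words))
    return errors / len(ref_words)


def diacritic_error_rate(reference: str, hypothesis: str) -> float:
    """Fraction of candidate letters whose restored form differs from the reference."""
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    if len(ref) != len(hyp):
        raise ValueError("reference and hypothesis must have the same length")
    # Only positions whose base letter can carry a diacritic are scored.
    positions = [i for i, ch in enumerate(ref) if base_letter(ch) in CANDIDATE_BASES]
    if not positions:
        return 0.0
    errors = sum(ref[i] != hyp[i] for i in positions)
    return errors / len(positions)


if __name__ == "__main__":
    reference = "fată în țară"
    hypothesis = "fata în țară"
    print(word_error_rate(reference, hypothesis))
    print(diacritic_error_rate(reference, hypothesis))
```

In this example one of three words and one of seven candidate letters differ, so the WER is higher than the DER, the same ordering as in the figures quoted above. Definitions for Arabic diacritization usually count diacritic marks rather than candidate letters, so the numbers are not directly comparable.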
“…Diacritics are marks written above or below words or letters in several languages such as Arabic [1], Turkish [2], and Romanian [3]. Arabic texts are usually written without diacritics, and readers can infer the meanings and correct pronunciations of the words from their contexts.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
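Because text in these languages is often written without diacritics, a restoration system has to map the bare form back to the marked form. A common way to obtain training data for that task is to strip the diacritics from correctly written text and keep the original as the target. The sketch below illustrates this idea with Python's standard `unicodedata` module; the helper names `strip_diacritics` and `make_training_pair` are hypothetical, and nothing here is taken from the cited papers.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks so that, for example, 'ș' becomes 's'."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only characters that are not combining marks.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def make_training_pair(sentence: str) -> tuple[str, str]:
    """Return (input without diacritics, target with diacritics)."""
    return strip_diacritics(sentence), sentence


if __name__ == "__main__":
    source, target = make_training_pair("știință și țară")
    print(source)
    print(target)
```

The stripped string has the same number of letters as the original for Romanian, which is what makes character-level models a natural fit for the task.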
“…In [27], an improvement is developed where a morphological and a syntactical analyzer are used to accelerate the predictions. Since Convolutional Neural Networks (CNNs) can run in parallel, they may be an alternative for RNNs when the speed of diacritization is a major issue [28,29]. They usually lead to a loss in the accuracy.…”
Section: Introduction
Citation type: mentioning
confidence: 99%