Proceedings of the First Workshop on Subword and Character Level Models in NLP, 2017
DOI: 10.18653/v1/w17-4104

Automated Word Stress Detection in Russian

Abstract: In this study we address the problem of automated word stress detection in Russian using character-level models and no part-of-speech taggers. We use a simple bidirectional RNN with LSTM nodes and achieve an accuracy of 90% or higher. We experiment with two training datasets and show that using data from an annotated corpus is much more efficient than using a dictionary, since it allows us to take into account word frequencies and the morphological context of the word.
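
One common way to frame the task in the abstract is as character-level sequence labelling: each character of a word gets a label (1 for the stressed vowel, 0 otherwise), and a bidirectional LSTM predicts these labels. The minimal sketch below covers only the data and evaluation side and rests on stated assumptions: stress is assumed to be annotated with the combining acute accent (U+0301) placed right after the stressed vowel, the helper names (`encode_stressed_word`, `stress_position`, `pick_stress`, `word_accuracy`) are hypothetical, and restricting predictions to vowels is an illustrative choice, not the paper's documented procedure. The network itself is not shown.

```python
from typing import List, Optional, Sequence, Tuple

# Assumption: stress is marked with U+0301 COMBINING ACUTE ACCENT right after the stressed vowel.
STRESS_MARK = "\u0301"
# Lower-case Russian vowels; only these are treated as possible stress positions (an assumption).
VOWELS = set("аеёиоуыэюя")


def encode_stressed_word(annotated: str) -> Tuple[str, List[int]]:
    """Strip the stress mark; return the plain word and one 0/1 label per character."""
    chars: List[str] = []
    labels: List[int] = []
    for ch in annotated:
        if ch == STRESS_MARK:
            if not chars:
                raise ValueError("stress mark without a preceding character")
            labels[-1] = 1  # the mark applies to the character before it
        else:
            chars.append(ch)
            labels.append(0)
    return "".join(chars), labels


def stress_position(labels: List[int]) -> Optional[int]:
    """Index of the stressed character, or None if the word carries no stress label."""
    return labels.index(1) if 1 in labels else None


def pick_stress(word: str, scores: Sequence[float]) -> Optional[int]:
    """Turn per-character model scores into a single stress position, considering vowels only."""
    candidates = [i for i, ch in enumerate(word.lower()) if ch in VOWELS]
    if not candidates:
        return None
    return max(candidates, key=lambda i: scores[i])


def word_accuracy(gold: Sequence[Optional[int]], predicted: Sequence[Optional[int]]) -> float:
    """Share of words whose predicted stress position equals the gold position."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    hits = sum(g == p for g, p in zip(gold, predicted))
    return hits / len(gold)


if __name__ == "__main__":
    # The homographs "за́мок" (castle) and "замо́к" (lock) share one character sequence.
    for annotated in ("за\u0301мок", "замо\u0301к"):
        word, labels = encode_stressed_word(annotated)
        print(word, labels, stress_position(labels))
```

Because за́мок ("castle") and замо́к ("lock") produce the same plain character sequence with different labels, a model that sees only the isolated characters cannot separate them; this is the kind of case where frequency information and surrounding context matter.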

Cited by 5 publications (5 citation statements); references 5 publications.

“…In this project, we approach the word stress detection problem for three East Slavic languages: Russian, Ukrainian and Belarusian, which are said to be mutually intelligible to some extent. Our preliminary experiments along with the results of (Ponomareva et al, 2017) show that using context, i.e., left and right words to the word under consideration, is of great help. Hence, such data sources as dictionaries, including Wiktionary, do not satisfy these requirements, because they provide only single words and do not provide context words.…”
Section: Dataset (mentioning)
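
The statement above reports that giving the model the left and right neighbouring words helps. A minimal sketch of how such context windows could be built for a character-level model follows; the space separator, the empty-string padding at sentence boundaries and the name `context_windows` are assumptions made for illustration, not the setup of Ponomareva et al. (2017) or of the citing paper. The mask records which characters belong to the target word, so stress labels and the training loss can be restricted to those positions.

```python
from typing import List, NamedTuple

SEP = " "  # hypothetical separator between the context words


class ContextExample(NamedTuple):
    text: str               # left word + SEP + target word + SEP + right word
    target_mask: List[int]  # 1 for characters of the target word, 0 for context characters


def context_windows(tokens: List[str]) -> List[ContextExample]:
    """Build one left/target/right window per token; missing neighbours become empty strings."""
    examples: List[ContextExample] = []
    for i, target in enumerate(tokens):
        left = tokens[i - 1] if i > 0 else ""
        right = tokens[i + 1] if i + 1 < len(tokens) else ""
        prefix = left + SEP
        suffix = SEP + right
        text = prefix + target + suffix
        mask = [0] * len(prefix) + [1] * len(target) + [0] * len(suffix)
        examples.append(ContextExample(text, mask))
    return examples


if __name__ == "__main__":
    # "я открыл замок" = "I opened the lock"
    for ex in context_windows(["я", "открыл", "замок"]):
        print(repr(ex.text), ex.target_mask)
```

For the target замок in this sentence, the left neighbour открыл ("opened") is the kind of cue that can favour the reading замо́к ("lock") over за́мок ("castle"), which a dictionary entry without context cannot provide.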
“…The global model, which is shown to be the best RNN-based architecture for this setting of the task, was first presented in (Ponomareva et al, 2017), where a simple bidirectional RNN with LSTM nodes was used to achieve the accuracy of 90% or higher. The authors experiment with two training datasets and show that using the data from an annotated corpus is much more efficient than using a dictionary since it allows to consider word frequencies and the morphological context of the word.…”
Section: Word Stress Detection in East Slavic Languages (mentioning)
“…The annotation was performed by Russian-language speakers on the crowdsourcing platform Yandex.Toloka, which had already been used in several academic studies about Russian-language texts [10], [29], [32], [44]. As an annotation guideline, we used the annotation instructions for toxicity with sub-attributes from Jigsaw Toxic Comment Classification Challenge.…”
Section: Toxic Comments Dataset (mentioning)