2021
DOI: 10.22363/2618-8163-2021-19-3-331-345

Textometr: an online tool for automated complexity level assessment of texts for Russian language learners

Abstract: Evaluating text accessibility is an urgent and labor-intensive task when preparing texts for teaching Russian as a foreign language. At the same time, the procedure of assigning a text to one of the levels of the CEFR scale (A1 to C2) is well formalized and described in the professional literature, which opens opportunities for its automation. This paper presents Textometr, a new free web-based tool for estimating the CEFR level and other key statistics of any given text…

Cited by 20 publications (8 citation statements)
References 7 publications
“…One of the main purposes of this dataset was to serve as a test set for the algorithm of text complexity assessment for Russian L2 learners. In a previous study we developed an ML system trained on 800 texts from Russian L2 textbooks and a set of linguistic features, including lexical, morphological, grammatical, and syntactic ones (Laposhina et al 2018). Examples of the linguistic features are shown in Table 2.…”
Section: The Resulting Data As a Test Set · mentioning
confidence: 99%
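The quoted study describes an ML system built on lexical, morphological, grammatical, and syntactic features. As a rough illustration of the lexical and surface end of such a feature set, the sketch below computes a few simple statistics from a Russian text. The feature names, the regex tokenizer, the sentence splitter, and the long-word threshold are assumptions made for illustration; they are not the feature list from Laposhina et al (2018), and morphological or syntactic features would need a parser, which is not shown.

```python
import re
from dataclasses import dataclass

# Hypothetical surface features for illustration only;
# NOT the feature set used in Laposhina et al (2018).
WORD_RE = re.compile(r"[А-Яа-яЁёA-Za-z]+(?:-[А-Яа-яЁёA-Za-z]+)*")
# Naive sentence splitter (assumption): split on whitespace after . ! ? or …
SENT_SPLIT_RE = re.compile(r"(?<=[.!?…])\s+")
LONG_WORD = 7  # hypothetical threshold for a "long" word, in characters


@dataclass
class SurfaceFeatures:
    n_sentences: int
    n_words: int
    mean_sentence_len: float  # words per sentence
    mean_word_len: float      # characters per word
    long_word_share: float    # share of words with >= LONG_WORD characters
    type_token_ratio: float   # distinct lowercased word forms / all words


def surface_features(text: str) -> SurfaceFeatures:
    # Keep only sentence chunks that contain at least one word.
    sentences = [s for s in SENT_SPLIT_RE.split(text.strip()) if WORD_RE.search(s)]
    words = WORD_RE.findall(text)
    n_words = len(words)
    if n_words == 0:
        return SurfaceFeatures(0, 0, 0.0, 0.0, 0.0, 0.0)
    return SurfaceFeatures(
        n_sentences=len(sentences),
        n_words=n_words,
        mean_sentence_len=n_words / len(sentences),
        mean_word_len=sum(len(w) for w in words) / n_words,
        long_word_share=sum(len(w) >= LONG_WORD for w in words) / n_words,
        type_token_ratio=len({w.lower() for w in words}) / n_words,
    )


if __name__ == "__main__":
    print(surface_features("Мама мыла раму. Папа читал газету."))
```

In a full system, vectors like these would be combined with richer morphological and syntactic features and fed to a classifier trained on level-annotated texts; the quoted passage does not say which classifier was used.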
“…Automated complexity assessment of Russian L2 texts has several precedents; most of them rely on datasets with discrete levels, such as corpora of textbooks annotated by publishers on the CEFR scale (Reynolds 2016; Karpov et al 2014; Batinic et al 2016; Laposhina et al 2018; Corlatescu et al 2022). However, creating a non-discrete scale requires expert annotation, which can be time-consuming and expensive.…”
Section: Datasets For L2 Text Complexity Assessment Task · mentioning
confidence: 99%
“…A text with a readability level of 39% (for grades 3-4) [14] and a list of letters displayed on the screen were used as stimulus material. The children read the text twice: in the first reading, the participants were not asked to find the letter; in the second reading, they were asked to find all instances of the letter D. After the first reading, the participants were told what task they would be asked to do during the second reading.…”
Section: Methods · mentioning
confidence: 99%
“…Among services for analyzing Russian-language texts, two stand out: Textometr and «Простой русский» ("Simple Russian"). The first is based on a machine learning model and analyzes the lexical, grammatical, and syntactic features of a text [22]; the second is built on the "plain language" concept and relies on statistical calculations using readability formulas adapted for Russian-language texts [23]. As an example of how these services work, we present the results of analyzing the following test-item question: What is the name of the solid fraction containing organomineral substances released by the activated-sludge biocenosis in the course of its vital activity during the implementation of biological wastewater treatment technology?…”
Section: Table · unclassified
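The quoted passage mentions readability formulas adapted for Russian-language texts. One widely cited adaptation is Oborneva's version of the Flesch Reading Ease formula, assumed here to be FRE = 206.835 - 1.3 * ASL - 60.1 * ASW, where ASL is the average sentence length in words and ASW is the average number of syllables per word. The source does not say which formula «Простой русский» actually uses, so treat this as an illustrative sketch; the Cyrillic-only tokenizer and the naive sentence splitter are also simplifying assumptions.

```python
import re

# Russian vowels: each Russian syllable contains exactly one vowel,
# so counting vowels gives the syllable count of a word.
VOWELS = set("аеёиоуыэюяАЕЁИОУЫЭЮЯ")
WORD_RE = re.compile(r"[А-Яа-яЁё]+(?:-[А-Яа-яЁё]+)*")
# Naive sentence splitter (assumption): split on whitespace after . ! ? or …
SENT_SPLIT_RE = re.compile(r"(?<=[.!?…])\s+")


def count_syllables(word: str) -> int:
    return sum(ch in VOWELS for ch in word)


def flesch_ru(text: str) -> float:
    """Flesch Reading Ease with coefficients assumed from Oborneva's
    adaptation for Russian; higher scores mean easier text."""
    sentences = [s for s in SENT_SPLIT_RE.split(text.strip()) if WORD_RE.search(s)]
    words = WORD_RE.findall(text)
    if not words:
        raise ValueError("text contains no Cyrillic words")
    asl = len(words) / len(sentences)                           # words per sentence
    asw = sum(count_syllables(w) for w in words) / len(words)   # syllables per word
    return 206.835 - 1.3 * asl - 60.1 * asw


if __name__ == "__main__":
    print(round(flesch_ru("Мама мыла раму."), 3))
```

A long question with many polysyllabic terms, like the wastewater-treatment example in the quote, drives both ASL and ASW up and therefore pushes the score sharply down.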