Topical Text Classification of Russian News: a Comparison of BERT and Standard Models

Lagutina, Ksenia

doi:10.23919/fruct54823.2022.9770920

Cited by 4 publications

(1 citation statement)

References 13 publications

“…First, the texts of the corpus were tokenized and lemmatized using the Stanza library (Qi et al., 2020) for the Python 3.7 programming language. We chose this library because it showed good results in processing both structured and unstructured text data of various genres in Russian (Lagutina, 2022; Mamaev et al., 2023). Secondly, on the basis of the Russian National Corpus and a Frequency Dictionary of Russian (Lyashevskaya & Sharov, 2009), a list of stop-words was compiled to exclude lexical units that do not contain an important semantic component: prepositions, conjunctions, auxiliary words.…”

Section: Processing Tools

mentioning

confidence: 99%
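
The preprocessing described in the citation statement above (lemmatization with Stanza, then stop-word filtering) can be sketched roughly as follows. This is a minimal illustration, not the citing authors' code: the names `stanza_lemmatizer`, `content_lemmas`, and `STOP_WORDS` are hypothetical, and the tiny stop-word set is a placeholder for the list the authors compiled from the Russian National Corpus and the frequency dictionary, which is not reproduced here. The Stanza-backed lemmatizer assumes the Russian models have already been fetched once with `stanza.download("ru")`.

```python
from typing import Callable, FrozenSet, List

# Placeholder stop-words (prepositions, conjunctions, particles).
# Assumption: the real list in the cited study is much larger.
STOP_WORDS: FrozenSet[str] = frozenset({"в", "на", "и", "но", "бы", "же"})

# A lemmatizer maps raw text to a flat list of lowercase lemmas.
Lemmatizer = Callable[[str], List[str]]


def stanza_lemmatizer(lang: str = "ru") -> Lemmatizer:
    """Build a lemmatizer backed by a Stanza pipeline.

    Stanza is imported lazily so the rest of the sketch can be used
    (and tested) with any other lemmatizer.
    """
    import stanza

    nlp = stanza.Pipeline(lang=lang, processors="tokenize,pos,lemma")

    def lemmatize(text: str) -> List[str]:
        doc = nlp(text)
        return [
            word.lemma.lower()
            for sentence in doc.sentences
            for word in sentence.words
            if word.lemma and word.upos != "PUNCT"  # skip punctuation tokens
        ]

    return lemmatize


def content_lemmas(
    text: str,
    lemmatize: Lemmatizer,
    stop_words: FrozenSet[str] = STOP_WORDS,
) -> List[str]:
    """Lemmatize the text and drop lemmas found in the stop-word list."""
    return [lemma for lemma in lemmatize(text) if lemma not in stop_words]
```

With Stanza installed, a call would look like `content_lemmas(text, stanza_lemmatizer())`. Passing the lemmatizer in as a parameter keeps the filtering step independent of the tagging library.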

Lessons of Secondary School Teachers: From Automatic Speech Analysis to the Markers of Effective Teaching Practices

Mamaev, Khokhlova, Dayter

2024

E&SD


The problem of pedagogical discourse as a speech behavior form is a cutting-edge linguistic area. Within its framework, it is necessary to identify some lexical and semantic components that form a certain rhetorical and pedagogical ideal. To date, such studies are carried out manually. This paper describes the automatic study of pedagogical discourse. As part of the experiment, statistically significant discourse markers and patterns are extracted from the corpus of teachers’ speeches, such markers characterizing both general trends in teaching methods and idiostylistic characteristics of a particular teacher. The results of the marker analysis make it possible to form a preliminary list of speech patterns that beginner teachers can use.
