“…First, the texts of the corpus were tokenized and lemmatized using the Stanza library (Qi et al., 2020) for Python 3.7. We chose this library because it has shown good results in processing both structured and unstructured Russian text of various genres (Lagutina, 2022; Mamaev et al., 2023). Second, a list of stop words was compiled on the basis of the Russian National Corpus and a Frequency Dictionary of Russian (Lyashevskaya & Sharov, 2009) to exclude lexical units that carry little semantic content: prepositions, conjunctions, and auxiliary words.…”
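The two preprocessing steps described in the excerpt could be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names `build_pipeline`, `lemmatize`, and `remove_stop_words` and the tiny `STOP_WORDS` set are hypothetical, and the sketch assumes Stanza is installed and its Russian models have already been downloaded.

```python
from typing import Iterable, List, Set

# Hypothetical, illustrative stop-word list. The list described in the excerpt
# was compiled from the Russian National Corpus and a frequency dictionary;
# its actual contents are not given in the text.
STOP_WORDS: Set[str] = {"в", "на", "и", "но", "что", "бы", "же", "ли"}


def build_pipeline():
    """Create a Stanza pipeline for Russian tokenization and lemmatization."""
    import stanza  # imported lazily so the filtering step works without Stanza

    # Assumes the models were fetched beforehand with stanza.download("ru").
    return stanza.Pipeline(lang="ru", processors="tokenize,pos,lemma")


def lemmatize(nlp, text: str) -> List[str]:
    """Return the lemma of every word in `text`, skipping words without one."""
    doc = nlp(text)
    return [
        word.lemma
        for sentence in doc.sentences
        for word in sentence.words
        if word.lemma
    ]


def remove_stop_words(
    lemmas: Iterable[str], stop_words: Set[str] = STOP_WORDS
) -> List[str]:
    """Drop lemmas found in the stop-word list (case-insensitive match)."""
    return [lemma for lemma in lemmas if lemma.lower() not in stop_words]
```

With a pipeline in hand, the steps would be chained as `remove_stop_words(lemmatize(nlp, text))`. Filtering on lemmas rather than surface forms means a single stop-word entry covers every inflected form of that word.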