2022 31st Conference of Open Innovations Association (FRUCT) 2022
DOI: 10.23919/fruct54823.2022.9770920

Topical Text Classification of Russian News: a Comparison of BERT and Standard Models

Cited by 4 publications (1 citation statement)
References 13 publications
“…First, the texts of the corpus were tokenized and lemmatized using the Stanza library (Qi et al., 2020) for the Python 3.7 programming language. We chose this library because it showed good results in processing both structured and unstructured text data of various genres in Russian (Lagutina, 2022; Mamaev et al., 2023). Secondly, on the basis of the Russian National Corpus and a Frequency Dictionary of Russian (Lyashevskaya & Sharov, 2009), a list of stop-words was compiled to exclude lexical units that do not contain an important semantic component: prepositions, conjunctions, auxiliary words.…”
Section: Processing Tools
Citation type: mentioning
confidence: 99%
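
The preprocessing described in the citation statement (Stanza lemmatization followed by stop-word removal) might look roughly like the sketch below. It is an illustration under stated assumptions, not the citing authors' code: the function names and the tiny stop-word set are hypothetical, and the actual list was compiled from the Russian National Corpus and the Frequency Dictionary of Russian, which are not reproduced here. The Stanza call assumes the Russian models were downloaded beforehand with `stanza.download("ru")`.

```python
from typing import Iterable, List, Set

# Hypothetical stop-word set for illustration only; the citing paper built
# its list from corpus and frequency-dictionary data not reproduced here.
EXAMPLE_STOP_WORDS: Set[str] = {"в", "на", "и", "но", "бы", "же"}


def lemmatize_with_stanza(text: str) -> List[str]:
    """Tokenize and lemmatize Russian text with Stanza.

    Assumes the `stanza` package is installed and the Russian models
    have already been fetched via stanza.download("ru").
    """
    import stanza  # imported lazily so the filter below works without it

    nlp = stanza.Pipeline(lang="ru", processors="tokenize,pos,lemma")
    doc = nlp(text)
    return [
        word.lemma
        for sentence in doc.sentences
        for word in sentence.words
        if word.lemma
    ]


def remove_stop_words(lemmas: Iterable[str], stop_words: Set[str]) -> List[str]:
    """Drop lemmas that appear in the stop-word set (case-insensitive)."""
    return [lemma for lemma in lemmas if lemma.lower() not in stop_words]
```

With the models installed, `remove_stop_words(lemmatize_with_stanza(text), EXAMPLE_STOP_WORDS)` would chain the two steps; the filter can also be applied to lemmas from any other source.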