Application of Natural Language Processing Algorithms to the Task of Automatic Classification of Russian Scientific Texts

Romanov, A. Yu.; Lomotin, Konstantin; Kozlova, E.

doi:10.5334/dsj-2019-037

Cited by 23 publications

(9 citation statements)

References 20 publications


“…Comparing the performance of our results with that of similar research on automated classification of scientific literature is not straightforward but some observations can be made. For example, in [21] we see F-scores of around 0.50 which is in the same area as our experiment 2, which had the largest number of classes. This study had a much larger training set but it is difficult to compare the complexity of the tasks.…”

Section: Discussion (supporting)

confidence: 73%

“…classification of mathematical research [19] and on general research literature with the purpose of applying the correct Dewey Decimal Classification code [20]. While much work focuses on classification of English-language literature, examples of using machine learning methods for automated coding of scientific literature in the Russian language [21]. Most approaches appear to be based on supervised learning but use of unsupervised learning also exists [20].…”

Section: Introduction (mentioning)

confidence: 99%


Using neural networks to support high-quality evidence mapping

et al. 2021


Background: The Living Evidence Map Project at the Norwegian Institute of Public Health (NIPH) gives an updated overview of research results and publications. As part of NIPH’s mandate to inform evidence-based infection prevention, control and treatment, a large group of experts are continuously monitoring, assessing, coding and summarising new COVID-19 publications. Screening tools, coding practice and workflow are incrementally improved, but remain largely manual. Results: This paper describes how deep learning methods have been employed to learn classification and coding from the steadily growing NIPH COVID-19 dashboard data, so as to aid manual classification, screening and preprocessing of the rapidly growing influx of new papers on the subject. Our main objective is to make manual screening scalable through semi-automation, while ensuring high-quality Evidence Map content. Conclusions: We report early results on classifying publication topic and type from titles and abstracts, showing that even simple neural network architectures and text representations can yield acceptable performance.


“…Among the ML algorithms, there are the Support Vector Machine (SVM) and Naïve Bayes (NB) algorithms which, in addition to being the most traditional algorithms, continue to provide good results. In Romanov et al [4] 99% accuracy was obtained regarding the classification of scientific texts based on their abstracts. However, this high acuity value reveals low precision and recall values, 61% and 36% respectively, which is not ideal.…”

Section: Data Classification Results - ML (mentioning)

confidence: 99%
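The gap between 99% accuracy and much lower precision and recall in the statement above is typical of a heavily imbalanced class. The sketch below illustrates one way such figures can coexist. The confusion-matrix counts are hypothetical values chosen only to reproduce the quoted percentages; they are not taken from Romanov et al. or from the citing review.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts (assumption): 100 documents belong to the class,
# 8600 do not, so true negatives dominate the accuracy figure.
metrics = binary_metrics(tp=36, fp=23, fn=64, tn=8577)

for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Because true negatives make up almost all of the 8700 documents in this illustration, accuracy stays near 0.99 even though most documents of the class are missed, which is why precision, recall and F-score are usually the more informative figures for this kind of task.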

“…The most common approaches to the use of NLP techniques usually use a set of steps, in which the data obtained is processed. In the work of Romanov et al [4], in which a classification system for scientific texts in Russian was developed, an approach consisting of 5 steps was presented, namely: the removal of formulas that are frequent in scientific texts; the aggregation of metadata, which includes the title, keywords, and summary; transformation of data to lowercase; the removal of stop words that reduces the amount of existing information to just useful information; and the stemming of words, which consists of deflecting words to determine their lemma.…”

Section: Data Processing - NLP (mentioning)

confidence: 99%
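The five preprocessing steps listed in the statement above can be sketched roughly as follows. This is a minimal illustration, not the pipeline used by Romanov et al.: the formula pattern (inline LaTeX between dollar signs), the stop-word list, the suffix-stripping `toy_stem` function and the English sample text are all assumptions made so the example runs with the standard library alone. A system for Russian texts would presumably substitute a Russian stop-word list and a proper stemmer.

```python
import re

# Hypothetical stand-ins (assumptions): a tiny English stop-word list and
# a naive suffix stripper in place of a real Russian stemmer.
STOP_WORDS = {"the", "of", "a", "an", "and", "in", "for", "is", "to"}
SUFFIXES = ("ing", "ed", "es", "s")

FORMULA_RE = re.compile(r"\$[^$]*\$")  # assumes formulas appear as inline LaTeX
TOKEN_RE = re.compile(r"[^\W\d_]+")    # runs of letters, Unicode-aware


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(title, keywords, abstract):
    # Step 1: remove formulas from each metadata field.
    fields = [FORMULA_RE.sub(" ", field) for field in (title, keywords, abstract)]
    # Step 2: aggregate title, keywords and abstract into one text.
    text = " ".join(fields)
    # Step 3: convert to lowercase.
    text = text.lower()
    # Step 4: tokenize and drop stop words.
    tokens = [tok for tok in TOKEN_RE.findall(text) if tok not in STOP_WORDS]
    # Step 5: stem the remaining words.
    return [toy_stem(tok) for tok in tokens]


tokens = preprocess(
    title="Classification of Texts",
    keywords="stemming, classifiers",
    abstract="The energy $E = mc^2$ is computed for the models.",
)
print(tokens)
```

The resulting token list would then be turned into feature vectors for a classifier; that vectorization stage is outside the five steps quoted above and is not shown here.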

Machine Learning and Natural Language Processing in Domain Classification of Scientific Knowledge Objects: A Review

Machado, Sá 2021

Advances in Intelligent Systems and Computing


The domain classification of scientific knowledge objects has been continuously improved over the years. Systems that can automatically classify a scientific knowledge object, through the use of artificial intelligence, machine learning algorithms, natural language processing, and others, have been adopted in most scientific knowledge databases to maintain internal classification consistency as well as to simplify the information arrangement. However, the amount of available data has grown exponentially in the last few years and now it can be found in multiple platforms under different classifications due to the implementation of different classification systems. Thus, the process of searching and selecting relevant data in research studies and projects has become more complex and the time needed to find the right information has continuously grown as well. Therefore, machine learning and natural language processing play an important role in the development and achievement of automatic and standardized classification systems that will aid researchers in their research work.


“…Real-world raw data is usually unsuitable for direct use in classifier training, so some cleaning and preprocessing steps are generally applied before the classification task. Thus, scientific contents must go through a Natural Language Processing (NLP) techniques for the data to be ready for classification [2].…”

Section: Introduction (mentioning)

confidence: 99%

Automatic Classifier of Scientific Contents

Machado, Sá 2021

Advances in Intelligent Systems and Computing


The growth of scientific production, associated with the increase in the complexity of scientific contents, makes the classification of these contents highly subjective and subject to misinterpretation. The taxonomy on which this classification process is based does not follow the scientific areas' changes. These classification processes are manually carried out and are therefore subject to misclassification. A classification process that allows automation and implements intelligent algorithms based on Machine Learning algorithms presents a possible solution to subjectivity in classification. Although it does not solve the inadequacy of taxonomy, this work shows this possibility by developing a solution to this problem. In conclusion, this work proposes a solution to classify scientific content based on the title, abstract, and keywords through Natural Language Processing techniques and Machine Learning algorithms to organize scientific content in scientific domains.
