Deep Learning methods for Subject Text Classification of Articles

Semberecki, Piotr; Maciejewski, Henryk

doi:10.15439/2017f414

Cited by 41 publications

(23 citation statements)

References 11 publications


“…Lately, semantic model word2vec, based on neural network technologies, has been used. A number of recent studies have demonstrated the advantage of word2vec in comparison with previously used statistical approaches (for example, when it used in tandem with LSTM networks (Semberecki and Maciejewski, 2017)), although in another recent study (Wang et al 2017), the authors failed to demonstrate experimentally significant advantage of the semantic approach (as compared to statistical one) in experiments on the classification of texts with different number of class labels. Despite this, technology word2vec is considered to be a promising area for research, being actively developed over the past few years.…”

Section: Extraction Of Features From Textual Information (mentioning)

confidence: 99%

“…Methods of text preprocessing, considered in (Goncalves and Quaresma, 2018;Semberecki and Maciejewski, 2017), showed their effectiveness in case when it is necessary to train the model for classification of the English text. Nevertheless, replacement of stemming to lemmatization when working with the Russian language can significantly improves the quality of classification, since it is much easier to conduct POS-tagging for lemmatization of Russian words than for lemmatization of English words.…”

Section: Influence Of Number Of Classes On the Quality Of Classification (mentioning)

confidence: 99%

“…Based on works (Abuhaiba and Dawoud, 2017;Bourgonje et al 2018;Liu et al 2017;Semberecki and Maciejewski, 2017), it is possible to identify the models of machine learning that are most suitable for classification of textual data. Such models are: logistic regression, random forest, SVM, and artificial neural network (both feedforward and LSTM).…”

Section: Influence Of Number Of Classes On the Quality Of Classification (mentioning)

confidence: 99%


Application of Natural Language Processing Algorithms to the Task of Automatic Classification of Russian Scientific Texts

Romanov

Lomotin

Kozlova

2019

Data Science Journal


This work is devoted to the study of applicability of modern methods of machine learning to the task of automatic classification of scientific articles and abstracts. For this purpose, the study of such models of machine learning as artificial neural networks, random forest, logistic regression, and support vector machine was carried out with taking into account such a feature of scientific texts as a large number of terms specific for various categories. Separately, the stages of data collection and extraction of text characteristics are considered. The results of research are used in development of a decision support system for assignment of scientific texts to the code of the department or abstract journal of All-Russian Institute of Scientific and Technical Information of Russian Academy of Sciences.


“…In terms of deep neural networks [8], [9] and [10] studied different deep neural network models on the text classification task. Semberecki and Maciejewski applied long short-term memory (LSTM) model in documents classification to study different representations approaches [8]. Their evaluation showed that the vector representation approach outperformed a standard bag-of-word approach based on the LSTM model in the document classification task. In [9] a recurrent Convolutional Neural Network (CNN) model was proposed for text classification.…”

Section: Dalal and Zaveri (mentioning)

confidence: 99%

A Study into Math Document Classification using Deep Learning

Alshamari

Youssef

2020

Computer Science & Information Technology (CS & IT)


Document classification is a fundamental task for many applications, including document annotation, document understanding, and knowledge discovery. This is especially true in STEM fields where the growth rate of scientific publications is exponential, and where the need for document processing and understanding is essential to technological advancement. Classifying a new publication into a specific domain based on the content of the document is an expensive process in terms of cost and time. Therefore, there is a high demand for a reliable document classification system. In this paper, we focus on classification of mathematics documents, which consist of English text and mathematics formulas and symbols. The paper addresses two key questions. The first question is whether math-document classification performance is impacted by math expressions and symbols, either alone or in conjunction with the text contents of documents. Our investigations show that Text-Only embedding produces better classification results. The second question we address is the optimization of a deep learning (DL) model, the LSTM combined with one dimension CNN, for math document classification. We examine the model with several input representations, key design parameters and decision choices, and choices of the best input representation for math documents classification.


Machine Learning and Natural Language Processing in Domain Classification of Scientific Knowledge Objects: A Review

Machado

Sá

2021

Advances in Intelligent Systems and Computing


The domain classification of scientific knowledge objects has been continuously improved over the years. Systems that can automatically classify a scientific knowledge object, through the use of artificial intelligence, machine learning algorithms, natural language processing, and others, have been adopted in most scientific knowledge databases to maintain internal classification consistency as well as to simplify the information arrangement. However, the amount of available data has grown exponentially in the last few years and now it can be found in multiple platforms under different classifications due to the implementation of different classification systems. Thus, the process of searching and selecting relevant data in research studies and projects has become more complex and the time needed to find the right information has continuously grown as well. Therefore, machine learning and natural language processing play an important role in the development and achievement of automatic and standardized classification systems that will aid researchers in their research work.
