2017
DOI: 10.15439/2017f414
|View full text |Cite
|
Sign up to set email alerts
|

Deep Learning methods for Subject Text Classification of Articles

Abstract: Abstract-This work presents a method of classification of text documents using deep neural network with LSTM (long shortterm memory) units. We have tested different approaches to build feature vectors, which represent documents to be classified: we used feature vectors constructed as sequences of words included in the documents, or, alternatively, we first converted words into vector representations using word2vec tool and used sequences of these vector representations as features of documents. We evaluated fe… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
2
1

Citation Types

0
22
0
1

Year Published

2019
2019
2023
2023

Publication Types

Select...
5
1
1

Relationship

0
7

Authors

Journals

citations
Cited by 41 publications
(23 citation statements)
references
References 11 publications
0
22
0
1
Order By: Relevance
“…Lately, semantic model word2vec, based on neural network technologies, has been used. A number of recent studies have demonstrated the advantage of word2vec in comparison with previously used statistical approaches (for example, when it used in tandem with LSTM networks (Semberecki and Maciejewski, 2017)), although in another recent study (Wang et al 2017), the authors failed to demonstrate experimentally significant advantage of the semantic approach (as compared to statistical one) in experiments on the classification of texts with different number of class labels. Despite this, technology word2vec is considered to be a promising area for research, being actively developed over the past few years.…”
Section: Extraction Of Features From Textual Informationmentioning
confidence: 99%
See 2 more Smart Citations
“…Lately, semantic model word2vec, based on neural network technologies, has been used. A number of recent studies have demonstrated the advantage of word2vec in comparison with previously used statistical approaches (for example, when it used in tandem with LSTM networks (Semberecki and Maciejewski, 2017)), although in another recent study (Wang et al 2017), the authors failed to demonstrate experimentally significant advantage of the semantic approach (as compared to statistical one) in experiments on the classification of texts with different number of class labels. Despite this, technology word2vec is considered to be a promising area for research, being actively developed over the past few years.…”
Section: Extraction Of Features From Textual Informationmentioning
confidence: 99%
“…Methods of text preprocessing, considered in (Goncalves and Quaresma, 2018;Semberecki and Maciejewski, 2017), showed their effectiveness in case when it is necessary to train the model for classification of the English text. Nevertheless, replacement of stemming to lemmatization when working with the Russian language can significantly improves the quality of classification, since it is much easier to conduct POS-tagging for lemmatization of Russian words than for lemmatization of English words.…”
Section: Influence Of Number Of Classes On the Quality Of Classificationmentioning
confidence: 99%
See 1 more Smart Citation
“…In terms of deep neural networks [8], [9] and [10] studied different deep neural network models on the text classification task. Semberecki and Maciejewski applied long short-term memory (LSTM) model in documents classification to study different representations approaches [8].…”
Section: Dalal and Zaverimentioning
confidence: 99%
“…In terms of deep neural networks [8], [9] and [10] studied different deep neural network models on the text classification task. Semberecki and Maciejewski applied long short-term memory (LSTM) model in documents classification to study different representations approaches [8]. Their evaluation showed that the vector representation approach outperformed a standard bag-ofword approach based on the LSTM model in the document classification task.In [9] a recurrent Convolutional Neural Network (CNN) model was proposed for text classification.…”
Section: Dalal and Zaverimentioning
confidence: 99%