2016
DOI: 10.1007/978-3-319-39378-0_53

Distributed Classification of Text Documents on Apache Spark Platform

Cited by 19 publications
(12 citation statements)
References 10 publications
“…This work is an extension of the previous research [2], where subject classification was done using standard Machine Learning such as Decision Trees, Naive Bayes classifier etc., with the focus on distributed implementation, in order to manage large volumes of data. The best results in the previous work was obtained using Bag-of-Words model with TF-IDF and Naive Bayes, where recognition of three categories: History, Arts and Law was done with ca 75.28% accuracy on the testing corpus.…”
Section: Results Of Bow Methods For This Dataset In Previous Work
mentioning
confidence: 99%
“…This approach was similar to previous work [2] and is a part of traditional NLP processing chain. We used English Punkt as sentence tokenizer for segmentation task.…”
Section: B Sample Data For Empirical Verification Of the Methods
mentioning
confidence: 90%
“…It also aims to store in memory all the assigned training data partitions on particular nodes. Another implementation based on Apache Spark uses similar techniques such as Hadoop MapReduce implementations [10], while several other works leverage the computational power of GPUs (Graphics Processing Units) to improve the performance of the MapReduce implementations. Caragea et al [11] describe a multi-agent approach to building tree-based classifiers.…”
mentioning
confidence: 99%