2010
DOI: 10.4304/jait.1.1.4-20

A Review of Machine Learning Algorithms for Text-Documents Classification

Abstract:

With the increasing availability of electronic documents and the rapid growth of the World Wide Web, the task of automatic categorization of documents became the key method for organizing the information and knowledge discovery. Proper classification of e-documents, online news, blogs, e-mails and digital libraries needs text mining, machine learning and natural language proces…

Cited by 283 publications (147 citation statements)
References 96 publications
“…Some investigations [18,19] have shown effectiveness of the k-NN algorithm for text classification. We have varied k from 1 to 15.…”
Section: Feature Selection and Comparison Of Term Weighting Methods (mentioning)
confidence: 99%
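
The k-NN setup quoted above (varying k from 1 to 15) can be illustrated with a minimal sketch. This is not the citing study's code: the toy documents, the raw term-frequency weighting, the cosine similarity measure, and the names `vectorize`, `cosine`, `knn_predict`, `accuracy_by_k`, `TRAIN` and `TEST` are all assumptions made for illustration.

```python
from collections import Counter
import math


def vectorize(text):
    """Lowercase, split on whitespace, and count terms (raw term frequency)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train, text, k):
    """Majority vote over the k training documents most similar to text."""
    query = vectorize(text)
    ranked = sorted(
        train,
        key=lambda item: cosine(query, vectorize(item[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy_by_k(train, test, k_values):
    """Return {k: fraction of test documents labelled correctly}."""
    results = {}
    for k in k_values:
        correct = sum(knn_predict(train, text, k) == label for text, label in test)
        results[k] = correct / len(test)
    return results


# Hypothetical toy corpus; a real experiment would use a labelled collection.
TRAIN = [
    ("stock market shares fall", "finance"),
    ("bank raises interest rates", "finance"),
    ("investors buy shares", "finance"),
    ("team wins football match", "sport"),
    ("player scores goal in match", "sport"),
    ("coach praises team", "sport"),
]
TEST = [
    ("shares rise on stock market", "finance"),
    ("football team scores goal", "sport"),
]

if __name__ == "__main__":
    # k runs from 1 to 15, as in the quoted statement; once k exceeds the
    # training set size, every training document takes part in the vote.
    for k, acc in accuracy_by_k(TRAIN, TEST, range(1, 16)).items():
        print(f"k={k:2d}  accuracy={acc:.2f}")
```

On a corpus this small the sweep mostly shows the mechanics. With real data, the k that scores best on held-out documents would typically be the one reported.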
“…Machine Learning techniques, Data Mining and Natural Language Processing (NLP) work in combination to automatically identify patterns in electronic documents and help classify them into the intended categories (ALmomani et al., 2012). The Naïve Bayes classifier was found to be most effective in complex real-world scenarios because of the simple initial conditions the model requires (Baharudin et al., 2010). Naïve Bayes classifiers can be trained efficiently.…”
Section: Related Work (mentioning)
confidence: 99%
“…The formation of base words by the stemming process in Indonesian documents is still constrained, as not all words can be truncated properly. This study uses the Zamief Nasri algorithm [16], which has been packaged in the Sastrawi PHP library. Table 5 shows the list of Indonesian affixes.…”
Section: Stemming (mentioning)
confidence: 99%