2003
DOI: 10.1016/s0306-4573(02)00022-5

Text categorization based on k-nearest neighbor approach for Web site classification

Cited by 118 publications
(55 citation statements)
References 17 publications
“…As regards our own work, we achieved an overall accuracy of 83.5% using 401 documents (of varied lengths) with 18 categories by applying the kNN in a novel way. This percentage is still higher than in comparable works [10,34,35].…”
Section: Experiments Results and Evaluation
Citation type: contrasting
confidence: 66%
“…The kNN classifier is a relatively simple algorithm compared to more complex approaches like artificial neural networks or support vector machines [9]. This simplicity, robustness, flexibility, and reasonably high accuracies have been exploited in diverse fields such as patent research [10], medical research [11], astrophysics [12], bioinformatics [13], and text categorisation [14,15]. The drawback of kNN lies in the expensive testing of each instance as every new instance must be compared with the whole dataset.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
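
The cost noted in the statement above, that every new instance must be compared with the whole dataset, can be made concrete with a small sketch. This is not the method of the cited paper or of the citing work; it is a minimal illustration that assumes whitespace tokenization, raw term counts, cosine similarity, and majority voting. The names `cosine` and `knn_classify` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (dicts)."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(query, training, k=3):
    """Label `query` by majority vote among its k most similar training texts.

    `training` is a list of (text, label) pairs. Assumption: tokens are
    whitespace-separated words and weights are raw counts.
    """
    q = Counter(query.split())
    # One similarity computation per training document: this full scan
    # is the per-query cost that grows with the size of the dataset.
    scored = [(cosine(q, Counter(text.split())), label) for text, label in training]
    top = sorted(scored, key=lambda s: s[0], reverse=True)[:k]
    votes = Counter(label for _, label in top)
    return votes.most_common(1)[0][0]
```

Because nothing is learned ahead of time, classification time scales with the number of stored documents; an index or a reduced vocabulary would lower that cost at the price of extra structure.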
“…Identifying the topic of knowledge is important in that the topic (or the keyword) indicates the subject of knowledge embedded in the document. To extract the topic of knowledge based on predefined knowledge categories, text mining techniques can be employed [12].…”
Section: Support Vector Machines As the Classifier
Citation type: mentioning
confidence: 99%
“…The most well-known unsupervised term weighting method is TFIDF [15]. The following supervised term weighting methods are also considered in the paper: Gain Ratio (GR) [3], Confident Weights (CW) [10], Term Second Moment (TM2) [22], Relevance Frequency (RF) [11], Term Relevance Ratio (TRR) [9], and Novel Term Weighting (NTW) [18]; these methods involve information about the classes of the documents. As a rule, the dimensionality for text classification problems is high even after stop-words filtering and stemming.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
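
For readers unfamiliar with TFIDF, the unsupervised weighting named in the statement above, the following sketch shows one common variant: raw term frequency multiplied by log(N / df). The citing work may use a different formulation, so the function name `tfidf` and the exact formula are assumptions made for illustration only.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: weight = tf * log(N / df), where tf is the raw count
    of the term in the document, N is the number of documents, and df is
    the number of documents containing the term.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: count each term once per document.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return weights
```

Under this variant a term that occurs in every document receives weight zero, which is why the scheme needs no class labels: it relies only on how terms are spread across the collection.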
“…Some comparative studies of machine learning algorithms in the field of text classification showed high classification effectiveness of k-NN, SVM-based algorithms, and ANN [2,7,8,10,13].…”
Section: Introduction
Citation type: mentioning
confidence: 99%