2016 International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT) 2016
DOI: 10.1109/iceeot.2016.7754750

Document clustering: TF-IDF approach

Cited by 179 publications (67 citation statements)
References 13 publications
“…Secondly, the data also contained many stop words which is never meaningful or useful in this context as explained in Section 2.2. Hence in order to filter those stop words, a list of 500 stop words was used first which filtered the data and removed all stop words from it [1], [4], [5]. A large list of stop words can easily be obtained from many blogs and websites where it is available for free for general public to consume.…”
Section: Data Preprocessing (mentioning)
confidence: 99%
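The stop-word filtering described in this statement can be sketched as follows. This is a minimal illustration, not the cited authors' code: the short `STOP_WORDS` set and the `remove_stop_words` helper are hypothetical stand-ins for the roughly 500-word list the statement mentions, and tokenization by lowercasing and whitespace splitting is an assumption.

```python
# Hypothetical, tiny stop-word list for illustration only; the quoted
# study reports using a list of about 500 stop words.
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "in", "to"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Lowercase the text, split on whitespace, and drop stop words."""
    tokens = text.lower().split()
    return [token for token in tokens if token not in stop_words]


filtered = remove_stop_words("The cat is in the garden")
print(filtered)
```

A real pipeline would usually also strip punctuation before the lookup, which this sketch leaves out.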
“…Third, one need to count total number of words and their occurrences in all documents. Once these steps are performed, one can apply Term Frequency formula to calculate TF as discussed in Section 2.1 [1], [4], [6].…”
Section: Design (mentioning)
confidence: 99%
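The statement refers to a Term Frequency formula from its Section 2.1, which is not reproduced here. A minimal sketch, assuming the common definition (occurrences of a term divided by the total number of terms in the document), might look like this; the `term_frequency` name is hypothetical.

```python
from collections import Counter


def term_frequency(tokens):
    """Return {term: count / total_terms} for one tokenized document.

    Assumes the common normalized-count definition of TF, which may
    differ from the formula used in the cited paper.
    """
    total = len(tokens)
    if total == 0:
        return {}
    counts = Counter(tokens)
    return {term: count / total for term, count in counts.items()}


tf = term_frequency(["data", "mining", "data", "text"])
print(tf)
```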
“…In addition, silhouette coefficient, which was proposed by Rousseeuw [28], has been widely used to evaluate clustering results [29,30]. In this study, we employed the mean silhouette value to evaluate the clustering results, which depended on the similarities between one document and both of the other documents in the same cluster and that in the most similar cluster.…”
Section: Experimental Design and Evaluation Index (mentioning)
confidence: 99%
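The mean silhouette value mentioned in this statement can be illustrated with a small pure-Python sketch. For each point, a is the mean distance to the other points in its own cluster, b is the smallest mean distance to the points of any other cluster, and the score is (b - a) / max(a, b). Euclidean distance and the `mean_silhouette` name are assumptions made for illustration; the cited study describes its measure in terms of document similarities.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance assumed)."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, point in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            # By convention, a point alone in its cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(point, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(
            sum(math.dist(point, points[j]) for j in members) / len(members)
            for label, members in clusters.items()
            if label != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = mean_silhouette(points, [0, 0, 1, 1])  # compact, well-separated clusters
bad = mean_silhouette(points, [0, 1, 0, 1])   # clusters mix distant points
print(good, bad)
```

Values near 1 indicate compact, well-separated clusters, while negative values suggest points sit closer to another cluster than to their own.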
“…Their experimentation results shows that the optimal weights of features computed by the algorithm improvises the retrieval results significantly. One of the numerical statistic "tf-idf" helps in placing the weightage of a particular word's importance in a document for text retrieval [18]. We extend the similar metric for image retreival as well by giving weightage to a particular feature based on the feedback from the end user and adjust this accordingly on every iteration. Yu Suzuki, Masahiro Mitsukawa and Kyoji Kawagoe [19] have used tf-idf approach to find the importance degree of features and by using their method, the CBIR system can find results matching the user query to the closest possible.…”
Section: Related Work (mentioning)
confidence: 99%
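To make the tf-idf weighting mentioned in this statement concrete for the text case, here is a minimal sketch. It assumes the textbook variant, tf = count / document length and idf = log(N / df), which may differ from the exact formulas used in the cited works; the `tf_idf` helper is hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumes tf = count / len(doc) and idf = log(N / df), where N is the
    number of documents and df the number of documents containing the term.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document

    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = [["red", "car"], ["red", "sky"]]
for weight in tf_idf(docs):
    print(weight)
```

Under this variant, a term that appears in every document (here "red") gets weight 0, while terms unique to one document get a positive weight.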