Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2001
DOI: 10.1145/383952.383976

On feature distributional clustering for text categorization

Abstract: We describe a text categorization approach that is based on a combination of feature distributional clusters with a support vector machine (SVM) classifier. Our feature selection approach employs distributional clustering of words via the recently introduced information bottleneck method, which generates a more efficient word-cluster representation of documents. Combined with the classification power of an SVM, this method yields high performance text categorization that can outperform other recent methods in …
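
As a rough illustration of the "word-cluster representation of documents" mentioned in the abstract, the sketch below maps a tokenized document to a vector of per-cluster counts. The word-to-cluster assignment, the cluster count, and the function name `cluster_features` are hypothetical; in the paper the clusters come from the information bottleneck method, which is not implemented here, and the resulting vectors would then be passed to an SVM.

```python
from collections import Counter

# Hypothetical word-to-cluster assignment, written by hand for illustration.
# In the paper this mapping is produced by distributional clustering
# (information bottleneck); that step is NOT shown here.
WORD_TO_CLUSTER = {
    "goal": 0, "match": 0, "team": 0,
    "stock": 1, "market": 1, "shares": 1,
}
N_CLUSTERS = 2


def cluster_features(tokens, word_to_cluster, n_clusters):
    """Count how many tokens fall into each word cluster.

    Tokens without a cluster assignment are ignored, so the vector length
    is the number of clusters rather than the vocabulary size.
    """
    counts = Counter(word_to_cluster[t] for t in tokens if t in word_to_cluster)
    return [counts.get(k, 0) for k in range(n_clusters)]


doc = "the team won the match after a late goal".split()
features = cluster_features(doc, WORD_TO_CLUSTER, N_CLUSTERS)
print(features)
```

The point of the design is dimensionality: the classifier sees one feature per cluster instead of one per word.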

Cited by 195 publications (197 citation statements)
References 12 publications
“…Corpus Summarization: Clustering techniques provide a coherent summary of the collection in the form of cluster-digests [83] or word-clusters [17,18], which can be used in order to provide summary insights into the overall content of the underlying corpus. Variants of such methods, especially sentence clustering, can also be used for document summarization, a topic, discussed in detail in Chapter 3.…”
Section: Document Organization and Browsing (mentioning)
confidence: 99%
“…In particular, word-clusters [17,18] and co-training methods [72] can be used in order to improve the classification accuracy of supervised applications with the use of clustering techniques.…”
Section: Document Classification (mentioning)
confidence: 99%
“…Most usual cases where this technique is applied are gene selection from microarray data [13] [14] and text categorization [15] [16]. Confront the curse of dimensionality carries some recognized advantages like: reducing the measurement and storage requirements, facilitating visualization and understanding of data, diminishing training and predicting times and also improving prediction performance.…”
Section: Feature Selection (mentioning)
confidence: 99%
“…these methods rank variables according to their individual predictive power. Pearson correlation [20], the coefficient of determination [13], information theoretic criteria [15] and partial least squares (PLS) [21] are used and correspond to this class.…”
Section: Feature Selection (mentioning)
confidence: 99%
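
The last citation statement describes methods that rank variables by their individual predictive power, with Pearson correlation as one criterion. A minimal sketch of that idea follows, assuming numeric features and a numeric label; the function names and the toy data are illustrative and do not come from the cited works.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant feature (or label) carries no linear signal;
        # scoring it 0.0 is a choice made for this sketch.
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_features(rows, labels):
    """Return feature indices ordered by |correlation| with the label, best first.

    Each feature is scored on its own; ties are broken by feature index.
    """
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, labels)), j))
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores]


# Toy data: feature 0 tracks the label, feature 1 is constant,
# feature 2 alternates independently of the label.
rows = [
    [1.0, 5.0, 0.0],
    [2.0, 5.0, 1.0],
    [3.0, 5.0, 0.0],
    [4.0, 5.0, 1.0],
]
labels = [0, 0, 1, 1]
ranking = rank_features(rows, labels)
print(ranking)
```

Because each feature is scored independently, a ranking of this kind cannot detect features that are only informative in combination.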