Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval 2010
DOI: 10.1145/1835449.1835530

Towards subjectifying text clustering

Abstract: Although it is common practice to produce only a single clustering of a dataset, in many cases text documents can be clustered along different dimensions. Unfortunately, not only do traditional text clustering algorithms fail to produce multiple clusterings of a dataset, the only clustering they produce may not be the one that the user desires. In this paper, we propose a simple active clustering algorithm that is capable of producing multiple clusterings of the same data according to user interest. In compari…

Cited by 7 publications (6 citation statements)
References 17 publications
“…Filter approaches [3] that do not assume specific form to the classifier but use a specific criterion to judge the relevance of individual features or feature subsets. The simplest approach is to rank features by a selection criterion (for instance, mutual information, which has been shown to perform well for document classification [28], weighted likelihood ratio [53], or other heuristics [20]) and select the top-ranked subset. Joint filter approaches that consider the dependency and (possible redundancy) among features [54].…”
Section: Feature Selection (mentioning)
confidence: 99%
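The ranking step described in the statement above can be illustrated with a minimal sketch. It assumes documents are represented as sets of terms and that features are binary term-presence indicators; the function names (`mutual_information`, `select_top_k`) and the toy data are hypothetical and are not taken from the citing paper.

```python
import math
from collections import Counter


def mutual_information(presence, labels):
    """Mutual information (in nats) between a binary feature and class labels."""
    n = len(labels)
    joint = Counter(zip(presence, labels))
    feat_counts = Counter(presence)
    label_counts = Counter(labels)
    mi = 0.0
    for (f, c), count in joint.items():
        p_fc = count / n
        p_f = feat_counts[f] / n
        p_c = label_counts[c] / n
        mi += p_fc * math.log(p_fc / (p_f * p_c))
    return mi


def select_top_k(docs, labels, k):
    """Score every term by mutual information with the label; keep the top k."""
    vocab = sorted({term for doc in docs for term in doc})
    scores = {
        term: mutual_information([term in doc for doc in docs], labels)
        for term in vocab
    }
    # Sort by descending score; ties are broken alphabetically.
    ranked = sorted(vocab, key=lambda term: (-scores[term], term))
    return ranked[:k], scores


# Toy data (assumed for illustration only).
docs = [
    {"great", "plot", "movie"},
    {"boring", "plot", "movie"},
    {"great", "battery", "phone"},
    {"boring", "battery", "phone"},
]
labels = ["pos", "neg", "pos", "neg"]
top_terms, scores = select_top_k(docs, labels, 2)
```

In this toy setup the sentiment-bearing terms score highest against the sentiment labels, while topic terms such as "plot" carry no information about them. A different label set (for example, topic labels) would rank the same vocabulary differently.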
“…The reviews cover 5 topics: movies, books, dvds, electronics, and kitchen; for each topic there are 2000 reviews with 1000 positive sentiment and 1000 negative sentiment reviews. Like previous usage of this dataset [53], we explore both topic and sentiment classification within a given topic to compare the feature selection algorithms, but for clustering we ignore the sentiment and only consider mixtures of different topics.…”
Section: Datasets (mentioning)
confidence: 99%
“…There has also been work in alternative clustering where the system constructs multiple clusterings and allows the SME to select between them [8]. Multiple clusterings can be constructed in many ways, for example by re-weighting features or changing the objective functions; however, such approaches by design require well-defined features, which may not always be attainable.…”
Section: Related Work (mentioning)
confidence: 99%
“…A text document clustering algorithm is proposed in [8], which is capable of producing multiple clusterings of the same data based on different points of view. Following a spectral clustering algorithm [21], a Laplacian matrix is generated using the cosine similarity among documents.…”
Section: Related Work (mentioning)
confidence: 99%
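The Laplacian construction mentioned in the last statement can be sketched as follows. This is an illustration only: it assumes dense term-frequency vectors and the unnormalized graph Laplacian L = D - W with a zeroed diagonal in W; the statement does not say which Laplacian variant the cited work uses, and the names `cosine` and `laplacian` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def laplacian(vectors):
    """Unnormalized graph Laplacian L = D - W built from pairwise cosine similarity."""
    n = len(vectors)
    # Affinity matrix W: cosine similarity, with self-similarity set to zero.
    W = [
        [0.0 if i == j else cosine(vectors[i], vectors[j]) for j in range(n)]
        for i in range(n)
    ]
    # Degree of each document: the sum of its affinities.
    degree = [sum(row) for row in W]
    return [
        [(degree[i] if i == j else 0.0) - W[i][j] for j in range(n)]
        for i in range(n)
    ]


# Toy term-frequency vectors (assumed): two documents per vocabulary block.
vectors = [[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 1, 2], [0, 0, 2, 1]]
L = laplacian(vectors)
```

Each row of the resulting matrix sums to zero, and documents with no shared terms have a zero off-diagonal entry. A spectral clustering step would then work with the eigenvectors of this matrix, which the sketch leaves out.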