Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR 1993
DOI: 10.1145/160688.160706

Constant interaction-time scatter/gather browsing of very large document collections

Abstract: The Scatter/Gather document browsing method uses fast document clustering to produce table-of-contents-like outlines of large document collections. Previous work [1] developed linear-time document clustering algorithms to establish the feasibility of this method over moderately large collections. However, even linear-time algorithms are too slow to support interactive browsing of very large collections such as Tipster, the DARPA standard text retrieval evaluation collection. We present a scheme that supports …

Citations: cited by 145 publications (74 citation statements)
References: 3 publications
“…Many document clustering algorithms rely on off-line clustering of the entire document collection [6,28]. Since the Web is a dynamic environment, static, pre-computed clusters would have to be constantly updated, and most clustering algorithms cannot do so incrementally.…”
Section: Related Work (mentioning)
confidence: 99%
“…Clustering a set of points into a few groups is frequently used for statistical analysis and classification in numerous applications, including information retrieval [6,7,8,26], facility location [11,30], data mining [2,28], spatial data bases [9,20,25], data compression [23], image processing [19,27], astrophysics [22], and scientific computing [5]. Because of such a diversity of applications, several variants of clustering problems have been proposed and widely studied.…”
Section: Introduction (mentioning)
confidence: 99%
“…One possibility is to perform a hierarchical clustering a-priori; however such an approach has the disadvantage that it is unable to merge and recluster related branches of the tree hierarchy on-the-fly when a user may need it. A method for constant-interaction time browsing with the use of the scatter-gather approach has been presented in [26]. This approach presents the keywords associated with the different keywords to a user.…”
Section: Fractionation (mentioning)
confidence: 99%
“…Clearly, a larger value of M does not assume the cluster-refinement hypothesis quite as strongly, but also comes at a higher cost. The details of the algorithm are described in [26]. Some extensions of this approach are also presented in [85], in which it has been shown how this approach can be used to cluster arbitrary corpus subsets of the documents in constant time.…”
Section: Fractionation (mentioning)
confidence: 99%