Proceedings of the 6th Conference on Natural Language Learning - COLING-02 2002
DOI: 10.3115/1118853.1118862

Cross-dataset clustering

Abstract: We present a method for identifying corresponding themes across several corpora that are focused on related, but distinct, domains. This task is approached through simultaneous clustering of keyword sets extracted from the analyzed corpora. Our algorithm extends the information bottleneck soft clustering method for a suitable setting consisting of several datasets. Experimentation with topical corpora reveals similar aspects of three distinct religions. The evaluation is by way of comparison to clusters constru…

Citations: cited by 6 publications (1 citation statement)
References: 7 publications
“…The communication complexity of estimating one-dimensional Gaussian correlations is established in [19] and that of independence testing over discrete alphabet in the large sample regime is characterized in [9]. The tradeoff between communication complexity and sample complexity for detecting pairwise correlations is studied in [15]. A related line of recent work considers composite hypothesis testing under communication, privacy, and shared randomness constraints [1]- [4].…”
Section: Introduction (mentioning)
confidence: 99%