2021
DOI: 10.48550/arxiv.2104.12485
Preprint
Powered Dirichlet Process for Controlling the Importance of "Rich-Get-Richer" Prior Assumptions in Bayesian Clustering

Abstract: One of the most used priors in Bayesian clustering is the Dirichlet prior, which can be expressed as a Chinese Restaurant Process. This process allows nonparametric estimation of the number of clusters when partitioning datasets. Its key feature is the "rich-get-richer" property, which assumes that a cluster's a priori probability of being chosen depends linearly on its population. In this paper, we show that such a prior is not always the best choice to model data. We derive the Powered Chinese Restaurant process from…
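For intuition, the sketch below (not the authors' code) shows a single Chinese Restaurant Process seating step and how raising the cluster sizes to a power r, in the spirit of the Powered variant, weakens (r < 1) or strengthens (r > 1) the rich-get-richer effect; the exact form of the powered weights used here is an assumption made for illustration only.

```python
# Illustrative sketch only: the standard CRP assigns the next point to an
# existing cluster with probability proportional to its size n_k, or to a
# new cluster with probability proportional to alpha. Raising n_k to a
# power r (an assumed form; r = 1 recovers the standard CRP) controls how
# strongly large clusters are favoured a priori.
import numpy as np

def crp_step(counts, alpha, r=1.0, rng=None):
    """Sample the cluster index of the next observation; len(counts) means 'new cluster'."""
    rng = rng or np.random.default_rng()
    weights = np.array([n ** r for n in counts] + [alpha], dtype=float)
    return rng.choice(len(weights), p=weights / weights.sum())

# Grow a partition of 1000 points: r = 1 (standard CRP) vs. a dampened r = 0.5.
for r in (1.0, 0.5):
    counts, rng = [1], np.random.default_rng(0)
    for _ in range(999):
        k = crp_step(counts, alpha=1.0, r=r, rng=rng)
        if k == len(counts):
            counts.append(1)      # open a new cluster
        else:
            counts[k] += 1        # rich get richer
    print(f"r={r}: {len(counts)} clusters, largest has {max(counts)} points")
```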

Cited by 1 publication (2 citation statements)
References 15 publications
“…In [21], the authors derive the Uniform process (UP) and show that it performs better on a document clustering task. In [16], the authors generalize UP and DP within a more general framework, the Powered Dirichlet process (PDP), and show it performs better than DP on several datasets.…”
Section: Proposed Approach, 4.1 Improvements Over DHP
confidence: 99%
“…In particular, making the prior more or less dependent on the temporal dimension (in the same way that [17,21] make the DP more or less dependent on the "rich-get-richer" hypothesis) could lead to clusters that are more text- or time-oriented. By replacing the standard Dirichlet process in [5] with the Powered Dirichlet process from [16], we derive the Powered Dirichlet-Hawkes process [17]. The algorithm used for inference is a sequential Monte Carlo; it is the same as described in [5,11,17]. It is especially well suited to modeling data streams, as it considers data sequentially.…”
Section: The Model
confidence: 99%
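As a companion sketch (again an assumption, not the implementation from [5,11,17]), the snippet below illustrates how the same powered seating rule can be made time-dependent in a Dirichlet-Hawkes spirit: each existing cluster is weighted by the summed exponential decay of its past event times rather than its raw count, and the exponent r tunes how strongly that temporal term drives the prior.

```python
# Rough illustration only: the exponential kernel, the way it replaces the
# count-based weight, and the placement of the power r are all assumptions
# made for this sketch, not the papers' definition.
import math
import random

def powered_hawkes_step(event_times, t, alpha, r=1.0, decay=1.0):
    """event_times: dict cluster_id -> list of past timestamps in that cluster.
    t: time of the new event. Returns a sampled cluster id, or None for a new cluster."""
    ids = list(event_times)
    # Temporal weight of each cluster: summed exponential decay of its history, raised to r.
    weights = [sum(math.exp(-decay * (t - s)) for s in event_times[c]) ** r for c in ids]
    weights.append(alpha)  # weight of opening a new cluster
    u, acc = random.random() * sum(weights), 0.0
    for c, w in zip(ids + [None], weights):
        acc += w
        if u <= acc:
            return c
    return None

# Example: cluster 0 has recent events, cluster 1 only old ones; a larger r
# makes the recently active cluster dominate the prior more strongly.
history = {0: [9.0, 9.5, 9.9], 1: [1.0, 2.0]}
print(powered_hawkes_step(history, t=10.0, alpha=0.5, r=2.0))
```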