Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods 2016
DOI: 10.5220/0005682000410047

Assessing the Number of Clusters in a Mixture Model with Side-information

Abstract: This paper deals with selecting the number of clusters in a clustering problem, taking into account the side-information that some points of a chunklet arise from the same cluster. An Expectation-Maximization algorithm is used to estimate the parameters of a mixture model and determine the data partition. To select the number of clusters, the usual criteria are not suitable because they do not consider the side-information in the data. Thus we propose suitable criteria which are modified versions of three usual criteria…
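
For orientation, one common way to write a mixture likelihood under this kind of side-information is sketched below. It assumes that all points in a chunklet share a single latent cluster label, that chunklets are independent, and that a point outside any chunklet is treated as a chunklet of size one. The symbols (K, J, \pi_k, \theta_k, c_j, t_{jk}) are assumed notation, not taken from the abstract, and the paper's exact formulation and its modified criteria may differ.

```latex
% Assumed notation: K clusters, mixing proportions \pi_k,
% component densities f(\cdot \mid \theta_k), chunklets c_1, \dots, c_J.
\[
  L(\theta) \;=\; \prod_{j=1}^{J} \sum_{k=1}^{K} \pi_k \prod_{i \in c_j} f(x_i \mid \theta_k)
\]
% E-step posterior probability that chunklet c_j belongs to cluster k:
\[
  t_{jk} \;=\; \frac{\pi_k \prod_{i \in c_j} f(x_i \mid \theta_k)}
                   {\sum_{l=1}^{K} \pi_l \prod_{i \in c_j} f(x_i \mid \theta_l)}
\]
```

When every chunklet contains a single point, both expressions reduce to the ordinary mixture likelihood and the ordinary EM responsibilities.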

Citations: cited by 3 publications (1 citation statement)
References: 9 publications (10 reference statements)
“…The pre-processing step concerns a scaling operation that squeezes the data in a range between [0,1], as well as the application of Principal Component Analysis (PCA) towards the dimension reduction of our data. Regarding the optimal number of generated clusters, a grid search of the Bayesian Information Criterion (BIC) for multiple candidate clusters is proposed [12], as well as the exploration of the elbow point, that denotes the optimal option regarding the effectiveness and simplicity of the model as density estimator. The final step includes the application of Gaussian Mixture Model (GMM) through the Expectation Maximization (EM) Algorithm and the extraction of the initial clusters (VNs).…”
Section: Temporal Data Dynamic Segmentation (mentioning)
confidence: 99%
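
The steps named in this citation statement (scaling to [0,1], fitting a Gaussian mixture by EM, and a BIC grid search over candidate cluster counts) can be illustrated with a minimal sketch. It is not the citing paper's implementation: it assumes one-dimensional data so that the PCA step can be left out, it uses only the Python standard library, and every function name and the synthetic data are hypothetical. It also ignores side-information, so it shows the usual BIC rather than the modified criteria of the cited paper.

```python
import math
import random


def min_max_scale(xs):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(xs), max(xs)
    span = (hi - lo) or 1.0  # guard against constant data
    return [(x - lo) / span for x in xs]


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def mixture_density(x, weights, means, variances):
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


def fit_gmm_em(xs, k, n_iter=200, var_floor=1e-6):
    """Fit a 1-D Gaussian mixture with k components by EM.

    Returns ((weights, means, variances), log_likelihood).
    """
    n = len(xs)
    ordered = sorted(xs)
    # Deterministic start: means at evenly spaced quantiles, shared variance.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(xs) / n
    spread = sum((x - centre) ** 2 for x in xs) / n
    variances = [max(spread, var_floor)] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, var_floor)  # floor avoids collapsed components
    loglik = sum(
        math.log(mixture_density(x, weights, means, variances) or 1e-300) for x in xs
    )
    return (weights, means, variances), loglik


def bic(loglik, k, n):
    """BIC = -2 log L + p log n, with p = 3k - 1 free parameters in one dimension."""
    return -2.0 * loglik + (3 * k - 1) * math.log(n)


def select_k_by_bic(xs, candidates):
    """Grid search: fit one mixture per candidate k and keep the lowest BIC."""
    scores = {k: bic(fit_gmm_em(xs, k)[1], k, len(xs)) for k in candidates}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical data: two well-separated groups.
    raw = [random.gauss(0, 1) for _ in range(150)] + [random.gauss(10, 1) for _ in range(150)]
    best_k, scores = select_k_by_bic(min_max_scale(raw), range(1, 5))
    print(best_k, scores)
```

Lower BIC is better in this convention. For multivariate data, a library implementation such as scikit-learn's `GaussianMixture`, which exposes a `bic` method, would be the more practical choice than this hand-written EM loop.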