The multinomial distribution has been widely used to model count data. To increase clustering efficiency, we use an approximation to the Fisher scoring algorithm, which is more robust regarding the choice of initial parameter values. Then, we use a novel approach to estimate the optimal number of components, based on minimum message length criterion. Moreover, we consider a generalization of the multinomial model obtained by introducing the Dirichlet as prior, yielding the Dirichlet Compound Multinomial (DCM). Even though DCM can address the burstiness phenomenon of count data, the presence of Gamma function in its density function usually leads to undesired complications. In this article, we use two alternative representations of DCM distribution to perform clustering based on finite mixture models, where the mixture parameters are estimated using the minorization-maximization framework. To evaluate and compare the performance of our proposed models, we have considered three challenging real-world applications that involve high-dimensional count vectors, namely, sentiment analysis, facial expression recognition, and human action recognition. The results show that the proposed algorithms increase the clustering efficiency of their respective models remarkably, and the best results are achieved by the second parametrization of DCM, which can accommodate over-dispersed count data.