AMS 2000 subject classifications: 60G57, 62G05, 62F15
Keywords: Bayesian nonparametric inference; Asymptotic credible intervals; Exchangeable random partition; Gibbs-type random probability measure; Index of diversity; Sampling formula; Species sampling problem; Rare variant; Two parameter Poisson-Dirichlet process

Abstract: Species sampling problems have a long history in ecological and biological studies, and a number of statistical issues, including the evaluation of species richness, are still to be addressed. In this paper, motivated by Bayesian nonparametric inference for species sampling problems, we consider the practically important and technically challenging issue of developing a comprehensive posterior analysis of the so-called rare variants, namely those species with frequency less than or equal to a given abundance threshold. In particular, by adopting a Gibbs-type prior, we provide an explicit expression for the posterior joint distribution of the frequency counts of the rare variants, and we investigate some of its statistical properties. The proposed results are illustrated by means of two novel applications to a benchmark genomic dataset.