Suppose that univariate data are drawn from a mixture of two distributions
that are equal up to a shift parameter. Such a model is known to be
nonidentifiable from a nonparametric viewpoint. However, if we assume that the
unknown mixed distribution is symmetric, we obtain the identifiability of this
model, which is then defined by four unknown parameters: the mixing proportion,
two location parameters and the cumulative distribution function of the
symmetric mixed distribution. We propose estimators for these four parameters
when no training data are available. Our estimators are shown to be strongly
consistent under mild regularity assumptions and their convergence rates are
studied. Their finite-sample properties are illustrated by a Monte Carlo study
and our method is applied to real data.
Comment: Published at http://dx.doi.org/10.1214/009053606000000353 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
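As a sketch only, the shift mixture described in this abstract can be written in terms of distribution functions. The symbols G, F, \lambda, \mu_1 and \mu_2 are notation assumed here, not taken from the abstract, and the symmetry condition is stated for a continuous F centered at zero:

```latex
G(x) = \lambda \, F(x - \mu_1) + (1 - \lambda) \, F(x - \mu_2),
\qquad
F(x) = 1 - F(-x) \quad \text{for all } x \in \mathbb{R}.
```

Here G is the distribution function of the observed data, \lambda is the mixing proportion, \mu_1 and \mu_2 are the two location parameters, and F is the unknown symmetric distribution function, which gives the four unknown parameters listed above.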
We consider a two-component mixture model where one component distribution is known while the mixing proportion and the other component distribution are unknown. Models of this kind were first introduced in biology to study differences in expression between genes. The estimation methods proposed so far have all assumed that the unknown distribution belongs to a parametric family. In this paper, we show how this assumption can be relaxed. First, we note that the above model is generally not identifiable, but we show that under moment and symmetry conditions some 'almost everywhere' identifiability results can be obtained. When these identifiability conditions are fulfilled, we propose an estimation method for the unknown parameters which is shown to be strongly consistent under mild conditions. We discuss applications of our method to microarray data analysis and to the training data problem. We compare our method to the parametric approach using simulated data and, finally, we apply our method to real data from microarray experiments.
Copyright 2006 Board of the Foundation of the Scandinavian Journal of Statistics.
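A possible way to write this model in density form is sketched below. All symbols are assumptions made for illustration: f_0 stands for the known component density, p for the unknown mixing proportion, f for the unknown component density and \mu for its center of symmetry. The choice to attach p to the unknown component is also an assumption of this sketch:

```latex
g(x) = (1 - p) \, f_0(x) + p \, f(x - \mu),
\qquad
f(-x) = f(x) \quad \text{for all } x \in \mathbb{R}.
```

Without a restriction such as the symmetry of f, different pairs (p, f) can produce the same g, which is the identifiability problem the abstract refers to.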
Recently, several authors have considered finite mixture models with semi-/nonparametric component distributions. Identifiability of such model parameters is generally not obvious, and when it holds, inference methods are rather specific to the mixture model under consideration. In this paper we propose a generalization of the EM algorithm to semiparametric mixture models. Our approach is methodological and can be applied to a wide class of semiparametric mixture models. The behavior of the EM-type estimators we propose is studied numerically through several Monte Carlo experiments and by comparison with alternative methods from the literature. In addition to these numerical experiments, we provide applications to real data showing that our estimation method behaves well and that it is fast and easy to implement.
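To make the idea of an EM-type iteration for a semiparametric mixture concrete, here is a minimal Python sketch for one model of this kind, the two-component location-shift mixture with an unknown symmetric density. It is not the authors' algorithm: the function names, the Gaussian kernel, the fixed bandwidth, the use of weighted means as location updates (which assumes the symmetric density has a finite mean) and the simulated data are all illustrative assumptions.

```python
import math
import random


def gaussian_kernel(u):
    """Standard normal density, used as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def em_symmetric_shift_mixture(x, mu1, mu2, lam=0.5, bandwidth=0.5, n_iter=10):
    """EM-type iteration for g(x) = lam*f(x - mu1) + (1 - lam)*f(x - mu2).

    f is an unknown density assumed symmetric about zero. Hypothetical helper:
    the name, the fixed bandwidth and the update order are illustrative.
    Returns (lam, mu1, mu2, post), where post[i] is the weight of component 1.
    """
    n = len(x)
    # Initial weights: hard assignment of each point to the nearest start value.
    post = [1.0 if abs(xi - mu1) <= abs(xi - mu2) else 0.0 for xi in x]

    for _ in range(n_iter):
        # Nonparametric step: weighted residuals pooled from both components.
        resid = [(xi - mu1, p) for xi, p in zip(x, post)]
        resid += [(xi - mu2, 1.0 - p) for xi, p in zip(x, post)]

        def f_hat(t):
            # Weighted kernel density estimate, symmetrized by mirroring each
            # residual r to -r; the weights sum to n, hence the 2*n*h factor.
            total = sum(
                w * (gaussian_kernel((t - r) / bandwidth)
                     + gaussian_kernel((t + r) / bandwidth))
                for r, w in resid
            )
            return total / (2.0 * n * bandwidth)

        # E-step: posterior probability that each point came from component 1.
        new_post = []
        for xi in x:
            a = lam * f_hat(xi - mu1)
            b = (1.0 - lam) * f_hat(xi - mu2)
            new_post.append(a / (a + b) if a + b > 0.0 else 0.5)
        post = new_post

        # M-step: update the proportion and locations as weighted averages.
        s1 = sum(post)
        if s1 <= 0.0 or s1 >= n:
            break  # degenerate split; keep the previous parameter values
        lam = s1 / n
        mu1 = sum(p * xi for p, xi in zip(post, x)) / s1
        mu2 = sum((1.0 - p) * xi for p, xi in zip(post, x)) / (n - s1)

    return lam, mu1, mu2, post


def simulate(n, lam, mu1, mu2, rng):
    """Draw n points from the mixture with a standard normal f (demo only)."""
    return [rng.gauss(mu1 if rng.random() < lam else mu2, 1.0) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    data = simulate(150, 0.3, 0.0, 4.0, rng)
    lam_hat, mu1_hat, mu2_hat, _ = em_symmetric_shift_mixture(data, -1.0, 5.0)
    print(lam_hat, mu1_hat, mu2_hat)
```

The sketch alternates a kernel density update for the unknown symmetric density with the usual E- and M-steps for the Euclidean parameters. In practice the bandwidth would be chosen from the data rather than fixed, and the start values matter because the iteration may converge to a local solution.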