Mixtures of $r$ independent distributions for two discrete random variables
can be represented by matrices of nonnegative rank $r$. Likelihood inference
for the model of such joint distributions leads to problems in real algebraic
geometry that are addressed here for the first time. We characterize the set of
fixed points of the Expectation-Maximization algorithm, and we study the
boundary of the space of matrices with nonnegative rank at most $3$. Both of
these sets correspond to algebraic varieties with many irreducible components.
Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org), http://dx.doi.org/10.1214/14-AOS1282.
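For two discrete random variables, a mixture of $r$ independent distributions writes the joint probability matrix as $P = A\,\mathrm{diag}(\lambda)\,B^{T}$. Here $\lambda$ is the distribution of the hidden variable, and the columns of $A$ and $B$ are the conditional distributions of the two observed variables. The sketch below shows the usual EM iteration for this model. It assumes a strictly positive table of counts $U$ and a random starting point. The function name `em_nonneg_rank` and its parameters are illustrative and are not taken from the paper. Every iterate has nonnegative rank at most $r$, and the recorded log-likelihood never decreases. Different starting points can, however, end at different fixed points.

```python
import numpy as np

def em_nonneg_rank(U, r, iters=200, seed=0):
    """EM for a mixture of r independent distributions on an m x n table of counts U.

    Model: P[i, j] = sum_k lam[k] * A[i, k] * B[j, k], so P has nonnegative rank <= r.
    Assumes every entry of U is strictly positive. Returns (lam, A, B, loglik_history).
    """
    rng = np.random.default_rng(seed)
    m, n = U.shape
    N = U.sum()
    lam = np.full(r, 1.0 / r)               # distribution of the hidden variable
    A = rng.random((m, r))
    A /= A.sum(axis=0)                      # column k: row variable given hidden state k
    B = rng.random((n, r))
    B /= B.sum(axis=0)                      # column k: column variable given hidden state k
    history = []
    for _ in range(iters):
        joint = np.einsum("k,ik,jk->ijk", lam, A, B)   # P(row=i, col=j, hidden=k)
        P = joint.sum(axis=2)                          # current model matrix
        history.append(float(np.sum(U * np.log(P))))  # log-likelihood before this update
        # E-step: expected counts of (i, j, k) given the observed counts U[i, j]
        R = U[:, :, None] * joint / P[:, :, None]
        # M-step: normalizing the expected counts maximizes the complete-data likelihood
        Rk = R.sum(axis=(0, 1))
        lam = Rk / N
        A = R.sum(axis=1) / Rk
        B = R.sum(axis=0) / Rk
    return lam, A, B, history
```

For example, `em_nonneg_rank(np.array([[10, 3, 5], [2, 8, 4], [6, 1, 9]], dtype=float), r=2)` returns parameters whose product $A\,\mathrm{diag}(\lambda)\,B^{T}$ sums to one and has rank at most 2.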
The Expectation-Maximization (EM) algorithm is routinely used for maximum likelihood estimation in latent class analysis. However, the EM algorithm comes with no guarantee of reaching the global optimum. We study the geometry of the latent class model in order to understand the behavior of the maximum likelihood estimator. In particular, we characterize the boundary stratification of the binary latent class model with a binary hidden variable. For small models, such as the one with three binary observed variables, we show that this stratification allows exact computation of the maximum likelihood estimator. In this case we use simulations to study the attraction basins of the various strata for maximum likelihood estimation and the performance of the EM algorithm. Our theoretical study is complemented by a careful analysis of the EM fixed point ideal, which provides an alternative method of studying the boundary stratification and maximizing the likelihood function. In particular, we compute the minimal primes of this ideal in the case of a binary latent class model with a binary or ternary hidden random variable.
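The lack of a global guarantee can be explored by restarting EM from several random points on the smallest model mentioned above. That model has three binary observed variables and a binary hidden variable $H$. The sketch below is a plain EM implementation under our own assumptions. It is not the exact boundary-stratification method of the paper. The names `em_latent_class`, `lc_joint` and `PATTERNS` are hypothetical, all eight pattern counts are assumed positive, and the iteration count is fixed rather than chosen by a convergence test.

```python
import itertools
import numpy as np

# All 8 states (x1, x2, x3) of three binary observed variables.
PATTERNS = np.array(list(itertools.product([0, 1], repeat=3)))

def lc_joint(pi, theta):
    """Return the 8 x 2 array P(X = x, H = k), where theta[v, k] = P(X_v = 1 | H = k)."""
    X = PATTERNS[:, :, None]                               # shape (8, 3, 1)
    cond = np.where(X == 1, theta[None], 1 - theta[None])  # shape (8, 3, 2)
    return pi[None, :] * cond.prod(axis=1)

def em_latent_class(counts, iters=200, seed=0):
    """EM for the binary latent class model with a binary hidden variable H.

    counts[i] is the number of observations of PATTERNS[i]; assumes all counts > 0.
    Returns (pi, theta, final log-likelihood).
    """
    rng = np.random.default_rng(seed)
    pi = np.array([0.5, 0.5])
    theta = rng.uniform(0.05, 0.95, size=(3, 2))  # random start for the conditionals
    N = counts.sum()
    for _ in range(iters):
        joint = lc_joint(pi, theta)
        # E-step: expected counts of (pattern, hidden class)
        R = counts[:, None] * joint / joint.sum(axis=1, keepdims=True)
        Rk = R.sum(axis=0)
        pi = Rk / N                       # M-step: hidden-class proportions
        theta = (PATTERNS.T @ R) / Rk     # M-step: P(X_v = 1 | H = k)
    p = lc_joint(pi, theta).sum(axis=1)
    return pi, theta, float(counts @ np.log(p))
```

Running `em_latent_class(counts, seed=s)` for several seeds and comparing the returned log-likelihoods is a cheap way to see whether different starts reach different fixed points. Each value is bounded above by the saturated log-likelihood $\sum_x u_x \log(u_x/N)$.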
Abstract. In [2] Buczyńska and Wiśniewski showed that the Hilbert polynomial of the algebraic variety associated to the Jukes-Cantor binary model on a trivalent tree depends only on the number of leaves of the tree and not on its shape. We ask whether this can be generalized to other group-based models. The Jukes-Cantor binary model has $\mathbb{Z}_2$ as the underlying group. We consider the Kimura 3-parameter model, which has $\mathbb{Z}_2 \times \mathbb{Z}_2$ as the underlying group. We show that the statement about Hilbert polynomials does not generalize to the Kimura 3-parameter model, since there the Hilbert polynomial depends on the shape of the trivalent tree.
Mathematics Subject Classifications: 13P99, 52B20
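"Group-based" means that each transition matrix has entries $M_{g,h} = f(h-g)$ for a probability vector $f$ on the group. For the Kimura 3-parameter model the group is $\mathbb{Z}_2 \times \mathbb{Z}_2$. A standard fact is that the characters of the group diagonalize every such matrix. This is the usual route to the toric (Fourier-coordinate) description of these models. The sketch below checks this numerically. The ordering of group elements, their correspondence with nucleotides, and all names are our own illustrative choices.

```python
import numpy as np

# Elements of Z2 x Z2 in a fixed order; matching them to A, C, G, T is a labeling choice.
G = [(0, 0), (0, 1), (1, 0), (1, 1)]
INDEX = {g: i for i, g in enumerate(G)}

def add(g, h):
    """Group operation in Z2 x Z2 (every element is its own inverse, so h - g = h + g)."""
    return ((g[0] + h[0]) % 2, (g[1] + h[1]) % 2)

def group_based_matrix(f):
    """Transition matrix with M[g, h] = f(h - g) for a probability vector f on Z2 x Z2."""
    M = np.zeros((4, 4))
    for g in G:
        for h in G:
            M[INDEX[g], INDEX[h]] = f[INDEX[add(g, h)]]
    return M

# Character table: H[a, g] = (-1)^(a . g); it is symmetric and H @ H = 4 I.
H = np.array([[(-1) ** (a[0] * g[0] + a[1] * g[1]) for g in G] for a in G], dtype=float)
```

With `f = np.array([0.7, 0.1, 0.15, 0.05])`, the matrix `H @ group_based_matrix(f) @ H / 4` is diagonal with entries `H @ f`, the Fourier transform of $f$. In this vector, the first entry is the probability of no substitution and the other three are the substitution parameters. Restricting to the subgroup $\{(0,0),(1,1)\}$ gives the analogous $\mathbb{Z}_2$ picture for the Jukes-Cantor binary model.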
Phylogenetic models admit polynomial parametrization maps in terms of the root distribution and transition probabilities along the edges of the phylogenetic tree. For symmetric continuous-time group-based models, Matsen studied the polynomial inequalities that characterize the joint probabilities in the image of these parametrizations (Matsen in IEEE/ACM Trans Comput Biol Bioinform 6:89–95, 2009). We employ this description for maximum likelihood estimation via numerical algebraic geometry. In particular, we explore an example where the maximum likelihood estimate does not exist, which would be difficult to discover without using algebraic methods.
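The simplest symmetric continuous-time group-based model is the binary Jukes-Cantor model. On a tree with three leaves attached to the root, its parametrization map can be written out directly. The sketch below assumes a uniform root distribution and the rate matrix $Q = \begin{pmatrix}-1 & 1\\ 1 & -1\end{pmatrix}$. Under that assumption, an edge of length $t$ has substitution probability $(1-e^{-2t})/2$. The tree shape and all function names are hypothetical choices for illustration. Because each substitution probability stays in $[0, 1/2)$, not every point reachable by the polynomial map with arbitrary edge probabilities comes from the continuous-time model. Inequalities such as those studied by Matsen describe which points do.

```python
import itertools
import math
import numpy as np

def cfn_edge_matrix(t):
    """exp(tQ) for Q = [[-1, 1], [1, -1]]: substitution probability (1 - e^{-2t}) / 2."""
    p = (1 - math.exp(-2 * t)) / 2
    return np.array([[1 - p, p], [p, 1 - p]])

def claw_tree_distribution(ts):
    """Joint distribution of the 3 leaf states, uniform root, edge lengths ts."""
    mats = [cfn_edge_matrix(t) for t in ts]
    dist = {}
    for x in itertools.product([0, 1], repeat=3):
        # Sum over the hidden root state r of P(root = r) * prod_e M_e[r, x_e].
        dist[x] = sum(0.5 * mats[0][r, x[0]] * mats[1][r, x[1]] * mats[2][r, x[2]]
                      for r in (0, 1))
    return dist

def log_likelihood(ts, counts):
    """Multinomial log-likelihood of leaf-pattern counts (a dict keyed by patterns)."""
    dist = claw_tree_distribution(ts)
    return sum(n * math.log(dist[x]) for x, n in counts.items())
```

Since `cfn_edge_matrix(s) @ cfn_edge_matrix(t)` equals `cfn_edge_matrix(s + t)`, the edge matrices form a continuous-time semigroup. The function `log_likelihood` could be handed to any numerical optimizer over $t_1, t_2, t_3 \ge 0$. As the abstract notes, though, the maximum may fail to be attained.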
Abstract. The geometry of the set of restrictions of rank-one tensors to some of their coordinates is studied. This gives insight into the problem of rank-one completion of partial tensors. Particular emphasis is put on the semialgebraic nature of the problem, which arises for real tensors with constraints on the parameters. The algebraic boundary of the completable region is described for tensors parametrized by probability distributions when the number of observed entries equals the number of parameters. If the observations are on the diagonal of a tensor of format $d \times \cdots \times d$, a complete semialgebraic description of the completable region is given.
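The smallest diagonal case is a $2 \times 2$ matrix $pq^{T}$ with probability vectors $p = (p_1, 1-p_1)$ and $q = (q_1, 1-q_1)$. It has two observed diagonal entries, $a = p_1 q_1$ and $b = (1-p_1)(1-q_1)$, and two parameters. Expanding the second equation gives $p_1 + q_1 = 1 + a - b$. So $p_1$ and $q_1$ are the roots of $t^2 - (1+a-b)\,t + a$, and completability reduces to these roots being real and lying in $[0,1]$. In this toy case one can check that the completable region is $\{a, b \ge 0 : \sqrt{a} + \sqrt{b} \le 1\}$, and that the discriminant vanishes exactly on $\sqrt{a} + \sqrt{b} = 1$. The sketch below is our own elementary check for this case, not the general description from the paper. The function names are hypothetical.

```python
import math

def rank_one_diagonal(p, q):
    """Diagonal (p1*q1, p2*q2) of the rank-one matrix p q^T."""
    return p[0] * q[0], p[1] * q[1]

def complete_2x2_diagonal(a, b, tol=1e-12):
    """Return (p1, q1) with p1*q1 = a and (1-p1)*(1-q1) = b, or None if none exists.

    p1 and q1 are the roots of t^2 - (1 + a - b) t + a. The two roots can be assigned
    in either order, so the completion (the off-diagonal entries) is not unique.
    """
    if a < 0 or b < 0:
        return None
    s = 1 + a - b
    disc = s * s - 4 * a
    if disc < -tol:
        return None                      # no real solution: outside the completable region
    root = math.sqrt(max(disc, 0.0))
    p1, q1 = (s + root) / 2, (s - root) / 2
    if min(p1, q1) < -tol or max(p1, q1) > 1 + tol:
        return None                      # real roots, but not probabilities
    return p1, q1
```

For instance, `complete_2x2_diagonal(0.12, 0.42)` returns approximately `(0.4, 0.3)`, which recovers $p = (0.4, 0.6)$ and $q = (0.3, 0.7)$. By contrast, `complete_2x2_diagonal(0.5, 0.5)` returns `None`.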