Fisher Information Matrix of the Dirichlet-multinomial Distribution

Paul, S. R.; Balasooriya, Uditha; Banerjee, Tathagata

doi:10.1002/bimj.200410103

Cited by 15 publications

(9 citation statements)

References 6 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Table 1 presents the final log-likelihood, number of iterations, and running time (in seconds) of the two MM algorithms and their SQUAREM accelerations on these data. All MM algorithms converge to the maximum point previously found by the scoring method (Paul, Balasooriya, and Banerjee 2005). For the choice ε = 10 −9 in stopping criterion (2.6), the MM algorithm (3.5) takes 700 iterations and 0.1580 sec to converge on a laptop computer.…”

Section: Applicationsmentioning

confidence: 89%

MM Algorithms for Some Discrete Multivariate Distributions

Zhou

Lange

2010

Journal of Computational and Graphical Statistics

View full text Add to dashboard Cite

The MM (minorization-maximization) principle is a versatile tool for constructing optimization algorithms. Every EM algorithm is an MM algorithm but not vice versa. This article derives MM algorithms for maximum likelihood estimation with discrete multivariate distributions such as the Dirichlet-multinomial and Connor-Mosimann distributions, the Neerchal-Morel distribution, the negative-multinomial distribution, certain distributions on partitions, and zero-truncated and zeroinflated distributions. These MM algorithms increase the likelihood at each iteration and reliably converge to the maximum from well-chosen initial values. Because they involve no matrix inversion, the algorithms are especially pertinent to high-dimensional problems. To illustrate the performance of the MM algorithms, we compare them to Newton's method on data used to classify handwritten digits.

show abstract

Section: Applicationsmentioning

confidence: 89%

MM Algorithms for Some Discrete Multivariate Distributions

Zhou

Lange

2010

Journal of Computational and Graphical Statistics

View full text Add to dashboard Cite

show abstract

“…Parameter constraint violation is still pertinent. More severely, calculation of the expected information matrix involves evaluating numerous beta-binomial tail probabilities (Paul et al, 2005). On large scale problems, this is simply infeasible.…”

Section: Discussionmentioning

confidence: 99%

“…The alternative Fisher’s scoring algorithm replaces the observed information matrix in Newton’s method by expected information matrix and yields an ascent algorithm. However, the calculation of expected information matrix for Dirichlet-Multinomial model is expensive due to numerous evaluations of beta-binomial tail probabilities (Paul et al, 2005). Recently Zhou and Lange (2010) devise the MM algorithm for a whole class of multivariate discrete distributions which include the Dirichlet-Multinomial as a special case.…”

Section: Problem Setup and A Running Examplementioning

confidence: 99%

EM vs MM: A case study

Zhou

Zhang

2012

Computational Statistics & Data Analysis

View full text Add to dashboard Cite

The celebrated expectation-maximization (EM) algorithm is one of the most widely used optimization methods in statistics. In recent years it has been realized that EM algorithm is a special case of the more general minorization-maximization (MM) principle. Both algorithms creates a surrogate function in the first (E or M) step that is maximized in the second M step. This two step process always drives the objective function uphill and is iterated until the parameters converge. The two algorithms differ in the way the surrogate function is constructed. The expectation step of the EM algorithm relies on calculating conditional expectations, while the minorization step of the MM algorithm builds on crafty use of inequalities. For many problems, EM and MM derivations yield the same algorithm. This expository note walks through the construction of both algorithms for estimating the parameters of the Dirichlet-Multinomial distribution. This particular case is of interest because EM and MM derivations lead to two different algorithms with completely distinct operating characteristics. The EM algorithm converges fast but involves solving a nontrivial maximization problem in the M step. In contrast the MM updates are extremely simple but converge slowly. An EM-MM hybrid algorithm is derived which shows faster convergence than the MM algorithm in certain parameter regimes. The local convergence rates of the three algorithms are studied theoretically from the unifying MM point of view and also compared on numerical examples.

show abstract

“…As the DM, NegMN, and GDM distributions do not belong to the exponential family, the usual iteratively reweighted least squares method for maximum likelihood estimation of GLM does not apply. The main issue lies in the difficulty of calculating the expected information matrix, which involves evaluating numerous tail probabilities of the marginal distribution (Paul et al, 2005;Zhou and Zhang, 2012). On the other hand, Newton's method suffers from instability since the log-likelihood functions are non-concave in general.…”

Section: Optimization Algorithms and Implementationmentioning

confidence: 99%

MGLM: An R Package for Multivariate Categorical Data Analysis

Kim¹,

Zhang²,

Day³

et al. 2018

The R Journal

View full text Add to dashboard Cite

Data with multiple responses is ubiquitous in modern applications. However, few tools are available for regression analysis of multivariate counts. The most popular multinomial-logit model has a very restrictive mean-variance structure, limiting its applicability to many data sets. This article introduces an R package MGLM, short for multivariate response generalized linear models, that expands the current tools for regression analysis of polytomous data. Distribution fitting, random number generation, regression, and sparse regression are treated in a unifying framework. The algorithm, usage, and implementation details are discussed. IntroductionMultivariate categorical data arises in many fields, including genomics, image analysis, text mining, and sports statistics. The multinomial-logit model (Agresti, 2002, Chapter 7) has been the most popular tool for analyzing such data. However, it is limiting due to its specific mean-variance structure and the strong assumption that the counts are negatively correlated. Models that address over-dispersion relative to a multinomial distribution and incorporate positive and/or negative correlation structures would offer greater flexibility for analysis of polytomous data.In this article, we introduce an R package MGLM, short for multivariate response generalized linear models. The MGLM package provides a unified framework for random number generation, distribution fitting, regression, hypothesis testing, and variable selection for multivariate response generalized linear models, particularly four models listed in Table 1. These models considerably broaden the class of generalized linear models (GLM) for analysis of multivariate categorical data.MGLM overlaps little with existing packages in R and other softwares. The standard multinomiallogit model is implemented in several R packages (Venables and Ripley, 2002) with VGAM (Yee, 2010(Yee, , 2015(Yee, , 2017 being the most comprehensive. We include the classical multinomial-logit regression model in MGLM not only for completeness, but also to complement it with various penalty methods for variable selection and regularization. If invoked by the group penalty, MGLM is able to perform variable selection at the predictor level for easier interpretation. This is different from the elastic net penalized multinomial-logit model implemented in glmnet (Friedman et al., 2010), which does selection at the matrix entry level. Although MGLM focuses on regression, it also provides distribution fitting and random number generation for the models listed in Table 1. VGAM and dirmult (Tvedebrink, 2010) packages can estimate the parameters of the Dirichlet-multinomial (DM) distribution using Fisher's scoring and Newton's method respectively. As indicated in the manual (Yee, 2017), the convergence of Fisher's scoring method may be slow due to the difficulty in evaluating the expected information matrix. Further the Newton's method is unstable as the log-likelihood function may be non-concave. As explained later, MGLM achieves both stability an...

show abstract

Fisher Information Matrix of the Dirichlet-multinomial Distribution

Cited by 15 publications

References 6 publications

MM Algorithms for Some Discrete Multivariate Distributions

MM Algorithms for Some Discrete Multivariate Distributions

EM vs MM: A case study

MGLM: An R Package for Multivariate Categorical Data Analysis

Contact Info

Product

Resources

About