“…Initialize the mixing coefficient, α_m, for each component, m, in the grid to 1/M;
Set the mean and the variance of the shared distribution, q(·|λ), as the mean and covariance of the training set;
repeat
    Compute R, U and V using (6), (7) and (8) respectively, using current parameters, Θ;
    (9);
    end
    Obtain the center, µ_m, of each component, m, of the mixture in the data space, using (11);
    Reestimate the width of the diagonal Gaussians, σ_d, using (12), for all the features;
    Reestimate the mean and the variance of the shared distribution using (13) and (14) respectively;
    Reestimate the feature weight, ρ_d, using (15), for all the features;
until convergence;
end

The parameters are estimated using a variant of the EM algorithm as follows.…”
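The initialization step and the outer repeat/until loop of the procedure quoted above can be illustrated with a minimal Python sketch. It covers only those two parts: equations (6) to (15) are not reproduced in this excerpt, so the update itself is left as a user-supplied callable. The names `init_parameters`, `run_until_convergence` and `step` are hypothetical and do not come from the source. Using the per-feature variance (the diagonal of the training covariance) for the shared distribution, and testing convergence on the change in log-likelihood, are both assumptions.

```python
import numpy as np


def init_parameters(X, M):
    """Initial values for a mixture with M components on data X of shape (N, D)."""
    X = np.asarray(X, dtype=float)
    # Mixing coefficients alpha_m = 1/M for every component m.
    alpha = np.full(M, 1.0 / M)
    # Shared distribution q(.|lambda): mean of the training set, per feature.
    shared_mean = X.mean(axis=0)
    # Assumption: keep only the diagonal of the training covariance,
    # computed as the population variance (ddof=0) of each feature.
    shared_var = X.var(axis=0)
    return alpha, shared_mean, shared_var


def run_until_convergence(step, params, tol=1e-6, max_iter=200):
    """Repeat `step` until the log-likelihood changes by less than `tol`.

    `step` is a hypothetical callable: step(params) -> (new_params, log_lik).
    It stands in for the E-step and M-step updates, which are not shown here.
    """
    prev_ll = -np.inf
    n_iter = 0
    while n_iter < max_iter:
        params, ll = step(params)
        n_iter += 1
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return params, n_iter
```

A call such as `init_parameters(X, M=4)` returns four mixing coefficients equal to 0.25 together with the per-feature mean and variance of `X`; these would then be passed, along with the remaining parameters, to whatever `step` function implements the actual updates.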