The Self-Updating Process (SUP) is a clustering algorithm that takes the viewpoint of the data points and simulates the process by which data points move and cluster themselves. It is an iterative process on the sample space that allows for both time-varying and time-invariant operators. Through simulations and comparisons, this paper shows that SUP is particularly competitive in clustering (i) data with noise, (ii) data with a large number of clusters, and (iii) unbalanced data. When noise is present, SUP isolates the noisy data points while simultaneously clustering the remaining data. The local-updating property enables SUP to handle data with a large number of clusters and data of various structures. In this paper, we also show that the blurring mean-shift is a static SUP; therefore, our discussion of the strengths of SUP applies to the blurring mean-shift as well.

The idea of clustering through an iterative process has been implemented in the Generalized Association Plots [13,14], which iteratively generate correlation matrices. In contrast to these iteratively generated correlation matrices, the self-updating process operates on the sample space rather than on the correlation space: it shows the actual movements of data points in the sample space. Data points keep updating their positions until the whole system reaches a balanced, and therefore static, condition in which the clusters are formed. The process, in effect, describes how data points perform self-clustering, which is why we named it the Self-Updating Process (SUP).

A similar iterative process that also operates on the sample space is the mean-shift algorithm [15], which has non-blurring and blurring variants. In contrast to the self-updating process, whose operators can be time-varying, both the non-blurring and the blurring variants use time-invariant operators. Specific differences between the mean-shift and the self-updating process are outlined in Section 2.6. The mean-shift algorithm first appeared in kernel density estimation, where the sample mean within a local region is used to estimate the gradient of a density function; it was further extended and analyzed by Cheng [16]. Comaniciu successfully applied the non-blurring mean-shift algorithm to the problem of image segmentation [17], and since then the mean-shift algorithm has become well known in the computer science community, although it remains less familiar in the statistics community. Implementing the mean-shift algorithm requires a choice of kernel function, and the Gaussian kernel is most often used in practice [17,18,19]. Several other clustering algorithms can be viewed as versions of the mean-shift algorithm. For example, Cheng [16] showed that the k-means algorithm is a limiting case of the non-blurring mean-shift algorithm, and Yang and Wu used a total-similarity objective function to derive the similarity-based clustering method (SCM) [20], a non-blurring mean-shift type clustering algorithm. Although the non-blurring mean-shift is more popular in image processing, Chen et al. [21] reported that the blurring proc...
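To make the updating idea concrete, the following is a minimal sketch of the static case (a time-invariant Gaussian operator, i.e. the blurring mean-shift viewed as a static SUP): every point repeatedly moves to the kernel-weighted average of all current points until the configuration stops changing, after which coinciding points are read off as clusters. The function names, the bandwidth parameter, and the convergence and merging tolerances are illustrative assumptions, not the paper's specification; SUP itself also allows the operator to change across iterations, which this sketch does not show.

```python
import numpy as np

def static_sup(X, bandwidth=1.0, tol=1e-6, max_iter=500):
    """Static SUP sketch: move every point toward the Gaussian-kernel-weighted
    mean of all current points until the whole system stops moving."""
    Z = np.asarray(X, dtype=float).copy()
    for _ in range(max_iter):
        # pairwise squared distances between current positions
        d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
        # time-invariant Gaussian weights; bandwidth is a tuning parameter
        W = np.exp(-d2 / (2.0 * bandwidth ** 2))
        Z_new = W @ Z / W.sum(axis=1, keepdims=True)
        if np.abs(Z_new - Z).max() < tol:   # balanced, static configuration reached
            return Z_new
        Z = Z_new
    return Z

def extract_clusters(Z, merge_tol=1e-3):
    """Label points whose final positions coincide (within merge_tol) as one cluster."""
    labels = -np.ones(len(Z), dtype=int)
    centers = []
    for i, z in enumerate(Z):
        for k, c in enumerate(centers):
            if np.linalg.norm(z - c) < merge_tol:
                labels[i] = k
                break
        else:
            centers.append(z)
            labels[i] = len(centers) - 1
    return labels
```

For example, `labels = extract_clusters(static_sup(X, bandwidth=0.5))` assigns a cluster label to every row of a data matrix `X`; isolated noise points typically settle into their own small clusters, which is consistent with the noise-isolation behavior described above.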