This paper presents novel techniques for improving the performance of a multi-way spectral clustering framework (Govindu Lerman, 2007, preprint in the supplementary webpage) for segmenting affine subspaces. Specifically, it suggests an iterative sampling procedure to improve the uniform sampling strategy, an automatic scheme of inferring the tuning parameter from the data, a precise initialization procedure for K-means, as well as a simple strategy for isolating outliers. The resulting algorithm, Spectral Curvature Clustering (SCC), requires only linear storage and takes linear running time in the size of the data. It is supported by theory which both justifies its successful performance and guides our practical choices. We compare it with other existing methods on a few artificial instances of affine subspaces. Application of the algorithm to several real-world problems is also discussed.
The effect of metabolic stress on the bone marrow microenvironment is poorly defined. We show that high-fat diet (HFD) decreased long-term Lin(-)Sca-1(+)c-Kit(+) (LSK) stem cells and shifted lymphoid to myeloid cell differentiation. Bone marrow niche function was impaired after HFD as shown by poor reconstitution of hematopoietic stem cells. HFD led to robust activation of PPAR纬2, which impaired osteoblastogenesis while enhancing bone marrow adipogenesis. At the same time, expression of genes such as Jag-1, SDF-1, and IL-7 forming the bone marrow niche was highly suppressed after HFD. Moreover, structural changes of microbiota were associated to HFD-induced bone marrow changes. Antibiotic treatment partially rescued HFD-mediated effects on the bone marrow niche, while transplantation of stools from HFD mice could transfer the effect聽to normal mice. These findings show that metabolic stress affects the bone marrow niche by alterations of gut microbiota and osteoblast-adipocyte homeostasis.
In the context of clustering, we assume a generative model where each cluster is the result of sampling points in the neighborhood of an embedded smooth surface; the sample may be contaminated with outliers, which are modeled as points sampled in space away from the clusters. We consider a prototype for a higher-order spectral clustering method based on the residual from a local linear approximation. We obtain theoretical guarantees for this algorithm and show that, in terms of both separation and robustness to outliers, it outperforms the standard spectral clustering algorithm (based on pairwise distances) of Ng, Jordan and Weiss (NIPS '01). The optimal choice for some of the tuning parameters depends on the dimension and thickness of the clusters. We provide estimators that come close enough for our theoretical purposes. We also discuss the cases of clusters of mixed dimensions and of clusters that are generated from smoother surfaces. In our experiments, this algorithm is shown to outperform pairwise spectral clustering on both simulated and real data
Data sets are often modeled as samples from a probability distribution in R D , for D large. It is often assumed that the data has some interesting low-dimensional structure, for example that of a d-dimensional manifold M, with d much smaller than D. When M is simply a linear subspace, one may exploit this assumption for encoding efficiently the data by projecting onto a dictionary of d vectors in R D (for example found by SVD), at a cost (n + D)d for n data points. When M is nonlinear, there are no "explicit" and algorithmically efficient constructions of dictionaries that achieve a similar efficiency: typically one uses either random dictionaries, or dictionaries obtained by black-box global optimization. In this paper we construct data-dependent multi-scale dictionaries that aim at efficiently encoding and manipulating the data. Their construction is fast, and so are the algorithms that map data points to dictionary coefficients and vice versa, in contrast with L 1 -type sparsity-seeking algorithms, but alike adaptive nonlinear approximation in classical multiscale analysis. In addition, data points are guaranteed to have a compressible representation in terms of the dictionary, depending on the assumptions on the geometry of the underlying probability distribution.
The problem of Hybrid Linear Modeling (HLM) is to model and segment data using a mixture of affine subspaces. Different strategies have been proposed to solve this problem, however, rigorous analysis justifying their performance is missing. This paper suggests the Theoretical Spectral Curvature Clustering (TSCC) algorithm for solving the HLM problem and provides careful analysis to justify it. The TSCC algorithm is practically a combination of Govindu's multi-way spectral clustering framework (CVPR 2005) and Ng et al.'s spectral clustering algorithm (NIPS 2001). The main result of this paper states that if the given data is sampled from a mixture of distributions concentrated around affine subspaces, then with high sampling probability the TSCC algorithm segments well the different underlying clusters. The goodness of clustering depends on the within-cluster errors, the between-clusters interaction, and a tuning parameter applied by TSCC. The proof also provides new insights for the analysis of Ng et al. (NIPS 2001).
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations鈥揷itations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright 漏 2024 scite LLC. All rights reserved.
Made with 馃挋 for researchers
Part of the Research Solutions Family.