Abstract. Nonlinear manifold learning from unorganized data points is a very challenging unsupervised learning and data visualization problem with a great variety of applications. In this paper we present a new algorithm for manifold learning and nonlinear dimension reduction. Based on a set of unorganized data points sampled with noise from the manifold, we represent the local geometry of the manifold using tangent spaces learned by fitting an affine subspace in a neighborhood of each data point. Those tangent spaces are aligned to give the internal global coordinates of the data points with respect to the underlying manifold by way of a partial eigendecomposition of the neighborhood connection matrix. We present a careful error analysis of our algorithm and show that the reconstruction errors are of second-order accuracy. We illustrate our algorithm using curves and surfaces both in 2D/3D and higher-dimensional Euclidean spaces, and 64-by-64 pixel face images with various pose and lighting conditions. We also address several theoretical and algorithmic issues for further research and improvements.

Keywords: nonlinear dimension reduction, principal manifold, tangent space, subspace alignment, eigenvalue decomposition, perturbation analysis

AMS subject classifications. 15A18, 15A23, 65F15, 65F50
1. Introduction. Many high-dimensional data sets in real-world applications can be modeled as collections of points lying close to a low-dimensional nonlinear manifold. Discovering the structure of the manifold from a set of data points sampled from it, possibly with noise, is a very challenging unsupervised learning problem [2, 3, 4, 8, 9, 10, 13, 14, 15, 17, 18]. The discovered low-dimensional structures can be further used for classification, clustering, outlier detection, and data visualization. Examples of low-dimensional manifolds embedded in high-dimensional input spaces include image vectors representing the same 3D object under different camera views and lighting conditions, a set of document vectors in a text corpus dealing with a specific topic, and a set of 0-1 vectors encoding the test results on a set of multiple-choice questions for a group of students [13, 14, 18]. The key observation is that although the dimensions of the embedding spaces can be very high (e.g., the number of pixels in each image of the image collection, the number of terms (words and/or phrases) in the vocabulary of the text corpus, or the number of multiple-choice questions in the test), the intrinsic dimensionality of the data points is rather limited, due to factors such as physical constraints and linguistic correlations. Traditional dimension reduction techniques such as principal component analysis and factor analysis usually work well when the data points lie close to a linear (affine) subspace of the input space [7]. They cannot, in general, discover nonlinear structures embedded in the set of data points. Recently, there has been much renewed interest in developing efficient algorithms for constructing nonlinear low-dimensional manifolds from sample data points in high-dimensional spaces, emphasizing simple algorithmic implementation and avoiding ...
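The two steps described in the abstract, fitting an affine subspace to each local neighborhood to estimate a tangent space and then aligning the resulting local coordinates through a partial eigendecomposition of the neighborhood connection (alignment) matrix, can be summarized in a short sketch. The Python code below is illustrative only: the function name ltsa, the k-nearest-neighbor construction of neighborhoods via SciPy's cKDTree, and the use of a dense alignment matrix with a dense eigensolver are assumptions made for the sketch, not the authors' implementation.

    import numpy as np
    from scipy.spatial import cKDTree
    from scipy.linalg import eigh

    def ltsa(X, d=2, k=8):
        # X: (N, D) array of data points sampled, possibly with noise, from a
        # d-dimensional manifold. Returns an (N, d) array of global coordinates.
        N = X.shape[0]
        # k nearest neighbors of every point; the query includes the point itself.
        _, nbrs = cKDTree(X).query(X, k=k)

        # Alignment matrix, accumulated neighborhood by neighborhood.
        B = np.zeros((N, N))
        for i in range(N):
            Xi = X[nbrs[i]]
            Xi = Xi - Xi.mean(axis=0)          # center the neighborhood
            # Tangent-space (affine subspace) fit: the top d left singular vectors
            # give orthonormal local coordinates for the k neighbors.
            U, _, _ = np.linalg.svd(Xi, full_matrices=False)
            G = np.hstack([np.full((k, 1), 1.0 / np.sqrt(k)), U[:, :d]])
            # Each neighborhood contributes I - G G^T to the alignment matrix.
            B[np.ix_(nbrs[i], nbrs[i])] += np.eye(k) - G @ G.T

        # Global coordinates: eigenvectors of B for the d smallest nonzero
        # eigenvalues (the near-zero eigenvalue of the constant vector is skipped).
        _, vecs = eigh(B)
        return vecs[:, 1:d + 1]

On a data set such as a noisy Swiss roll, calling ltsa(X, d=2, k=8) returns two-dimensional coordinates that unroll the surface; the neighborhood size k trades robustness to noise against the locality of the affine fits, and in practice the dense matrix B would be replaced by a sparse matrix and a sparse partial eigensolver.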