When dealing with datasets comprising high-dimensional points, it is usually advantageous to discover some structure in the data. A fundamental piece of information for this purpose is the minimum number of parameters required to describe the data while minimizing information loss. This number, usually called the intrinsic dimension, can be interpreted as the dimension of the manifold from which the input data are assumed to be drawn. Owing to its usefulness in many theoretical and practical problems, over the last decades the concept of intrinsic dimension has gained considerable attention in the scientific community, motivating the large number of intrinsic dimensionality estimators proposed in the literature. However, the problem is still open, since most techniques cannot efficiently deal with datasets drawn from manifolds of high intrinsic dimension that are nonlinearly embedded in higher-dimensional spaces. This paper surveys some of the most interesting, widely used, and advanced state-of-the-art methodologies. Unfortunately, since no benchmark database exists in this research field, an objective comparison among different techniques is not possible. Consequently, we suggest a benchmark framework and apply it to comparatively evaluate relevant state-of-the-art estimators.
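To make the notion of intrinsic dimension concrete, the following minimal sketch applies one classical estimator family that such surveys typically cover, the Grassberger-Procaccia correlation dimension, to a toy dataset; the radii, sample size, and circle example are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.spatial.distance import pdist

def correlation_dimension(X, r1, r2):
    """Grassberger-Procaccia correlation dimension (illustrative sketch).

    The correlation integral C(r), i.e. the fraction of point pairs closer
    than r, scales as r^d for small r on a d-dimensional manifold, so
    d ~ (log C(r2) - log C(r1)) / (log r2 - log r1).
    """
    dists = pdist(X)                  # all pairwise Euclidean distances
    c1 = np.mean(dists < r1)          # fraction of pairs within r1
    c2 = np.mean(dists < r2)          # fraction of pairs within r2
    return (np.log(c2) - np.log(c1)) / (np.log(r2) - np.log(r1))

# Points on a 1-D circle embedded in R^2: the estimate should be close to 1.
theta = np.random.uniform(0, 2 * np.pi, 2000)
X = np.column_stack([np.cos(theta), np.sin(theta)])
print(correlation_dimension(X, r1=0.05, r2=0.2))
```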
Recently, a great deal of research work has been devoted to the development of algorithms that estimate the intrinsic dimensionality (id) of a given dataset, that is, the minimum number of parameters needed to represent the data without information loss. id estimation is important for several reasons: the capacity and generalization capability of discriminant methods depend on it; the id is necessary information for any dimensionality reduction technique; in neural network design, the number of hidden units in the encoding middle layer should be chosen according to the id of the data; and the id value is strongly related to the model order of a time series, which is crucial for obtaining reliable time series predictions. Although many estimation techniques have been proposed in the literature, most of them fail on noisy data or compute underestimated values when the id is sufficiently high. In this paper, after reviewing some of the most important id estimators related to our work, we provide a theoretical motivation for the bias that causes the underestimation effect, and we present two id estimators based on the statistical properties of manifold neighborhoods, developed to reduce this effect. We exhaustively evaluate the proposed techniques on synthetic and real datasets, employing an objective evaluation measure to compare their performance with that achieved by state-of-the-art algorithms; the results show that the proposed methods are promising and produce reliable estimates even in the difficult case of datasets drawn from nonlinearly embedded manifolds characterized by high id.
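The abstract does not spell out the two proposed estimators, so as a sketch of the kind of neighborhood statistic such methods build on, here is the classical Levina-Bickel maximum-likelihood estimator, which infers the id from the growth rate of nearest-neighbor distances; the neighborhood size k and the averaging scheme are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def mle_id(X, k=10):
    """Levina-Bickel MLE of intrinsic dimension (an illustrative baseline,
    not the paper's estimator): for each point, the id is the inverse mean
    log-ratio of its k-NN distances, then averaged over all points."""
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, _ = nbrs.kneighbors(X)          # column 0 is the point itself
    dist = dist[:, 1:]                    # T_1 ... T_k for every point
    log_ratios = np.log(dist[:, -1:] / dist[:, :-1])   # log(T_k / T_j)
    return np.mean((k - 1) / log_ratios.sum(axis=1))
```

On data uniformly drawn from a d-dimensional manifold this estimate is close to d for moderate d, but it is known to exhibit exactly the underestimation bias discussed above when d grows, which is the effect the paper's estimators aim to reduce.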
Most machine learning techniques suffer from the "curse of dimensionality" when applied to high-dimensional data. To face this limitation, a common preprocessing step consists in employing a dimensionality reduction technique. In the literature, a great deal of research work has been devoted to the development of algorithms performing this task. These techniques often require, as a parameter, the number of dimensions to be retained; to this end, they need an estimate of the "intrinsic dimensionality" of the given dataset, which refers to the minimum number of degrees of freedom needed to capture all the information carried by the data. Although many estimation techniques have been proposed, most of them fail on noisy data or when the intrinsic dimensionality is too high. In this paper we present a family of estimators based on the probability density function of the normalized nearest-neighbor distance. We evaluate the proposed techniques on both synthetic and real datasets, comparing their performance with that obtained by state-of-the-art algorithms; the achieved results show that the proposed methods are promising.
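As one concrete reading of this idea: if data are locally uniform on a d-dimensional manifold, the distance to a point's first neighbor normalized by the distance to its (k+1)-th, r = T_1 / T_{k+1}, has pdf g(r; k, d) = k d r^{d-1} (1 - r^d)^{k-1}, so d can be recovered by maximizing the likelihood of the observed normalized distances. The sketch below implements this scheme under those assumptions; the integer search over d and the parameter names are illustrative, not the paper's exact estimator family.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def nn_pdf_id(X, k=10, d_max=50):
    """Fit the pdf of the normalized nearest-neighbor distance
    r = T_1 / T_{k+1}, g(r) = k*d*r^(d-1)*(1-r^d)^(k-1), by maximizing
    the log-likelihood over integer candidates d (illustrative sketch)."""
    nbrs = NearestNeighbors(n_neighbors=k + 2).fit(X)
    dist, _ = nbrs.kneighbors(X)          # column 0 is the point itself
    r = dist[:, 1] / dist[:, k + 1]       # normalized NN distance per point
    r = r[(r > 0) & (r < 1)]              # guard against duplicate points
    best_d, best_ll = 1, -np.inf
    for d in range(1, d_max + 1):
        ll = np.sum(np.log(k * d) + (d - 1) * np.log(r)
                    + (k - 1) * np.log1p(-r ** d))
        if ll > best_ll:
            best_d, best_ll = d, ll
    return best_d
```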
One of the fundamental tasks of unsupervised learning is clustering: partitioning the input dataset into groups of somehow "similar" objects that "differ" from the objects belonging to other groups. To this end, in this paper we assume that the different clusters are drawn from different, possibly intersecting, geometrical structures, represented by manifolds embedded in a possibly higher-dimensional space. Under these assumptions, and considering that each manifold is typified by a geometrical structure whose intrinsic dimensionality (possibly) differs from those of the other manifolds, we encode the input data by means of local intrinsic dimensionality estimates and features related to them, and we subsequently apply simple, basic clustering algorithms, since our interest is specifically aimed at assessing the discriminative power of the proposed features. Indeed, their encouraging discriminative quality is shown by a feature relevance test, by the clustering results achieved on both synthetic and real datasets, and by comparison with related and classical state-of-the-art clustering approaches.
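The abstract leaves the exact features unspecified; as a minimal illustration of the idea, the sketch below codes each point by a local id estimate computed on its k-NN ball (here via the Levina-Bickel formula, an assumed choice) and feeds that single feature to a basic clustering algorithm, separating a 1-D curve from a 2-D plane mixed in R^3.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.cluster import KMeans

def local_id_features(X, k=15):
    """Per-point local id via the Levina-Bickel formula on each point's
    k-NN ball (an illustrative feature choice, not the paper's exact one)."""
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, _ = nbrs.kneighbors(X)
    dist = dist[:, 1:]                                 # drop self-distance
    log_ratios = np.log(dist[:, -1:] / dist[:, :-1])   # log(T_k / T_j)
    return ((k - 1) / log_ratios.sum(axis=1)).reshape(-1, 1)

# Toy example: a 1-D curve (local id ~ 1) and a 2-D plane (local id ~ 2)
# embedded in R^3 become separable using the local id feature alone.
t = np.random.uniform(0, 1, (500, 1))
curve = np.hstack([t, np.sin(4 * t), np.cos(4 * t)])
plane = np.hstack([np.random.uniform(0, 1, (500, 2)), np.zeros((500, 1))])
X = np.vstack([curve, plane])
labels = KMeans(n_clusters=2, n_init=10).fit_predict(local_id_features(X))
```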