Lizard Brain: Tackling Locally Low-Dimensional Yet Globally Complex Organization of Multi-Dimensional Datasets

Bac, Jonathan; Zinovyev, Andreï

doi:10.3389/fnbot.2019.00110

Cited by 19 publications

(29 citation statements)

References 72 publications


“…In this particular case, applying non-linear dimensionality reduction (e.g., tSNE) could make the ‘hidden’ branch more visible in 2D at the cost of distorting the underlying data geometry. Nevertheless, this situation can be reproduced with any data dimensionality technique (for examples, see [ 56 ]).…”

Section: Results (mentioning)

confidence: 99%

“…In order to compare these algorithms with ElPiGraph, we used previously published LizardBrain generator of noisy branching data point clouds [ 56 ]. Briefly, it generates data points in a unit m -dimensional hypercube around a set of non-linear (e.g., parabolic) branches, such that each next branch starts from a randomly selected point on one of the previously generated branches in a random direction.…”

Section: Results (mentioning)

confidence: 99%
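
The generator described in the statement above can be illustrated with a minimal sketch. This is not the published LizardBrain code: the function name `generate_branching_cloud`, every parameter, the Gaussian noise model, the branch length coefficients, and the clipping step are assumptions made purely for illustration.

```python
import numpy as np

def generate_branching_cloud(n_branches=4, dim=3, points_per_branch=50,
                             noise=0.01, seed=0):
    """Hypothetical sketch: noisy points around parabolic branches in [0, 1]^dim."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, points_per_branch)
    branches = []
    for _ in range(n_branches):
        if branches:
            # Each next branch starts from a random point on an earlier branch.
            parent = branches[rng.integers(len(branches))]
            start = parent[rng.integers(len(parent))]
        else:
            # The first branch starts at a random point of the hypercube.
            start = rng.uniform(0.0, 1.0, size=dim)
        # Random unit direction and random unit bending vector (assumed model).
        direction = rng.normal(size=dim)
        direction /= np.linalg.norm(direction)
        bend = rng.normal(size=dim)
        bend /= np.linalg.norm(bend)
        # Parabola: linear term along `direction`, quadratic term along `bend`.
        curve = start + 0.5 * np.outer(t, direction) + 0.25 * np.outer(t**2, bend)
        # Simplification: clip the curve so it stays inside the unit hypercube.
        branches.append(np.clip(curve, 0.0, 1.0))
    skeleton = np.vstack(branches)
    labels = np.repeat(np.arange(n_branches), points_per_branch)
    # Add isotropic Gaussian noise around the branches, then clip again.
    points = np.clip(skeleton + rng.normal(scale=noise, size=skeleton.shape), 0.0, 1.0)
    return points, labels
```

Calling `generate_branching_cloud(n_branches=6, dim=10)` returns a point array of shape `(300, 10)` together with the index of the branch each point was sampled around, which is convenient for checking how well a principal graph method recovers the branches.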


Robust and Scalable Learning of Complex Intrinsic Dataset Geometry via ElPiGraph

Albergante, Mirkes, Bac et al. 2020

Entropy

Self Cite


Large datasets represented by multidimensional data point clouds often possess nontrivial distributions with branching trajectories and excluded regions, with the recent single-cell transcriptomic studies of developing embryo being notable examples. Reducing the complexity and producing compact and interpretable representations of such data remains a challenging task. Most of the existing computational methods are based on exploring the local data point neighbourhood relations, a step that can perform poorly in the case of multidimensional and noisy data. Here we present ElPiGraph, a scalable and robust method for approximation of datasets with complex structures which does not require computing the complete data distance matrix or the data point neighbourhood graph. This method is able to withstand high levels of noise and is capable of approximating complex topologies via principal graph ensembles that can be combined into a consensus principal graph. ElPiGraph deals efficiently with large and complex datasets in various fields from biology, where it can be used to infer gene dynamics from single-cell RNA-Seq, to astronomy, where it can be used to explore complex structures in the distribution of galaxies.


“…All these methods use an object embedded in the data space. They are called Injective Methods [ 68 ]. In addition, a family of Projective Methods was developed.…”

Section: Dimension Estimation (mentioning)

confidence: 99%

“…These methods do not construct a data approximator, but project the dataspace onto a space of lower dimension with preservation of similarity or dissimilarity of objects. A brief review of modern injective and projective methods can be found in [ 68 ].…”

Section: Dimension Estimation (mentioning)

confidence: 99%


Fractional Norms and Quasinorms Do Not Help to Overcome the Curse of Dimensionality

Mirkes, Allohibi, Gorban 2020

Entropy


The curse of dimensionality causes the well-known and widely discussed problems for machine learning methods. There is a hypothesis that using the Manhattan distance and even fractional lp quasinorms (for p less than 1) can help to overcome the curse of dimensionality in classification problems. In this study, we systematically test this hypothesis. It is illustrated that fractional quasinorms have a greater relative contrast and coefficient of variation than the Euclidean norm l2, but it is shown that this difference decays with increasing space dimension. It has been demonstrated that the concentration of distances shows qualitatively the same behaviour for all tested norms and quasinorms. It is shown that a greater relative contrast does not mean a better classification quality. It was revealed that for different databases the best (worst) performance was achieved under different norms (quasinorms). A systematic comparison shows that the difference in the performance of kNN classifiers for lp at p = 0.5, 1, and 2 is statistically insignificant. Analysis of curse and blessing of dimensionality requires careful definition of data dimensionality that rarely coincides with the number of attributes. We systematically examined several intrinsic dimensions of the data.
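
The decay of relative contrast with dimension mentioned in this abstract can be explored with a short sketch. It assumes one common definition of relative contrast, (D_max - D_min) / D_min, with distances measured from the origin to uniformly sampled points; the helper names `lp_lengths` and `relative_contrast`, the sample size, and the tested dimensions are illustrative choices, not taken from the paper.

```python
import numpy as np

def lp_lengths(x, p):
    """l_p norm (p >= 1) or quasinorm (0 < p < 1) of each row of x."""
    return np.sum(np.abs(x) ** p, axis=1) ** (1.0 / p)

def relative_contrast(x, p):
    """(D_max - D_min) / D_min for distances from the origin to the rows of x."""
    d = lp_lengths(x, p)
    return (d.max() - d.min()) / d.min()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for dim in (2, 20, 200):
        # Uniform sample in the unit hypercube (an assumed test distribution).
        x = rng.uniform(size=(1000, dim))
        contrasts = {p: relative_contrast(x, p) for p in (0.5, 1.0, 2.0)}
        print(dim, {p: round(float(c), 3) for p, c in contrasts.items()})
```

Running the script prints the relative contrast for p = 0.5, 1, and 2 at each dimension, so the values can be compared across norms and across dimensions.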


COVID-19 lockdown introduces human mobility pattern changes for both Guangdong-Hong Kong-Macao greater bay area and the San Francisco bay area

Zhong, Zhou, Gao et al. 2022

International Journal of Applied Earth Observation and Geoinfor

