Voronoi Graph Traversal in High Dimensions with Applications to Topological Data Analysis and Piecewise Linear Interpolation

Polianskii, Vladislav; Pokorny, Florian T.

doi:10.1145/3394486.3403266

Cited by 4 publications

(2 citation statements)

References 22 publications

“…Specifically, the prototypes (cluster centers) of codebook tessellate the feature space into Voronoi cells. Then, histogram computation approximates a probability distribution function in the same way as the nonparametric histogram [12,28,67]. That is, the BoP representation reflects the distribution of a dataset in the feature space.…”

Section: Discussion (mentioning)

confidence: 99%

A Bag-of-Prototypes Representation for Dataset-Level Applications

Weijie, Deng, Gedeon et al. 2023. Preprint.

This work investigates dataset vectorization for two dataset-level tasks: assessing training set suitability and test set difficulty. The former measures how suitable a training set is for a target domain, while the latter studies how challenging a test set is for a learned model. Central to the two tasks is measuring the underlying relationship between datasets. This needs a desirable dataset vectorization scheme, which should preserve as much discriminative dataset information as possible so that the distance between the resulting dataset vectors can reflect dataset-to-dataset similarity. To this end, we propose a bag-of-prototypes (BoP) dataset representation that extends the image-level bag consisting of patch descriptors to dataset-level bag consisting of semantic prototypes. Specifically, we develop a codebook consisting of K prototypes clustered from a reference dataset. Given a dataset to be encoded, we quantize each of its image features to a certain prototype in the codebook and obtain a K-dimensional histogram. Without assuming access to dataset labels, the BoP representation provides a rich characterization of the dataset semantic distribution. Furthermore, BoP representations cooperate well with Jensen-Shannon divergence for measuring dataset-to-dataset similarity. Although very simple, BoP consistently shows its advantage over existing representations on a series of benchmarks for two dataset-level tasks.
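The quantize-then-histogram step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy 2-D features, and the two hand-picked prototypes are all assumptions made for illustration (the abstract clusters prototypes from a reference dataset, which is skipped here). Each feature is assigned to its nearest prototype, that is, to the Voronoi cell it falls in, and two of the resulting histograms are compared with the Jensen-Shannon divergence.

```python
import math

def nearest_prototype(x, prototypes):
    """Index of the prototype closest to x (squared Euclidean distance)."""
    return min(
        range(len(prototypes)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(x, prototypes[k])),
    )

def bop_histogram(features, prototypes):
    """Normalized K-bin histogram of nearest-prototype assignments."""
    counts = [0] * len(prototypes)
    for x in features:
        counts[nearest_prototype(x, prototypes)] += 1
    return [c / len(features) for c in counts]

def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs (value lies in [0, 1])."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(u, v):
        # Terms with u_i == 0 contribute nothing to the KL sum.
        return sum(a * math.log2(a / b) for a, b in zip(u, v) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical codebook with K = 2 prototypes and two toy "datasets".
prototypes = [(0.0, 0.0), (1.0, 1.0)]
dataset_a = [(0.1, 0.0), (0.0, 0.2), (0.9, 1.0), (1.1, 1.0)]
dataset_b = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.1), (1.0, 0.9)]

hist_a = bop_histogram(dataset_a, prototypes)  # [0.5, 0.5]
hist_b = bop_histogram(dataset_b, prototypes)  # [0.75, 0.25]
divergence = js_divergence(hist_a, hist_b)
```

The brute-force nearest-prototype search costs O(K d) per feature, which is adequate for a sketch; a real codebook with many prototypes would likely call for a faster nearest-neighbor index.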

“…Many of these algorithms are based on the efficient and well-tested convex hull algorithm "qhull" (http://www.qhull.org (accessed on 18 September 2022) [52]). For larger number of features (d > 20), there are approximations for the Voronoi cells or their equivalent, the Delaunay graph (e.g., [53]).…”

Section: Plausible Bayes (mentioning)

confidence: 99%

Robust Classification Using Posterior Probability Threshold Computation Followed by Voronoi Cell Based Class Assignment Circumventing Pitfalls of Bayesian Analysis of Biomedical Data

Ultsch, Lötsch. 2022. IJMS.

Bayesian inference is ubiquitous in science and widely used in biomedical research such as cell sorting or “omics” approaches, as well as in machine learning (ML), artificial neural networks, and “big data” applications. However, the calculation is not robust in regions of low evidence. In cases where one group has a lower mean but a higher variance than another group, new cases with larger values are implausibly assigned to the group with typically smaller values. An approach for a robust extension of Bayesian inference is proposed that proceeds in two main steps starting from the Bayesian posterior probabilities. First, cases with low evidence are labeled as “uncertain” class membership. The boundary for low probabilities of class assignment (threshold ε) is calculated using a computed ABC analysis as a data-based technique for item categorization. This leaves a number of cases with uncertain classification (p < ε). Second, cases with uncertain class membership are relabeled based on the distance to neighboring classified cases based on Voronoi cells. The approach is demonstrated on biomedical data typically analyzed with Bayesian statistics, such as flow cytometric data sets or biomarkers used in medical diagnostics, where it increased the class assignment accuracy by 1–10% depending on the data set. The proposed extension of the Bayesian inference of class membership can be used to obtain robust and plausible class assignments even for data at the extremes of the distribution and/or for which evidence is weak.
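The two-step procedure in this abstract can be sketched roughly as follows. This is a simplified illustration, not the authors' method: the threshold `epsilon` is passed in as a fixed number (the abstract derives it from a computed ABC analysis, which is not reproduced here), "low evidence" is assumed to mean that the largest posterior is below `epsilon`, and the Voronoi-cell reassignment is approximated by giving each uncertain case the label of its nearest confidently classified case. All names and the toy data are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def relabel_uncertain(points, posteriors, epsilon):
    """Return (initial_labels, final_labels).

    Step 1: a case whose largest posterior is below epsilon is marked
            uncertain (None).
    Step 2: each uncertain case takes the label of the nearest confident
            case, i.e. the label of the Voronoi cell it falls into.
    """
    initial = []
    for post in posteriors:
        best = max(range(len(post)), key=lambda c: post[c])
        initial.append(best if post[best] >= epsilon else None)

    confident = [(p, lab) for p, lab in zip(points, initial) if lab is not None]
    if not confident:
        raise ValueError("no confidently classified cases to relabel from")

    final = []
    for p, lab in zip(points, initial):
        if lab is None:
            _, lab = min(confident, key=lambda pl: squared_distance(p, pl[0]))
        final.append(lab)
    return initial, final

# Toy 1-D example: the last case lies at the extreme of the distribution
# and its posterior is nearly undecided.
points = [(0.0,), (1.0,), (5.0,), (6.0,), (9.0,)]
posteriors = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8), (0.55, 0.45)]
initial_labels, final_labels = relabel_uncertain(points, posteriors, epsilon=0.6)
```

In the toy data, a plain maximum-posterior rule would assign the case at 9.0 to class 0, the group with the smaller values; the neighbor-based step assigns it to class 1 instead, which mirrors the implausible assignments the abstract describes.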

Skeleton Clustering: Dimension-Free Density-Aided Clustering

Wei, Chen. 2023. Journal of the American Statistical Association.

A Computational Complexity

Knots construction. The first step of skeleton clustering is choosing knots, and, in this work, we take overfitting k-means as the default method. The k-means algorithm of Hartigan and Wong (Hartigan and Wong, 1979) has time complexity $O(ndkI)$, where $n$ is the number of points, $d$ is the dimension of the data, $k$ is the number of clusters for k-means, and $I$ is the number of iterations needed for convergence. When using overfitting k-means to choose knots, the reference rule is $k = \sqrt{n}$, and hence the complexity is $O(n^{3/2} d I)$. This is a time-consuming step of our clustering framework, and the complexity increases linearly with $d$. Therefore, preprocessing the data with dimension reduction techniques or using subject knowledge to choose knots can be helpful to speed up this process.

Edges construction. For the edge construction step, we approximate the Delaunay Triangulation with DT(C) by looking at the 2-NN neighborhoods (the Voronoi Density regions in 3.1). Hence the main computational task for our edge construction step is the 2-nearest knot search. We used the k-d tree algorithm for this purpose, which gives the worst-case complexity of $O(n d k^{1-1/d})$. Notably, the computation complexity at this step is at the worst linear in $d$, which is a much better rate than computing the exact Delaunay Triangulation (exponential dependence on $d$), and our empirical studies have illustrated the effectiveness of such approximation.

Edge weight construction: VD. Next, we consider the computation complexity of the different edge weights measurements. For the VD, its numerator can be computed directly from the 2-NN search when constructing the edges and hence no additional computation is needed. The denominators are pairwise distances between knots and can be computed with the worst-case complexity of $O(dk^2)$ because the number of nonzero edges is less than $k(k-1)/2$. With $k = \sqrt{n}$, we have the total time complexity of computing the VD to be $O(nd)$.

Edge weight construction: FD. For the Face density, we calculate the projected KDE at the middle point for each pair of neighboring Voronoi cells. The projection of one data point onto one central line can be done by matrix multiplication with complexity $O(d)$. Recall that we only use data points in local Voronoi cells for FD calculation, and the local sample size would be at $n_{loc} = O(\sqrt{n})$ under the conditions in Section 4 and the reference rule $k = [\sqrt{n}]$. Together it takes $O(d\sqrt{n})$ to calculate the projected data for one edge. With the projected data, KDE calculation has a time complexity $O(c \log c)$ where $c = \max_{j \neq \ell} \{n_j + n_\ell\}$ for any pair of knot indexes $j, \ell$. Again we have $c = O(n/k) = O(\sqrt{n})$ under the previously mentioned conditions. We need to do KDE for each edge in the skeleton, which gives the overall time complexity of FD weights to O(k…

Edge weight construction: TD. For Tube density, we similarly perform a projected KDE for each edge. Let $\eta$ be the maximum number of points in a tube region, $\eta = \max_{j,\ell} |\{X_i : \|\Pi_{j\ell}(X_i) - X_i\| \le R\}|$, the data pro...
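The edge construction and VD weighting steps described in this excerpt can be sketched as follows. This is an illustrative reading, not the paper's code. The excerpt uses a k-d tree for the 2-nearest-knot search, while this sketch swaps in a brute-force search for brevity. The normalization of the numerator (fraction of points whose two nearest knots are the given pair) is an assumption. The function names and toy data are hypothetical.

```python
import math
from collections import Counter

def two_nearest_knots(x, knots):
    """Sorted index pair of the two knots closest to x (brute force)."""
    order = sorted(range(len(knots)), key=lambda j: math.dist(x, knots[j]))
    return tuple(sorted(order[:2]))

def voronoi_density_edges(points, knots):
    """Map each edge (j, l) to its assumed Voronoi density weight.

    An edge exists when at least one point has knots j and l as its two
    nearest knots. Weight = (fraction of such points) / ||c_j - c_l||.
    """
    counts = Counter(two_nearest_knots(x, knots) for x in points)
    n = len(points)
    return {
        (j, l): (c / n) / math.dist(knots[j], knots[l])
        for (j, l), c in counts.items()
    }

# Toy example: three knots on a line and four data points.
knots = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
points = [(0.5, 0.0), (1.5, 0.0), (1.0, 0.5), (7.0, 0.0)]
edges = voronoi_density_edges(points, knots)
```

No point has knots 0 and 2 as its two nearest knots, so that pair gets no edge. This is how the 2-NN neighborhoods stand in for the Delaunay triangulation without computing it exactly.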
