Dimensionality reduction is widely used in machine learning and big data analytics because it helps to analyze and visualize large, high-dimensional datasets. In particular, it can considerably help with tasks such as data clustering and classification. Recently, embedding methods have emerged as a promising direction for improving clustering accuracy. They can preserve the local structure of the data while revealing its global structure, thereby improving clustering performance. In this paper, we investigate how to improve the performance of several clustering algorithms using one of the most successful embedding techniques: Uniform Manifold Approximation and Projection (UMAP). This technique was recently proposed as a manifold learning method for dimensionality reduction and is based on Riemannian geometry and algebraic topology. Our main hypothesis is that UMAP can find the most clusterable embedding manifold; we therefore apply it as a preprocessing step before clustering. We compare the results of several well-known clustering algorithms, such as k-means, HDBSCAN, GMM and Agglomerative Hierarchical Clustering, when they operate on the low-dimensional feature space produced by UMAP. A series of experiments on several image datasets demonstrates that the proposed method allows each of the clustering algorithms studied to improve its performance on each dataset considered. Measured by accuracy, the improvement can reach 60%.
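The pipeline described in the abstract above (UMAP as a preprocessing step, then a standard clustering algorithm, then an accuracy score) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the third-party packages umap-learn and scikit-learn for the embedding and k-means steps. It also assumes that "accuracy" means the usual unsupervised clustering accuracy, where cluster ids are matched one-to-one to class labels so as to maximise agreement. The function names `cluster_after_umap` and `clustering_accuracy` and all parameter values are hypothetical.

```python
from collections import Counter
from itertools import permutations


def cluster_after_umap(X, n_clusters, n_components=2, random_state=0):
    """Embed X with UMAP, then run k-means on the low-dimensional embedding.

    Assumption: umap-learn and scikit-learn are installed. They are imported
    lazily so that the accuracy helper below works without them.
    """
    import umap
    from sklearn.cluster import KMeans

    reducer = umap.UMAP(n_components=n_components, random_state=random_state)
    embedding = reducer.fit_transform(X)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    return kmeans.fit_predict(embedding)


def clustering_accuracy(y_true, y_pred):
    """Fraction of points labelled correctly under the best one-to-one
    matching between cluster ids and class labels.

    Brute force over permutations, so only practical for a handful of
    clusters (assumed definition of accuracy, see the lead-in).
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    # counts[(cluster, cls)] = number of points of class `cls` in `cluster`
    counts = Counter(zip(y_pred, y_true))
    clusters = list(set(y_pred))
    classes = list(set(y_true))

    # Assign each element of the smaller side to a distinct element of the
    # larger side and keep the assignment with the most agreeing points.
    if len(clusters) <= len(classes):
        best = max(
            sum(counts[(c, k)] for c, k in zip(clusters, perm))
            for perm in permutations(classes, len(clusters))
        )
    else:
        best = max(
            sum(counts[(c, k)] for c, k in zip(perm, classes))
            for perm in permutations(clusters, len(classes))
        )
    return best / len(y_true)
```

Under these assumptions, a comparison in the spirit of the abstract would compute `clustering_accuracy(y, cluster_after_umap(X, k))` and set it against the same score for k-means run directly on `X`. For more than a few clusters, the permutation search becomes expensive, and an assignment solver such as `scipy.optimize.linear_sum_assignment` is the usual replacement.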
In this paper, we introduce a novel algorithm that unifies manifold embedding and clustering (UEC) and efficiently predicts clustering assignments of high-dimensional data points in a new embedding space. The algorithm is based on a bi-objective optimisation problem combining embedding and clustering loss functions. This formulation makes it possible to simultaneously preserve the original structure of the data in the embedding space and produce better clustering assignments. Experimental results on a number of real-world datasets show that UEC is competitive with state-of-the-art clustering methods.
We consider the problem of optimal anchor placement for area-based localisation algorithms with the goal of providing cost-effective, simple, and robust positioning in wireless sensor networks. Due to the high complexity of the problem, we propose two placement algorithms based on heuristics. The first, called genetic algorithm anchors placement (GAAP), is based on the genetic algorithm meta-heuristic, and the second, called local search anchors placement (LSAP), is based on an intuitive heuristic inspired by search techniques used in quad-trees. To evaluate these algorithms, we built a simulation framework, which we made publicly available to the community, and compared their performance against a brute-force (BF) algorithm and against RND, a random walk-inspired algorithm. The results show that GAAP provides anchor placements that lead to very high accuracy while keeping execution time drastically lower than that of LSAP, BF, and RND. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.