Dimensionality reduction is widely used in machine learning and big data analytics because it helps to analyze and visualize large, high-dimensional datasets. In particular, it can considerably help with tasks such as data clustering and classification. Recently, embedding methods have emerged as a promising direction for improving clustering accuracy. They can preserve the local structure and simultaneously reveal the global structure of data, thereby improving clustering performance. In this paper, we investigate how to improve the performance of several clustering algorithms using one of the most successful embedding techniques: Uniform Manifold Approximation and Projection (UMAP). This technique has recently been proposed as a manifold learning technique for dimensionality reduction. It is based on Riemannian geometry and algebraic topology. Our main hypothesis is that UMAP makes it possible to find the best clusterable embedding manifold; we therefore apply it as a preprocessing step before clustering. We compare the results of many well-known clustering algorithms, such as k-means, HDBSCAN, GMM and Agglomerative Hierarchical Clustering, when they operate on the low-dimensional feature space yielded by UMAP. A series of experiments on several image datasets demonstrates that the proposed method allows each of the clustering algorithms studied to improve its performance on each dataset considered. Measured by accuracy, the improvement can reach a remarkable 60%.
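The abstract reports its gains in terms of accuracy but does not define the measure. A minimal sketch of one commonly used definition for clustering is given here, assuming (this is an assumption, not something the abstract states) that accuracy means the best agreement with the ground-truth classes over all one-to-one mappings from cluster ids to classes. The function name `clustering_accuracy` is hypothetical, and the brute-force search is only practical for a handful of clusters.

```python
from itertools import permutations


def clustering_accuracy(true_labels, cluster_ids):
    """Best-match accuracy between cluster ids and ground-truth classes.

    Assumed definition: the fraction of points labeled correctly under the
    best one-to-one mapping from cluster ids to class labels.
    """
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    if len(clusters) > len(classes):
        raise ValueError("this sketch assumes no more clusters than classes")

    best_hits = 0
    # Try every one-to-one assignment of clusters to classes. This is brute
    # force; for many clusters an assignment solver (Hungarian algorithm)
    # would be the usual replacement.
    for assigned in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, assigned))
        hits = sum(mapping[c] == t for t, c in zip(true_labels, cluster_ids))
        best_hits = max(best_hits, hits)
    return best_hits / len(true_labels)


# Cluster ids are arbitrary, so a relabeled but otherwise perfect
# clustering still scores 1.0.
perfect = clustering_accuracy([0, 0, 1, 1], [1, 1, 0, 0])
# One of six points ends up in the wrong cluster.
partial = clustering_accuracy([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 0])
```

Because the score is invariant to how the clusters are numbered, it can be used to compare a clustering run on the raw features with one run on a reduced embedding, which is the kind of comparison the abstract describes.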
Object detection is considered the cornerstone of many modern applications such as drone vision and self-driving cars. Object detectors, mainly those based on Convolutional Neural Networks (CNNs), have received great attention from many researchers because they yield remarkable results. However, most of them fail when it comes to detecting overlapping and small objects in images. There are two families of detectors: those of the first family detect more objects but with imprecise bounding boxes, while those of the second family do the opposite. In this paper, we propose a solution to this problem by combining the two families, in a way similar to classifier combination. Our solution has been validated through the combination of two well-known detectors: Faster R-CNN, which detects more objects, and YOLO, which produces accurate bounding boxes. However, it is more general and can be applied to other detectors. We evaluated our method on the PASCAL VOC dataset, where it gave promising results.
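The abstract does not say how the two detectors are combined. One simple way such a combination could work is sketched here, purely as an illustration and not as the authors' method: keep every detection from the higher-recall detector, and swap in the box from the more precise detector whenever the two overlap strongly. The function names, the `(x1, y1, x2, y2)` box format, and the IoU threshold of 0.5 are all assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def combine_detections(recall_boxes, precise_boxes, iou_threshold=0.5):
    """Hypothetical combination rule (an assumption, not the paper's).

    recall_boxes:  boxes from the detector that finds more objects.
    precise_boxes: boxes from the detector with tighter localization.
    """
    combined = []
    for box in recall_boxes:
        # Find the precise box that overlaps this detection the most.
        best = max(precise_boxes, key=lambda p: iou(box, p), default=None)
        if best is not None and iou(box, best) >= iou_threshold:
            combined.append(best)  # same object: trust the tighter box
        else:
            combined.append(box)   # only the high-recall detector saw it
    return combined


recall = [(0, 0, 10, 10), (20, 20, 30, 30)]
precise = [(1, 1, 10, 10)]
merged = combine_detections(recall, precise)
```

This sketch ignores class labels and confidence scores, drops precise boxes that match no high-recall box, and does not deduplicate when two high-recall boxes match the same precise box; a practical combination would have to handle all three.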
In this paper, we introduce a novel algorithm that unifies manifold embedding and clustering (UEC) and efficiently predicts clustering assignments of high-dimensional data points in a new embedding space. The algorithm is based on a bi-objective optimisation problem combining embedding and clustering loss functions. This original formulation makes it possible to simultaneously preserve the original structure of the data in the embedding space and produce better clustering assignments. Experimental results on a number of real-world datasets show that UEC is competitive with state-of-the-art clustering methods.
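The abstract does not give the form of the bi-objective problem. One common way to combine two loss terms, shown here only as an illustrative assumption, is a weighted sum; the symbols $\theta$ (the embedding and clustering parameters), $\mathcal{L}_{\text{embed}}$, $\mathcal{L}_{\text{cluster}}$ and the trade-off weight $\lambda$ are hypothetical and are not taken from the paper.

```latex
\[
  \min_{\theta}\; \mathcal{L}(\theta)
  \;=\; \mathcal{L}_{\text{embed}}(\theta)
  \;+\; \lambda\, \mathcal{L}_{\text{cluster}}(\theta),
  \qquad \lambda \ge 0 .
\]
```

Under this assumed form, $\lambda = 0$ reduces to a pure embedding objective, while larger values of $\lambda$ give more weight to the clustering term relative to structure preservation.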
The COVID-19 pandemic, caused by the SARS-CoV-2 virus, has done serious damage in many areas. This has prompted thousands of researchers from different disciplines (biology, medicine, artificial intelligence, economics, etc.) to publish a very large number of scientific articles in a very short period in order to answer questions related to the pandemic. This abundance of literature, however, has raised another problem: it has become extremely difficult for a researcher or a decision-maker to stay up to date with the latest scientific advances or to locate scientific articles related to a specific aspect of the pandemic. In this paper, we present an intelligent tool based on machine learning that automatically organizes a large dataset of COVID-19-related scientific literature and visualizes it in a way that helps these users navigate the dataset and locate the documents they seek. The documents are first pre-processed and transformed into numerical features. Those features are then passed through a deep denoising autoencoder followed by the Uniform Manifold Approximation and Projection (UMAP) technique to reduce their dimensionality to a 2D space. The projected data are then clustered with an agglomerative clustering algorithm. This is followed by a topic modeling step, performed using Latent Dirichlet Allocation (LDA), in order to assign a label to each cluster. Finally, the documents are presented to the user in an interactive interface that we developed. The experiments we conducted showed that our tool is efficient and useful.