There is now more data than can be effectively analyzed, and organizing it has become one of the biggest problems in Computer Science. Many algorithms have been proposed for this purpose, notably in the Data Mining area and specifically those for automatic document classification (ADC). However, these algorithms remain computationally challenging because of the volume of data that must be processed. The literature contains some proposals that parallelize these algorithms on graphics processing units (GPUs) to make them feasible. Still, most of the available parallel solutions ignore challenges specific to ADC, such as high dimensionality and heterogeneity in the representation of the documents. In this context, we present G-KNN, a GPU-based parallel version of the k-nearest neighbors algorithm (KNN), one of the most widely used ADC algorithms. In our evaluation on five different document collections, we show that G-KNN maintains the same classification effectiveness while running up to 12x faster than its sequential CPU version and up to 3x faster than a CPU-based parallel implementation running with 6 threads. Moreover, our algorithm has much lower memory consumption, enabling its use with large datasets.
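For readers unfamiliar with the baseline, the following is a minimal sequential sketch of KNN applied to document classification. It is not the G-KNN implementation and is not taken from the paper. It assumes documents are sparse term-weight vectors stored as dictionaries and that similarity is measured by cosine; the names `cosine`, `knn_classify`, and the toy training set are hypothetical and used only for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as {term: weight}."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector when computing the dot product
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(query, train, k=3):
    """Return the majority label among the k training documents most similar to query.

    train is a list of (sparse_vector, label) pairs (an assumed layout).
    """
    ranked = sorted(train, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy collection (illustrative data only, not from the paper's evaluation).
train = [
    ({"gpu": 2.0, "kernel": 1.0}, "hardware"),
    ({"gpu": 1.0, "thread": 1.0}, "hardware"),
    ({"goal": 2.0, "match": 1.0}, "sports"),
    ({"team": 1.0, "match": 2.0}, "sports"),
]

print(knn_classify({"gpu": 1.0, "thread": 2.0}, train, k=3))  # prints "hardware"
```

Each query is compared against every training document, so the cost grows with both the collection size and the number of distinct terms per document. The similarity computations are independent of one another, which is the kind of workload that lends itself to parallel execution.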