The self-join finds all objects in a dataset that are within a threshold of each other, as defined by a similarity metric. As such, the self-join is a building block in databases and data mining, and is employed in Big Data applications. In this paper, we advance a GPU-efficient algorithm for the similarity self-join that uses the Euclidean distance metric. The search-and-refine strategy is an efficient approach for low-dimensionality datasets, as index searches degrade with increasing dimensionality (i.e., the curse of dimensionality). Thus, we target the low-dimensionality problem, and compare our GPU self-join to a search-and-refine implementation and a state-of-the-art parallel algorithm. In low dimensionality, there are several unique challenges associated with efficiently solving the self-join problem on the GPU. Low-dimensional data often results in higher data densities, causing a significant number of distance calculations and a large result set. Furthermore, as dimensionality increases, index searches become increasingly exhaustive, forming a performance bottleneck. We advance several techniques to overcome these challenges on the GPU: a GPU-efficient index that employs a bounded search, a batching scheme to accommodate large result set sizes, and a reduction in distance calculations through duplicate search removal. Our GPU self-join outperforms both the search-and-refine and state-of-the-art algorithms.
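The grid-based filter-and-refine idea underlying such self-joins is compact enough to illustrate directly. The following Python sketch is not the paper's GPU implementation; it is a minimal 2-D ϵ-self-join over a uniform grid with cell width ϵ, where each point is compared only against points in its own and adjacent cells, and unordered pairs are emitted once, echoing the duplicate-search-removal optimization. All names are illustrative.

```python
import math
from collections import defaultdict

def self_join_grid(points, eps):
    """Epsilon self-join via a uniform grid (illustrative sketch).
    With cell width eps, any neighbor within eps of a point must lie in
    the point's own cell or one of the 8 adjacent cells."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(points):
        grid[(int(x // eps), int(y // eps))].append(i)

    pairs = []
    for (cx, cy), members in grid.items():
        for i in members:
            xi, yi = points[i]
            # Scan the 3x3 block of cells around (cx, cy).
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    for j in grid.get((cx + dx, cy + dy), ()):
                        if j <= i:  # emit each unordered pair once
                            continue
                        xj, yj = points[j]
                        if math.dist((xi, yi), (xj, yj)) <= eps:
                            pairs.append((i, j))
    return pairs
```

The grid acts as the "search" step (candidate generation) and the distance test as the "refine" step; the paper's contribution lies in making both steps efficient on the GPU, including batching the potentially large result set.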
We present parallel algorithms to efficiently permute a sorted array into the level-order binary search tree (BST), level-order B-tree (B-tree), and van Emde Boas (vEB) layouts in-place. We analytically determine the complexity of our algorithms and empirically measure their performance. Given N elements and P processors, our fastest algorithms have a parallel runtime of O(N/P) for the BST layout, O(N/P + log_B N · log_B N) for the B-tree layout, and O((N/P) log log N) for the vEB layout using the CREW Parallel Random Access Machine (PRAM) model. Experimental results indicate that on both CPU and GPU architectures, the B-tree layout provides the best query performance. However, when considering the total time to permute the data using our algorithms and to perform a series of search queries, the vEB layout provides the best performance on the CPU. We show that, given an input of N = 500M 64-bit integers, the benefits of query performance (compared to binary search) outweigh the cost of in-place permutation using our algorithms when performing at least 5M queries (1% of N) and 27M queries (6% of N) on our CPU and GPU platforms, respectively.
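The level-order BST layout referred to above is the classic Eytzinger layout, in which the implicit tree rooted at index 0 has children at indices 2k+1 and 2k+2. As a point of reference, here is a short Python sketch of the permutation done out-of-place with O(N) extra space, together with the pointer-free search it enables; the paper's algorithms achieve the same permutation in-place and in parallel, which is the hard part. Function names are illustrative.

```python
def to_bst_layout(sorted_arr):
    """Permute a sorted array into the level-order (Eytzinger) BST layout.
    Out-of-place for clarity: an in-order walk over the implicit tree
    consumes the sorted values, so the result satisfies the BST property."""
    n = len(sorted_arr)
    out = [None] * n
    it = iter(sorted_arr)

    def fill(k):  # k is a 0-based implicit-heap index
        if k < n:
            fill(2 * k + 1)    # left subtree receives the smaller keys
            out[k] = next(it)  # in-order visit emits keys in sorted order
            fill(2 * k + 2)    # right subtree receives the larger keys

    fill(0)
    return out

def bst_search(layout, key):
    """Search the level-order layout via arithmetic child links."""
    k = 0
    while k < len(layout):
        if layout[k] == key:
            return k
        k = 2 * k + 1 + (key > layout[k])  # left child, or right if key larger
    return -1
```

Because children are located by arithmetic rather than pointers, the layout stores only the keys themselves, which is what makes the permutation-plus-query trade-off in the abstract meaningful.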
The self-join finds all objects in a dataset that are within a search distance, ϵ, of each other; therefore, the self-join is a building block of many algorithms. We advance a GPU-accelerated self-join algorithm targeted towards high-dimensional data. The massive parallelism afforded by the GPU and its high aggregate memory bandwidth make the architecture well-suited for data-intensive workloads. We leverage a grid-based, GPU-tailored index to perform range queries. We propose the following optimizations: (i) a trade-off between candidate set filtering and index search overhead by exploiting properties of the index; (ii) reordering the data based on variance in each dimension to improve the filtering power of the index; and (iii) a pruning method for reducing the number of expensive distance calculations. Across most scenarios on real-world and synthetic datasets, our algorithm outperforms the parallel state-of-the-art approach. Exascale systems are converging on heterogeneous distributed-memory architectures. We show that an entity partitioning method can be utilized to achieve a balanced workload, and thus good scalability, for multi-GPU or distributed-memory self-joins.

Thus, many large-scale data analytics applications will rely on GPU-efficient algorithms, including the distance similarity self-join for high-dimensional data, the subject of this work. This paper makes the following novel contributions:
• Leveraging an efficient indexing scheme for the GPU, we exploit the trade-off between index filtering power and search cost to improve the overall performance of searching high-dimensional feature spaces.
• We improve the filtering power of the index by reordering the data in each dimension using statistical properties of the data distribution. We show that this is particularly important when exploiting the trade-off outlined above.
• We mitigate the performance cost of reducing index filtering power by proposing a technique that prunes the candidate set by comparing points based on an un-indexed dimension (see the sketch after this list).
• We show that on the worst-case data distribution for our approach, we achieve significantly better performance than the state-of-the-art on the same scenario. This suggests that the performance of the GPU-accelerated self-join is resilient to the data distribution, making the approach well-suited for many application scenarios.
• We evaluate our approach on 5 real-world and 3 synthetic datasets and show that our GPU-accelerated self-join outperforms the state-of-the-art parallel algorithm in the literature.
• The self-join is an expensive operation. We show initial insights into the scalability of the self-join on multi-GPU and distributed-memory systems, and demonstrate that an entity partitioning strategy can be used to achieve good load balancing.

The paper is outlined as follows: Section 2 provides background material, Section 3 formalizes the problem and discusses previous work that we employ, Section 4 presents the novel methods we use to improve high-dimensional self-join performance, Section 5 illustrates our performance results, Section 6 dis...
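To make optimizations (ii) and (iii) concrete, the following Python sketch is illustrative only (the paper's index and GPU kernels are more involved): it reorders dimensions by decreasing variance so that the indexed dimensions filter candidates as aggressively as possible, then prunes each candidate with a cheap 1-D test on the first un-indexed dimension before paying for a full Euclidean distance computation. The function names and the parameter k (the number of indexed dimensions) are assumptions for illustration.

```python
import numpy as np

def reorder_by_variance(data):
    """Optimization (ii) in miniature: place the highest-variance
    dimensions first, so the k indexed dimensions discriminate best."""
    order = np.argsort(data.var(axis=0))[::-1]  # descending variance
    return data[:, order]

def refine(data, i, candidates, eps, k):
    """Optimizations (i)/(iii) in miniature: only dimensions 0..k-1 are
    indexed (candidate generation over them is omitted here). Before the
    full distance test, prune any candidate whose coordinate in the first
    un-indexed dimension, k, already differs from point i by more than eps."""
    p = data[i]
    kept = []
    for j in candidates:
        if abs(data[j, k] - p[k]) > eps:  # cheap 1-D prune on dimension k
            continue
        if np.linalg.norm(data[j] - p) <= eps:  # full Euclidean refine
            kept.append(j)
    return kept
```

The 1-D prune is sound because a difference greater than ϵ in any single dimension already implies a Euclidean distance greater than ϵ, so no true neighbor is discarded.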
Let T be a terrain and P be a set of points on its surface. An important problem in Geographic Information Science (GIS) is computing the visibility index of a point p ∈ P, that is, the number of points in P that are visible from p. The total visibility-index problem asks for the visibility index of every point in P. We present the first subquadratic-time algorithm to solve the one-dimensional total visibility-index problem. Our algorithm uses a geometric dualization technique to reduce the problem to a set of instances of the red-blue line segment intersection counting problem, allowing us to find the total visibility index in O(n log² n) time. We implement a naive O(n²) approach and four variations of our algorithm: one that uses an existing red-blue line segment intersection counting algorithm, and three new approaches that leverage features specific to our problem. Two of our implementations allow for parallel execution, requiring O(log² n) time and O(n log² n) work in the CREW PRAM model. We present experimental results for both serial and parallel implementations on synthetic and real-world datasets using two hardware platforms. Results show that all variants of our algorithm outperform the naive approach by several orders of magnitude. Furthermore, we show that our special-case red-blue line segment intersection counting implementations outperform the existing general-case solution by up to a factor of 10. Our fastest parallel implementation is able to process a terrain of more than 100 million vertices in under 3 minutes, achieving up to 85% parallel efficiency using 16 cores.
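The naive O(n²) baseline mentioned in the abstract is straightforward for a 1-D terrain: sweep rightward from each point while tracking the steepest sight-line slope seen so far; a point is visible exactly when its slope matches or exceeds that maximum. A minimal Python sketch with illustrative names (the subquadratic dualization-based algorithm is far more involved):

```python
def total_visibility_index_naive(xs, hs):
    """Naive O(n^2) total visibility index for a 1-D terrain given by
    strictly increasing x-coordinates xs and heights hs. Point j is
    visible from point i iff no intermediate point rises strictly above
    the sight line from i to j."""
    n = len(xs)
    vis = [0] * n
    for i in range(n):
        max_slope = float("-inf")  # steepest slope from i seen so far
        for j in range(i + 1, n):
            slope = (hs[j] - hs[i]) / (xs[j] - xs[i])
            if slope >= max_slope:  # nothing between i and j blocks the view
                vis[i] += 1
                vis[j] += 1         # visibility is symmetric
            max_slope = max(max_slope, slope)
    return vis
```

Each ordered sweep handles both directions at once (visibility is symmetric), so the quadratic cost comes purely from examining all pairs; the paper's dualization replaces this pairwise scan with red-blue segment intersection counting.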