The self-join finds all objects in a dataset that are within a threshold of each other, as defined by a similarity metric. As such, the self-join is a building block in databases and data mining, and is employed in Big Data applications. In this paper, we advance a GPU-efficient algorithm for the similarity self-join that uses the Euclidean distance metric. The search-and-refine strategy is an efficient approach for low-dimensionality datasets, as index searches degrade with increasing dimensionality (i.e., the curse of dimensionality). Thus, we target the low-dimensionality problem, and compare our GPU self-join to a search-and-refine implementation and a state-of-the-art parallel algorithm. In low dimensionality, there are several unique challenges associated with efficiently solving the self-join problem on the GPU. Low-dimensional data often results in higher data densities, causing a significant number of distance calculations and a large result set. Furthermore, as dimensionality increases, index searches become increasingly exhaustive, forming a performance bottleneck. We advance several techniques to overcome these challenges on the GPU: a GPU-efficient index that employs a bounded search, a batching scheme to accommodate large result set sizes, and a reduction in distance calculations through duplicate search removal. Our GPU self-join outperforms both the search-and-refine and state-of-the-art algorithms.
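The grid-based filter-and-refine idea underlying such self-joins is compact enough to illustrate directly. The following Python sketch is not the paper's GPU implementation; it is a minimal 2-D ϵ-self-join over a uniform grid with cell width ϵ, where each point is compared only against points in its own and adjacent cells, and unordered pairs are emitted once, echoing the duplicate-search-removal optimization. All names are illustrative.

```python
import math
from collections import defaultdict

def self_join_grid(points, eps):
    """Epsilon self-join via a uniform grid (illustrative sketch).
    With cell width eps, any neighbor within eps of a point must lie in
    the point's own cell or one of the 8 adjacent cells."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(points):
        grid[(int(x // eps), int(y // eps))].append(i)

    pairs = []
    for (cx, cy), members in grid.items():
        for i in members:
            xi, yi = points[i]
            # Scan the 3x3 block of cells around (cx, cy).
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    for j in grid.get((cx + dx, cy + dy), ()):
                        if j <= i:  # emit each unordered pair once
                            continue
                        xj, yj = points[j]
                        if math.dist((xi, yi), (xj, yj)) <= eps:
                            pairs.append((i, j))
    return pairs
```

The grid acts as the "search" step (candidate generation) and the distance test as the "refine" step; the paper's contribution lies in making both steps efficient on the GPU, including batching the potentially large result set.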
We present parallel algorithms to efficiently permute a sorted array into the level-order binary search tree (BST), level-order B-tree (B-tree), and van Emde Boas (vEB) layouts in-place. We analytically determine the complexity of our algorithms and empirically measure their performance. Given N elements and P processors, our fastest algorithms have a parallel runtime of O(N/P) for the BST layout, O(N/P + log_B N · log_B N) for the B-tree layout, and O((N/P) log log N) for the vEB layout using the CREW Parallel Random Access Machine (PRAM) model. Experimental results indicate that on both CPU and GPU architectures, the B-tree layout provides the best query performance. However, when considering the total time to permute the data using our algorithms and to perform a series of search queries, the vEB layout provides the best performance on the CPU. We show that, given an input of N = 500M 64-bit integers, the benefits of query performance (compared to binary search) outweigh the cost of in-place permutation using our algorithms when performing at least 5M queries (1% of N) and 27M queries (6% of N) on our CPU and GPU platforms, respectively.
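The level-order BST layout referred to above is the classic Eytzinger layout, in which the implicit tree rooted at index 0 has children at indices 2k+1 and 2k+2. As a point of reference, here is a short Python sketch of the permutation done out-of-place with O(N) extra space, together with the pointer-free search it enables; the paper's algorithms achieve the same permutation in-place and in parallel, which is the hard part. Function names are illustrative.

```python
def to_bst_layout(sorted_arr):
    """Permute a sorted array into the level-order (Eytzinger) BST layout.
    Out-of-place for clarity: an in-order walk over the implicit tree
    consumes the sorted values, so the result satisfies the BST property."""
    n = len(sorted_arr)
    out = [None] * n
    it = iter(sorted_arr)

    def fill(k):  # k is a 0-based implicit-heap index
        if k < n:
            fill(2 * k + 1)    # left subtree receives the smaller keys
            out[k] = next(it)  # in-order visit emits keys in sorted order
            fill(2 * k + 2)    # right subtree receives the larger keys

    fill(0)
    return out

def bst_search(layout, key):
    """Search the level-order layout via arithmetic child links."""
    k = 0
    while k < len(layout):
        if layout[k] == key:
            return k
        k = 2 * k + 1 + (key > layout[k])  # left child, or right if key larger
    return -1
```

Because children are located by arithmetic rather than pointers, the layout stores only the keys themselves, which is what makes the permutation-plus-query trade-off in the abstract meaningful.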
The self-join finds all objects in a dataset that are within a search distance, ϵ, of each other; therefore, the self-join is a building block of many algorithms. We advance a GPU-accelerated self-join algorithm targeted towards high-dimensional data. The massive parallelism afforded by the GPU and its high aggregate memory bandwidth make the architecture well-suited for data-intensive workloads. We leverage a grid-based, GPU-tailored index to perform range queries. We propose the following optimizations: (i) a trade-off between candidate set filtering and index search overhead by exploiting properties of the index; (ii) reordering the data based on variance in each dimension to improve the filtering power of the index; and (iii) a pruning method for reducing the number of expensive distance calculations. Across most scenarios on real-world and synthetic datasets, our algorithm outperforms the parallel state-of-the-art approach. Exascale systems are converging on heterogeneous distributed-memory architectures. We show that an entity partitioning method can be utilized to achieve a balanced workload, and thus good scalability, for multi-GPU or distributed-memory self-joins.

Thus, many large-scale data analytics applications will rely on GPU-efficient algorithms, including the distance similarity self-join for high-dimensional data, the subject of this work. This paper makes the following novel contributions:
• Leveraging an efficient indexing scheme for the GPU, we exploit the trade-off between index filtering power and search cost to improve the overall performance of searching high-dimensional feature spaces.
• We improve the filtering power of the index by reordering the data in each dimension using statistical properties of the data distribution. We show that this is particularly important when exploiting the trade-off outlined above.
• We mitigate the performance cost of reducing index filtering power by proposing a technique that prunes the candidate set by comparing points based on an un-indexed dimension (see the sketch after this list).
• We show that on the worst-case data distribution for our approach, we achieve significantly better performance than the state-of-the-art on the same scenario. This suggests that the performance of the GPU-accelerated self-join is resilient to the data distribution, making the approach well-suited for many application scenarios.
• We evaluate our approach on 5 real-world and 3 synthetic datasets and show that our GPU-accelerated self-join outperforms the state-of-the-art parallel algorithm in the literature.
• The self-join is an expensive operation. We show initial insights into the scalability of the self-join on multi-GPU and distributed-memory systems, and demonstrate that an entity partitioning strategy can be used to achieve good load balancing.

The paper is outlined as follows: Section 2 provides background material, Section 3 formalizes the problem and discusses previous work that we employ, Section 4 presents the novel methods we use to improve high-dimensional self-join performance, Section 5 illustrates our performance results, Section 6 dis...
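To make optimizations (ii) and (iii) concrete, the following Python sketch is illustrative only (the paper's index and GPU kernels are more involved): it reorders dimensions by decreasing variance so that the indexed dimensions filter candidates as aggressively as possible, then prunes each candidate with a cheap 1-D test on the first un-indexed dimension before paying for a full Euclidean distance computation. The function names and the parameter k (the number of indexed dimensions) are assumptions for illustration.

```python
import numpy as np

def reorder_by_variance(data):
    """Optimization (ii) in miniature: place the highest-variance
    dimensions first, so the k indexed dimensions discriminate best."""
    order = np.argsort(data.var(axis=0))[::-1]  # descending variance
    return data[:, order]

def refine(data, i, candidates, eps, k):
    """Optimizations (i)/(iii) in miniature: only dimensions 0..k-1 are
    indexed (candidate generation over them is omitted here). Before the
    full distance test, prune any candidate whose coordinate in the first
    un-indexed dimension, k, already differs from point i by more than eps."""
    p = data[i]
    kept = []
    for j in candidates:
        if abs(data[j, k] - p[k]) > eps:  # cheap 1-D prune on dimension k
            continue
        if np.linalg.norm(data[j] - p) <= eps:  # full Euclidean refine
            kept.append(j)
    return kept
```

The 1-D prune is sound because a difference greater than ϵ in any single dimension already implies a Euclidean distance greater than ϵ, so no true neighbor is discarded.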
Let T be a terrain and P be a set of points on its surface. An important problem in Geographic Information Science (GIS) is computing the visibility index of a point p ∈ P, that is, the number of points in P that are visible from p. The total visibility-index problem asks for the visibility index of every point in P. We present the first subquadratic-time algorithm to solve the one-dimensional total visibility-index problem. Our algorithm uses a geometric dualization technique to reduce the problem to a set of instances of the red-blue line segment intersection counting problem, allowing us to find the total visibility index in O(n log² n) time. We implement a naive O(n²) approach and four variations of our algorithm: one that uses an existing red-blue line segment intersection counting algorithm, and three new approaches that leverage features specific to our problem. Two of our implementations allow for parallel execution, requiring O(log² n) time and O(n log² n) work in the CREW PRAM model. We present experimental results for both serial and parallel implementations on synthetic and real-world datasets using two hardware platforms. Results show that all variants of our algorithm outperform the naive approach by several orders of magnitude. Furthermore, we show that our special-case red-blue line segment intersection counting implementations outperform the existing general-case solution by up to a factor of 10. Our fastest parallel implementation is able to process a terrain of more than 100 million vertices in under 3 minutes, achieving up to 85% parallel efficiency using 16 cores.
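The naive O(n²) baseline mentioned in the abstract is straightforward for a 1-D terrain: sweep rightward from each point while tracking the steepest sight-line slope seen so far; a point is visible exactly when its slope matches or exceeds that maximum. A minimal Python sketch with illustrative names (the subquadratic dualization-based algorithm is far more involved):

```python
def total_visibility_index_naive(xs, hs):
    """Naive O(n^2) total visibility index for a 1-D terrain given by
    strictly increasing x-coordinates xs and heights hs. Point j is
    visible from point i iff no intermediate point rises strictly above
    the sight line from i to j."""
    n = len(xs)
    vis = [0] * n
    for i in range(n):
        max_slope = float("-inf")  # steepest slope from i seen so far
        for j in range(i + 1, n):
            slope = (hs[j] - hs[i]) / (xs[j] - xs[i])
            if slope >= max_slope:  # nothing between i and j blocks the view
                vis[i] += 1
                vis[j] += 1         # visibility is symmetric
            max_slope = max(max_slope, slope)
    return vis
```

Each ordered sweep handles both directions at once (visibility is symmetric), so the quadratic cost comes purely from examining all pairs; the paper's dualization replaces this pairwise scan with red-blue segment intersection counting.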