GPUTeraSort

Govindaraju, Naga K.; Gray, Jim; Kumar, R. Arockia; Manocha, Dinesh

doi:10.1145/1142473.1142511

Cited by 291 publications

(14 citation statements)

References 39 publications

“…However, the increasing demand for sophisticated graphics for video games, computer-aided design (CAD), animation, and other applications is driving the development of more and more powerful graphical processing units (GPUs), which take advantage of data parallelism to render graphics at high speeds. While video cards have been traditionally used only for graphics-intensive applications, they have also been recently leveraged towards scientific-computing problems, such as finite-difference time-domain algorithms [1], sorting algorithms for large databases [2], n-body problems [3], and quantum Monte Carlo methods for chemical applications [4]. In these cases, programmers were required to construct GPU algorithms using a limited set of operations originally intended for computer graphics applications; however, the recent release of graphics card manufacturer NVIDIA's compute unified device architecture (CUDA) development toolkit for some of their high-end graphics cards allows developers to code algorithms in a C-like language [5].…”

Section: Introduction (mentioning)

confidence: 99%

Accelerating Resolution-of-the-Identity Second-Order Møller−Plesset Quantum Chemistry Calculations with Graphical Processing Units

et al. 2008

“…Distinct sorting algorithms can be used to obtain the top elements from a set of candidates. Previous work introduced custom sorting algorithms for specific tasks using multi-core CPU (Tridgell, 1999) and GPU setups (Satish et al, 2009;Govindaraju et al, 2006).…”

Section: GPU Sorting (mentioning)

confidence: 99%

Accelerating Sparse Matrix Operations in Neural Networks on Graphics Processing Units

Argueta, Chiang 2019

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Graphics Processing Units (GPUs) are commonly used to train and evaluate neural networks efficiently. While previous work in deep learning has focused on accelerating operations on dense matrices/tensors on GPUs, few efforts have concentrated on operations involving sparse data structures. Operations using sparse structures are common in natural language models at the input and output layers, because these models operate on sequences over discrete alphabets. We present two new GPU algorithms: one at the input layer, for multiplying a matrix by a few-hot vector (generalizing the more common operation of multiplication by a one-hot vector) and one at the output layer, for a fused softmax and top-N selection (commonly used in beam search). Our methods achieve speedups over state-of-the-art parallel GPU baselines of up to 7× and 50×, respectively. We also illustrate how our methods scale on different GPU architectures.

“…There have been some newer approaches to sorting networks often in combination with hardware accelerators like FPGAs [17] or GPUs [9]. In particular GPGPU programming has led to a little renaissance of sorting networks, especially with different implementations of the Bitonic Sorter [19,8,10] achieving good results. However these approaches usually either implement the bitonic sorter in the original way as presented by Batcher or sometimes implement the Adaptive Bitonic Sorter [2] instead.…”

Section: Related Work (mentioning)

confidence: 99%

An Agglomeration Law for Sorting Networks and its Application in Functional Programming

Schiller 2017

Electron. Proc. Theor. Comput. Sci.

In this paper we will present a general agglomeration law for sorting networks. Agglomeration is a common technique when designing parallel programmes to control the granularity of the computation thereby finding a better fit between the algorithm and the machine on which the algorithm runs. Usually this is done by grouping smaller tasks and computing them en bloc within one parallel process. In the case of sorting networks this could be done by computing bigger parts of the network with one process. The agglomeration law in this paper pursues a different strategy: The input data is grouped and the algorithm is generalised to work on the agglomerated input while the original structure of the algorithm remains. This will result in a new access opportunity to sorting networks well-suited for efficient parallelization on modern multicore computers, computer networks or GPGPU programming. Additionally this enables us to use sorting networks as (parallel or distributed) merging stages for arbitrary sorting algorithms, thereby creating new hybrid sorting algorithms with ease. The expressiveness of functional programming languages helps us to apply this law to systematically constructed sorting networks, leading to efficient and easily adaptable sorting algorithms. An application example is given, using the Eden programming language to show the effectiveness of the law. The implementation is compared with different parallel sorting algorithms by runtime behaviour.
