Optimization

Fatica, Massimiliano; Ruetsch, Gregory

doi:10.1016/b978-0-12-416970-8.00003-1

Cited by 2 publications (3 citation statements)

References 41 publications


“…Our implementation with CUDA Fortran [45] offers unprecedented speed and efficiency already visible on commodity hardware (e.g., GeForce 1080). Furthermore, it can be easily tuned for professional GPUs such as Titan V [37] virtually at no extra effort.…”

Section: Discussion (mentioning; confidence: 99%)

“…The core components of our implementation has been written in modern Fortran 95/2003 [41], which we have chosen for its flexibility [42], extensive support for linear algebra [43], performance [44] and native support for CUDA technology [45]. To make our code easier to use, we have wrapped it in a Python package using the f2py [46] utility and numpy's fork of distutils package [47].…”

Section: Languages and Technologies Employed (mentioning; confidence: 99%)


Brute-forcing spin-glass problems with CUDA

Jałowiecki, Rams, Gardas

2019

Preprint


We demonstrate how to compute the low-energy spectrum of small (N ≤ 50) but otherwise arbitrary spin-glass instances using modern Graphics Processing Units or similar heterogeneous architectures. Our algorithm performs an exhaustive (i.e., brute-force) search over all possible configurations to select the S ≪ 2^N lowest ones together with their corresponding energies. We mainly focus on the Ising model defined on an arbitrary graph. An open-source implementation based on CUDA Fortran and a suitable Python wrapper are provided. As opposed to heuristic approaches, ours is exact and can thus serve as a reference point to benchmark other algorithms and hardware, including quantum and digital annealers. Our implementation offers unprecedented speed and efficiency already visible on commodity hardware. At the same time, it can be easily launched on professional, high-end graphics cards virtually at no extra effort. As a practical application, we employ it to demonstrate that the recent Matrix Product State-based algorithm, despite its one-dimensional nature, can still accurately approximate the low-energy spectrum of fully connected graphs of size N approaching 50.
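The exhaustive search described in this abstract can be illustrated with a small serial sketch. It is not the authors' CUDA Fortran implementation. It assumes an Ising energy of the form E(s) = sum_{i<j} J_ij s_i s_j + sum_i h_i s_i with s_i in {-1, +1} (the paper's sign convention may differ), and the helper names ising_energy and lowest_states are hypothetical. It scores all 2^N configurations and keeps the S lowest, so it costs O(2^N N^2) time and is only practical on a CPU for N of roughly 20 or fewer.

```python
import heapq
import itertools


def ising_energy(spins, J, h):
    """Energy of one spin configuration (spins[i] is -1 or +1).

    Assumed convention (hypothetical, not taken from the paper):
    E(s) = sum_{i<j} J[i][j] * s_i * s_j + sum_i h[i] * s_i,
    where only the upper triangle of J (i < j) is read.
    """
    n = len(spins)
    energy = sum(h[i] * spins[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            energy += J[i][j] * spins[i] * spins[j]
    return energy


def lowest_states(J, h, S):
    """Brute force: score all 2**N configurations, keep the S lowest.

    Returns a list of (energy, configuration) pairs sorted by energy.
    """
    n = len(h)
    configs = itertools.product((-1, 1), repeat=n)  # all 2**N spin tuples
    scored = ((ising_energy(s, J, h), s) for s in configs)
    # nsmallest holds at most S candidates instead of all 2**N energies
    return heapq.nsmallest(S, scored, key=lambda pair: pair[0])


if __name__ == "__main__":
    J = [[0, -1], [0, 0]]  # single coupling J[0][1] = -1 favors aligned spins here
    h = [0, 0]
    print(lowest_states(J, h, 2))  # [(-1, (-1, -1)), (-1, (1, 1))]
```

Using heapq.nsmallest keeps memory proportional to S rather than to 2^N, which matches the goal of retaining only the S ≪ 2^N lowest states.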


“…Since the sustainable bandwidth from the host to the device (and vice versa) plays a key role in the acceleration of a single DGEMM or DTRSM calls, in the CUDA‐Aware HPL benchmark, the CUDA tool related to a fast transfer mode is exploited. Such a tool is enabled when page‐locked memory (sometimes called pinned memory [29]) is used.…”

Section: Related Work and Motivation (mentioning; confidence: 99%)

Toward a new linpack‐like benchmark for heterogeneous computing resources

Carracciuolo, Mele, Sabella

2023

Concurrency and Computation


Summary: This work describes some first efforts to design a new Linpack‐like benchmark for evaluating the performance of heterogeneous computing resources. The benchmark is based on the Schur complement reformulation of the solution of a linear equation system. Details about its implementation and evaluation, mainly in terms of performance scalability, are presented for a computing environment based on multiple NVIDIA GPGPU nodes connected by an InfiniBand network.
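The Schur complement reformulation mentioned in this summary rests on block elimination. For a system [[A, B], [C, D]] [x; y] = [f; g] with A nonsingular, eliminating x gives (D - C A^{-1} B) y = g - C A^{-1} f, after which x = A^{-1} (f - B y). The dense NumPy sketch below shows only this algebra on a single node. The function name schur_solve is hypothetical, and the benchmark's actual distributed, multi-GPU implementation is not described in this excerpt.

```python
import numpy as np


def schur_solve(A, B, C, D, f, g):
    """Solve [[A, B], [C, D]] @ [x; y] = [f; g] by block elimination.

    Assumes A and its Schur complement S = D - C A^{-1} B are nonsingular.
    """
    AinvB = np.linalg.solve(A, B)  # A^{-1} B without forming A^{-1}
    Ainvf = np.linalg.solve(A, f)  # A^{-1} f
    S = D - C @ AinvB              # Schur complement of A
    y = np.linalg.solve(S, g - C @ Ainvf)
    x = Ainvf - AinvB @ y          # back-substitution: x = A^{-1} (f - B y)
    return x, y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n1, n2 = 4, 3
    n = n1 + n2
    # Random matrix shifted by n*I so that it is very likely well conditioned
    M = rng.standard_normal((n, n)) + n * np.eye(n)
    b = rng.standard_normal(n)
    x, y = schur_solve(M[:n1, :n1], M[:n1, n1:], M[n1:, :n1], M[n1:, n1:],
                       b[:n1], b[n1:])
    # Compare against a direct solve of the full system
    print(np.allclose(np.concatenate([x, y]), np.linalg.solve(M, b)))
```

Computing A^{-1} B and A^{-1} f with linear solves, rather than an explicit inverse, is the usual numerically safer choice.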
