2014
DOI: 10.1007/s10766-014-0319-4

Exploiting GPUs with the Super Instruction Architecture

Abstract: The Super Instruction Architecture (SIA) is a parallel programming environment designed for problems in computational chemistry involving complicated expressions defined in terms of tensors. Tensors are represented by multidimensional arrays which are typically very large. The SIA consists of a domain specific programming language, Super Instruction Assembly Language (SIAL), and its runtime system, Super Instruction Processor. An important feature of SIAL is that algorithms are expressed in terms of blocks (or…
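The abstract's central idea is that SIAL algorithms operate on blocks of large multidimensional arrays rather than on individual elements. The sketch below is not SIAL and its block size is hypothetical; it is only a minimal NumPy analogue of a block-wise contraction, where each block-level multiply is the kind of coarse-grained "super instruction" a SIA-style runtime could dispatch to a worker or a GPU.

```python
# Illustrative sketch only: a blocked contraction in plain Python/NumPy,
# mimicking the block-wise style of SIAL programs. The block size and
# loop structure here are hypothetical, not taken from the paper.
import numpy as np

BLOCK = 64  # hypothetical block edge length

def blocked_contract(A, B):
    """Compute C[i,j] = sum_k A[i,k] * B[k,j] one block at a time.

    In the SIA, each block-level operation would be a 'super instruction'
    scheduled onto a worker (or a GPU); here we simply loop serially.
    """
    n = A.shape[0]
    C = np.zeros((n, n))
    for i0 in range(0, n, BLOCK):
        for j0 in range(0, n, BLOCK):
            for k0 in range(0, n, BLOCK):
                # one block multiply-accumulate: the unit of scheduled work
                C[i0:i0+BLOCK, j0:j0+BLOCK] += (
                    A[i0:i0+BLOCK, k0:k0+BLOCK] @ B[k0:k0+BLOCK, j0:j0+BLOCK]
                )
    return C

n = 256
A, B = np.random.rand(n, n), np.random.rand(n, n)
assert np.allclose(blocked_contract(A, B), A @ B)
```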

Cited by 7 publications (6 citation statements) · References 12 publications

Citation statements (ordered by relevance):
“…It has been shown that by properly selecting intermediate arrays and optimizing the loop structure, the efficiency of a CCSD code can be increased by a factor of 5. To a larger extent, the high numerical cost associated with the polynomial scaling can be effectively addressed by developing highly scalable implementations of CC methods, as evidenced by several recent benchmark calculations. Growing interest in the efficient utilization of peta- and soon-to-be exascale computational resources has stimulated intensive development of tensor libraries that can be exploited to generate scalable CC codes for homogeneous as well as many-core/multicore computer systems. Nevertheless, in all of the above-mentioned canonical CC implementations the storage requirement grows quickly with system size, becoming a storage and communication bottleneck when going from mid-scale (10² to 10³ basis functions) to large-scale (10³ to 10⁴ basis functions) CC calculations. Although it has been shown that the storage requirement can be greatly reduced by employing integral-direct algorithms, the integral-direct approach may also entail frequent I/O operations and/or recalculating atomic two-electron integrals “on the fly”, which would increase the CPU time and deteriorate the scaling with system size.…”
Section: Introduction
confidence: 99%
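To make the storage argument above concrete: if the dominant stored object is the full two-electron integral tensor, its size grows as N⁴ in the number of basis functions N. A back-of-the-envelope sketch (assuming dense storage of 8-byte doubles and ignoring the permutational symmetry that real codes may exploit) shows why the mid- to large-scale transition is where storage becomes the bottleneck:

```python
# Rough storage estimate for the N^4 two-electron integral tensor,
# stored densely as 8-byte doubles (symmetry and sparsity ignored).
for n_basis in (100, 1000, 10000):
    n_elems = n_basis ** 4
    print(f"N = {n_basis:>6}: {n_elems * 8 / 1e12:.6g} TB")
# N =    100: 0.0008 TB  (fits in a single node's memory)
# N =   1000: 8 TB       (requires distributed storage)
# N =  10000: 80000 TB   (infeasible to store; integral-direct needed)
```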
“…[4,17,61] Currently, we rely on underlying libraries such as Eigen [62] (with interfaces to BLAS implementations) or Libint2 [22] (for molecular integral calculations) to achieve parallelization in particular subcalculations. By relying on lower-level adapter-like modules [61,63–70], our APIs can, in principle, scale to the exascale regime. As GQCP's focus is to provide useful generalizations, GQCP could serve as an initiative to further improve inter-module communication in the current electronic structure software ecosystem.…”
Section: Software Development in GQCP
confidence: 99%
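The "lower-level adapter-like modules" mentioned above follow a familiar pattern: high-level quantum-chemistry code is written against a narrow backend interface, so the dense-algebra engine can be swapped without touching the calling code. The sketch below is purely hypothetical (the class and function names are not from GQCP, Eigen, or Libint2); it only illustrates the adapter idea.

```python
# Hypothetical adapter sketch: the kernel programs against a minimal
# backend interface, so the dense-algebra engine (NumPy here; Eigen/BLAS
# or a distributed tensor library in practice) is interchangeable.
from abc import ABC, abstractmethod
import numpy as np

class LinearAlgebraBackend(ABC):
    @abstractmethod
    def gemm(self, a: np.ndarray, b: np.ndarray) -> np.ndarray:
        """General matrix-matrix multiply, C = A @ B."""

class NumPyBackend(LinearAlgebraBackend):
    def gemm(self, a, b):
        return a @ b  # delegates to the BLAS NumPy was built against

def fock_like_build(h: np.ndarray, d: np.ndarray,
                    backend: LinearAlgebraBackend) -> np.ndarray:
    # Stand-in for a real electronic-structure kernel: all heavy lifting
    # goes through the backend, never through NumPy directly.
    return h + backend.gemm(d, h)

h = np.eye(4)
d = np.ones((4, 4))
print(fock_like_build(h, d, NumPyBackend()))
```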
“…But since it does not have to support all possible computational workloads across domains, its implementation complexity is also reduced. Here we should mention again that examples of such domain-specific parallel runtimes have existed before; however, their architectural design either did not use the concept of a DSVP or introduced it in an ad hoc fashion, without derivation from an abstract (base) DSVP architecture supplied with a clear specification.…”
Section: Abstract DSVP
confidence: 99%
“…Although the TAVP microarchitecture has its own unique design introducing a number of novel elements such as the fully hierarchical hardware encapsulation, it can also be viewed as a generalization and evolution of earlier efforts, specifically the so‐called Super Instruction Architecture framework used in the ACES‐III and ACES‐IV software suites for expressing and executing quantum many‐body algorithms operating on large dense arrays of numbers. In this retrospective, DSVP is a variant of a Super Instruction Processor (SIP) on an abstract architectural level, but it differs from the previous concrete SIP implementations at the microarchitectural level, that is, its exposed implementation design is different. In fact, the previous SIP works did not seem to expose much of the SIP microarchitectural design, that is, the concrete SIP implementations were not derived as a specialization of a well‐defined microarchitectural design.…”
Section: Introduction
confidence: 99%