2016
DOI: 10.1021/acs.jctc.6b00884
Clustered Low-Rank Tensor Format: Introduction and Application to Fast Construction of Hartree–Fock Exchange

Abstract: The Clustered Low-Rank (CLR) framework for block-sparse and block-low-rank tensor representation and computation is described. The CLR framework depends on two parameters that control precision: one controls the CLR block rank truncation, and the other controls screening of small contributions in arithmetic operations on CLR tensors. As these parameters approach zero, the CLR representation and arithmetic become exact. There are no other ad hoc heuristics, such as domains. Use of the CLR format for the order-2 and o…
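The screening parameter mentioned in the abstract can be illustrated with a toy tiled matrix multiply that skips tile-level products whose estimated contribution is negligible. This is a minimal sketch only: the dense tiles, uniform tiling, and Frobenius-norm estimate are illustrative assumptions, not the paper's actual CLR arithmetic.

```python
import numpy as np

def screened_block_gemm(A, B, tiles, screen_eps):
    """Tiled C = A @ B that skips tile products whose contribution,
    estimated by ||A_ik||_F * ||B_kj||_F, falls below screen_eps.
    Illustrative sketch of norm-based screening; assumes square
    matrices sharing one tiling given by boundary offsets `tiles`."""
    n = len(tiles) - 1
    C = np.zeros((A.shape[0], B.shape[1]))
    for i in range(n):
        ri = slice(tiles[i], tiles[i + 1])
        for j in range(n):
            rj = slice(tiles[j], tiles[j + 1])
            for k in range(n):
                rk = slice(tiles[k], tiles[k + 1])
                Aik, Bkj = A[ri, rk], B[rk, rj]
                # Frobenius norms bound the size of this tile product
                if np.linalg.norm(Aik) * np.linalg.norm(Bkj) < screen_eps:
                    continue  # contribution too small: skip the GEMM
                C[ri, rj] += Aik @ Bkj
    return C
```

With `screen_eps = 0` no product is skipped and the result matches the dense product exactly, which is the sense in which the representation becomes exact as the parameter approaches zero.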

Cited by 25 publications (21 citation statements) · References 83 publications (184 reference statements)
“…Other lower-level programming styles, ranging from functional-style iteration over tensor blocks to explicit loops over tile indices and direct byte-level access to the data, are also supported, giving experts the ability to compose arbitrary algorithms over general sparse tensorial data structures. 213 TA has been designed to support efficient execution on modern and future hardware at all scales, from a single multi-core machine, to a cluster of multi-core, multi-GPU nodes, to leadership-class supercomputers. To maximize concurrency and hide latency, which is crucial for alleviating the load imbalance and low computation-to-communication ratio of irregular sparse tensor algebra, TA has an asynchronous, dataflow-style core.…”
Section: TiledArray
“…The ABCD term was evaluated using the AO-based formalism [26]. The input tensor T, representing its initial state in the coupled-cluster simulation, was evaluated in the AO basis using the Laplace-transform approximation, with the occupied orbitals localized and both the occupied and AO bases clustered to group spatially close orbitals together [29]; the clustering defines the tiling of the corresponding index ranges. The CPU-only implementation in MPQC evaluates the tensor V on the fly, as needed; due to the lack of publicly available efficient kernels for direct evaluation of AO integrals on GPUs (such kernels are under development by some of us), the GPU benchmarks used a block-sparse V with the actual sparsity pattern determined by the CPU-only code but with the tiles filled with random data.…”
Section: Practical Example: Evaluation of the ABCD Coupled-Cluster Tensor Contraction for the Molecule C65H132
“…To evaluate the impact of the tiling on performance, we consider three representative tilings of the index ranges. Since the k-means-based clustering algorithm that determines the range tilings is quasi-random [29] and cannot ensure uniform tiling (this would necessarily violate locality in all practical applications), these tilings are generated by specifying the target number of clusters for each index range. Table 1 summarizes the differences between the three tilings, from the most fine-grained (v1) to the most coarse-grained (v3).…”
Section: Practical Example: Evaluation of the ABCD Coupled-Cluster Tensor Contraction for the Molecule C65H132
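The k-means-based clustering described above can be sketched as follows. `kmeans_tiling` is a hypothetical helper, not MPQC's actual implementation: a plain Lloyd iteration groups orbital centers by spatial proximity, and the resulting (generally non-uniform) cluster populations become the tile sizes. The dependence on the random initial guess is what makes the resulting tiling quasi-random.

```python
import numpy as np

def kmeans_tiling(centers, n_clusters, n_iter=50, seed=0):
    """Group orbital centers (points in R^3) into spatial clusters;
    each cluster then defines one tile of the corresponding index range.
    Illustrative Lloyd-iteration sketch; the tiling depends on the
    random initialization, so it is quasi-random by construction."""
    rng = np.random.default_rng(seed)
    # random initial means drawn from the data points
    means = centers[rng.choice(len(centers), n_clusters, replace=False)]
    for _ in range(n_iter):
        # assign each point to its nearest mean
        d = np.linalg.norm(centers[:, None, :] - means[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        # recompute each mean from its assigned points
        for k in range(n_clusters):
            pts = centers[labels == k]
            if len(pts):
                means[k] = pts.mean(axis=0)
    # tile sizes = cluster populations (generally non-uniform)
    return labels, np.bincount(labels, minlength=n_clusters)
```

Requesting more clusters yields a finer-grained tiling (smaller tiles), fewer clusters a coarser one, which is how the three tilings v1 through v3 would be generated.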
“…To recover data sparsity in the tensors appearing in quantum physics applications, we developed the Clustered Low-Rank (CLR) representation [23], a general, hierarchy-free compressed tensor format. In this representation, each block M_IJ of a matrix M is approximated by a low-rank decomposition of the form M_IJ ≈ X W†, where for a given M_IJ ∈ R^(m×n) of rank r, X is m × r and W is n × r. X and W were constructed from a rank-revealing QR decomposition [26], M_IJ P = Q R.…”
Section: Clustered Low-Rank Representation
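The construction just quoted (M_IJ P = Q R with rank truncation) can be sketched with SciPy's column-pivoted QR. This is a minimal illustration of the idea, assuming real tiles; `rrqr_factor` and the relative truncation threshold are illustrative choices, not the exact scheme of Ref. [26].

```python
import numpy as np
from scipy.linalg import qr

def rrqr_factor(M, eps):
    """Low-rank factors M ≈ X @ W.T from a column-pivoted
    (rank-revealing) QR, M P = Q R, as in the quoted CLR construction.
    eps is a relative threshold on the diagonal of R that controls
    the rank truncation."""
    Q, R, piv = qr(M, mode='economic', pivoting=True)
    diag = np.abs(np.diag(R))
    r = int(np.sum(diag > eps * diag[0])) if diag[0] > 0 else 0
    X = Q[:, :r]                  # m x r
    Rp = np.empty_like(R[:r])
    Rp[:, piv] = R[:r]            # undo the column pivoting P
    W = Rp.T                      # n x r, so M ≈ X @ W.T
    return X, W
```

The pivoted QR pushes the dominant columns to the front, so truncating after r columns of Q (and the corresponding rows of R) recovers the rank while keeping the factors X and W at the quoted m × r and n × r sizes.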