“…GEMM employs a series of architecture-aware optimization strategies, such as cache- and register-level data reuse, prefetching, and vectorization, that improve the hardware utilization of a program from a marginal <1% to a near-optimal efficiency (>90%) [83,230,207]. To leverage the highly optimized GEMM subroutine, the order in memory of the spin configurations $S^{\alpha}_{r}(t)$ must match the data layout the kernel expects. Moreover, a fusion strategy that merges the memory footprint of the element-wise operation with the compute-bound GEMM operation to hide the memory latency is a sound solution, one that benefits a range of GEMM-based scientific-computing and machine-learning applications [263,264]. Therefore, we delve into the black box of GEMM kernels, enabling memory-bandwidth-efficient computations for "Daxpy"…”
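To make the kernel-fusion idea concrete, the following C listing is a minimal illustrative sketch, not the paper's actual kernel: a 4x4 register-blocked GEMM micro-kernel whose element-wise epilogue is applied while the output tile still resides in registers, so the result is written to memory once instead of requiring a second, memory-bound pass. The names fused_microkernel_4x4 and elementwise, and the choice of tanh as the element-wise operation, are assumptions made purely for illustration.

/* Illustrative sketch of fusing an element-wise operation into a
 * register-blocked GEMM micro-kernel (names and the tanh epilogue are
 * hypothetical, chosen only to demonstrate the technique). */
#include <stddef.h>
#include <math.h>

static inline double elementwise(double x) {
    /* placeholder element-wise operation applied to each output entry */
    return tanh(x);
}

/* C is a 4x4 tile, row-major with leading dimension ldc;
 * A is packed 4 x K (stride 4 per k-step);
 * B is packed K x 4 (stride 4 per k-step). Computes
 * C = elementwise(C + A*B) with a single store per element. */
static void fused_microkernel_4x4(size_t K,
                                  const double *A, const double *B,
                                  double *C, size_t ldc) {
    double acc[4][4] = {{0.0}};              /* accumulators kept in registers */
    for (size_t k = 0; k < K; ++k) {
        for (int i = 0; i < 4; ++i)          /* rank-1 update of the 4x4 tile */
            for (int j = 0; j < 4; ++j)
                acc[i][j] += A[k * 4 + i] * B[k * 4 + j];
    }
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)          /* fused epilogue: no second pass over C */
            C[i * ldc + j] = elementwise(C[i * ldc + j] + acc[i][j]);
}

Because the epilogue reuses the tile already held in registers, the element-wise step adds no extra memory traffic beyond the single write of C; an unfused version would stream C through memory a second time and be limited by bandwidth rather than compute.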