Proceedings of the 4th Workshop on Programming Models for SIMD/Vector Processing (WPMVP 2018)
DOI: 10.1145/3178433.3178436
SIMDization of Small Tensor Multiplication Kernels for Wide SIMD Vector Processors

Cited by 3 publications (1 citation statement). References 12 publications.
“…For batched Cholesky factorization and Kalman filters, Lemaitre et al. [34] propose a template system. Rodrigues et al. [44] specify a small DSL for static tensor multiplications; even parallelizing error correction in 5G base stations [14] warrants a DSL. Likewise, there is a CUDA DSL for stencil operations [56], prime examples of memory-bound kernels and of the importance of minimizing memory transfers by staging data in registers.…”
Section: Related Work
Confidence: 99%
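
For context on the cited paper's topic: a SIMDizable small tensor multiplication kernel is one whose shapes are fixed at compile time, so its loops can be fully unrolled and its operands kept in vector registers. Below is a minimal illustrative sketch in C, not code from Rodrigues et al.; the 4x4 shape and the function name matmul4x4 are hypothetical. It uses static trip counts and restrict-qualified pointers so a vectorizing compiler can map the kernel onto wide SIMD lanes.

    #include <stddef.h>

    /* Illustrative only: a statically shaped 4x4 single-precision
       matrix multiply C = A * B (the simplest tensor contraction).
       Fixed trip counts let the compiler fully unroll the loops,
       and restrict tells it the arrays do not alias -- both are
       prerequisites for effective auto-vectorization. */
    enum { N = 4 };

    void matmul4x4(const float *restrict a,
                   const float *restrict b,
                   float *restrict c)
    {
        for (size_t i = 0; i < N; ++i) {
            for (size_t j = 0; j < N; ++j) {
                float acc = 0.0f;
                for (size_t k = 0; k < N; ++k)
                    acc += a[i * N + k] * b[k * N + j];
                c[i * N + j] = acc;
            }
        }
    }

Compiled with optimization and a suitable target (e.g. gcc -O3 -march=native), a kernel of this form typically vectorizes without intrinsics; a DSL such as the one cited generates many fixed-shape variants of this kind automatically.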