2023
DOI: 10.37936/ecti-cit.2023171.248320

Power Efficient Strassen’s Algorithm using AVX512 and OpenMP in a Multi-core Architecture

Abstract: This paper presents an effective implementation of Strassen's algorithm for matrix-matrix multiplication on a shared-memory multi-core architecture. The proposed algorithm aims to increase computation speed, reaching GFLOPS performance that is on average 4.5 and 4.1 times higher than Eigen and OpenBLAS, respectively, while keeping power consumption as low as possible. Our algorithm relies on AVX512 intrinsics, loop unrolling, and OpenMP directives. A new 2D blocking data allocation pattern is pr…
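
For orientation, the recursion underlying Strassen's algorithm can be sketched in plain C. This is a minimal illustration and not the authors' implementation: it uses no AVX512 intrinsics, no manual loop unrolling, and none of the 2D blocking described in the paper, and it only places a single OpenMP directive on the base-case loop. The function names and the `cutoff` parameter are hypothetical, and the sketch assumes square, row-major matrices whose size is a power of two.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Classical O(n^3) product C = A * B for n x n row-major matrices.
 * The OpenMP directive is ignored if OpenMP is not enabled. */
static void matmul_naive(int n, const double *A, const double *B, double *C) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double s = 0.0;
            for (int k = 0; k < n; k++) s += A[i * n + k] * B[k * n + j];
            C[i * n + j] = s;
        }
}

/* out = X + sign * Y for contiguous buffers of len elements. */
static void add(size_t len, const double *X, const double *Y, double sign,
                double *out) {
    for (size_t i = 0; i < len; i++) out[i] = X[i] + sign * Y[i];
}

/* Copy quadrant (qi, qj) of the n x n matrix M into the contiguous h x h Q. */
static void get_quad(int n, const double *M, int qi, int qj, double *Q) {
    int h = n / 2;
    for (int i = 0; i < h; i++)
        memcpy(Q + i * h, M + (qi * h + i) * n + qj * h, h * sizeof *Q);
}

/* Copy the contiguous h x h Q into quadrant (qi, qj) of the n x n matrix M. */
static void put_quad(int n, double *M, int qi, int qj, const double *Q) {
    int h = n / 2;
    for (int i = 0; i < h; i++)
        memcpy(M + (qi * h + i) * n + qj * h, Q + i * h, h * sizeof *Q);
}

/* Strassen product C = A * B; n must be a power of two.
 * `cutoff` (hypothetical) is the size at which the classical kernel takes over. */
void strassen(int n, const double *A, const double *B, double *C, int cutoff) {
    assert(n > 0 && (n & (n - 1)) == 0);
    if (n <= cutoff || n == 1) { matmul_naive(n, A, B, C); return; }

    int h = n / 2;
    size_t sz = (size_t)h * h;
    double *buf = malloc(18 * sz * sizeof *buf);
    if (!buf) { matmul_naive(n, A, B, C); return; } /* fall back if out of memory */

    double *a[4], *b[4], *m[7];
    for (int q = 0; q < 4; q++) {          /* order: 11, 12, 21, 22 */
        a[q] = buf + q * sz;
        b[q] = buf + (4 + q) * sz;
        get_quad(n, A, q / 2, q % 2, a[q]);
        get_quad(n, B, q / 2, q % 2, b[q]);
    }
    for (int i = 0; i < 7; i++) m[i] = buf + (8 + i) * sz;
    double *t1 = buf + 15 * sz, *t2 = buf + 16 * sz, *c = buf + 17 * sz;

    /* Seven half-size products instead of eight. */
    add(sz, a[0], a[3], 1, t1); add(sz, b[0], b[3], 1, t2);
    strassen(h, t1, t2, m[0], cutoff);                 /* M1 = (A11+A22)(B11+B22) */
    add(sz, a[2], a[3], 1, t1);
    strassen(h, t1, b[0], m[1], cutoff);               /* M2 = (A21+A22)B11 */
    add(sz, b[1], b[3], -1, t1);
    strassen(h, a[0], t1, m[2], cutoff);               /* M3 = A11(B12-B22) */
    add(sz, b[2], b[0], -1, t1);
    strassen(h, a[3], t1, m[3], cutoff);               /* M4 = A22(B21-B11) */
    add(sz, a[0], a[1], 1, t1);
    strassen(h, t1, b[3], m[4], cutoff);               /* M5 = (A11+A12)B22 */
    add(sz, a[2], a[0], -1, t1); add(sz, b[0], b[1], 1, t2);
    strassen(h, t1, t2, m[5], cutoff);                 /* M6 = (A21-A11)(B11+B12) */
    add(sz, a[1], a[3], -1, t1); add(sz, b[2], b[3], 1, t2);
    strassen(h, t1, t2, m[6], cutoff);                 /* M7 = (A12-A22)(B21+B22) */

    /* Recombine the products into the four quadrants of C. */
    for (size_t i = 0; i < sz; i++) c[i] = m[0][i] + m[3][i] - m[4][i] + m[6][i];
    put_quad(n, C, 0, 0, c);                           /* C11 */
    add(sz, m[2], m[4], 1, c); put_quad(n, C, 0, 1, c); /* C12 = M3+M5 */
    add(sz, m[1], m[3], 1, c); put_quad(n, C, 1, 0, c); /* C21 = M2+M4 */
    for (size_t i = 0; i < sz; i++) c[i] = m[0][i] - m[1][i] + m[2][i] + m[5][i];
    put_quad(n, C, 1, 1, c);                           /* C22 */

    free(buf);
}
```

A call such as `strassen(n, A, B, C, 64)` would recurse until the blocks are 64 x 64 and then switch to the classical kernel; the value 64 is only a placeholder, since a suitable cutoff depends on the machine. The performance and power figures quoted in the abstract come from the paper's vectorized and blocked kernels, which this sketch does not attempt to reproduce.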

Citations: cited by 1 publication (2 citation statements)
References: 22 publications

“…Oo and Chaikan [21] optimized the Strassen-Winograd algorithm for arbitrary matrix sizes on a GPU using techniques such as empirical modeling, multi-kernel streaming, dynamic peeling, and two temporary matrices. They surpassed previous GPU implementations and showed Strassen's algorithm's practicality.…”
Section: A Literature Review (mentioning)
confidence: 99%
“…Oo and Chaikan [21] 2023 Enhanced Strassen's algorithm for power efficiency using AVX512 and OpenMP on multi-core architecture.…”
Section: Haidar Et Al [17] 2019 (mentioning)
confidence: 99%