2016
DOI: 10.1007/s10586-016-0611-8
Architecture-aware configuration and scheduling of matrix multiplication on asymmetric multicore processors

Abstract: Asymmetric multicore processors (AMPs) have recently emerged as an appealing technology for severely energy-constrained environments, especially in mobile appliances where heterogeneity in applications is mainstream. In addition, given the growing interest in low-power high performance computing, this type of architecture is also being investigated as a means to improve the throughput-per-Watt of complex scientific applications. In this paper, we design and embed several architecture-aware optimizations into …

Cited by 17 publications (25 citation statements)
References 27 publications
“…• In conclusion, compared with previous work [13,15], this paper demonstrates that, for the particular domain of DLA, it is possible to hide the difficulties intrinsic to dealing with an asymmetric architecture (e.g., workload balancing for performance, energy-aware mapping of tasks to cores, and criticality-aware scheduling) inside an asymmetry-aware implementation of the BLAS-3. As a consequence, our solution can refactor any conventional (asymmetry-agnostic) scheduler to exploit the task parallelism present in complex DLA operations.…”
Section: Introduction (mentioning)
confidence: 67%
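The idea quoted above, absorbing the asymmetry inside the BLAS-3 so that a conventional, asymmetry-agnostic runtime or scheduler can be reused unchanged, can be illustrated with a minimal sketch. Everything below is hypothetical: the routine names, the fixed big-to-LITTLE throughput ratio, and the reference triple loop that stands in for per-cluster thread teams are placeholders, not the optimized BLIS-based implementation described in the paper.

/*
 * Minimal sketch (hypothetical names): the caller issues an ordinary,
 * asymmetry-agnostic gemm call, and the asymmetry is absorbed inside the
 * routine by splitting the n-dimension of C between a big-core partition and
 * a LITTLE-core partition in proportion to an assumed relative throughput.
 * Column-major storage; C(m x n) += A(m x k) * B(k x n).
 */
#include <stddef.h>

#define RATIO_BIG 0.75   /* assumed fraction of the work given to the big cluster */

/* reference kernel standing in for the per-cluster thread teams */
static void gemm_block(size_t m, size_t n, size_t k,
                       const double *A, size_t lda,
                       const double *B, size_t ldb,
                       double *C, size_t ldc)
{
    for (size_t j = 0; j < n; ++j)
        for (size_t p = 0; p < k; ++p)
            for (size_t i = 0; i < m; ++i)
                C[i + j * ldc] += A[i + p * lda] * B[p + j * ldb];
}

/* asymmetry-agnostic interface; the big/LITTLE split is hidden inside */
void gemm_amp(size_t m, size_t n, size_t k,
              const double *A, size_t lda,
              const double *B, size_t ldb,
              double *C, size_t ldc)
{
    size_t n_big = (size_t)(n * RATIO_BIG);   /* columns assigned to the big cluster    */
    size_t n_lit = n - n_big;                 /* columns assigned to the LITTLE cluster */

    /* In a real implementation each half would run on a thread team pinned to
       its cluster; here the two sequential calls stand in for those teams. */
    gemm_block(m, n_big, k, A, lda, B, ldb, C, ldc);
    gemm_block(m, n_lit, k, A, lda, B + n_big * ldb, ldb,
               C + n_big * ldc, ldc);
}

Because the split is performed inside the routine, the code that calls gemm_amp (for example, a task-parallel scheduler for a DLA factorization) remains identical to what it would be on a symmetric multicore, which is precisely the point made in the citation.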
“…These studies offered a few relevant insights that guided the parallelization of gemm (and also other Level-3 BLAS) on the ARM big.LITTLE architecture under the GTS software execution model. Concretely, the architecture-aware multi-threaded parallelization of gemm in [13] integrates the following three techniques:…”
Section: Data-parallel Libraries for Asymmetric Architectures (mentioning)
confidence: 99%
“…The approach parallelizes the nested five-loop organization of gemm at one or more levels (i.e., loops), taking into account the cache organization of the target platform, the granularity of the computations, and the risk of race conditions, among other factors. For the multicore processors targeted in this work, an efficient choice is to extract the parallelism from Loop 4 only [26] via, e.g., OpenMP.…”
Section: Matrix Multiplication (mentioning)
confidence: 99%
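The quoted passage refers to the standard five-loop (GotoBLAS/BLIS-style) organization of gemm and the choice of multi-threading Loop 4 only. The following C/OpenMP sketch makes that structure concrete under some assumptions: it adopts the usual numbering in which Loop 1 is the outermost loop over column panels of C and Loop 4 traverses nr-wide micro-panels; the blocking parameters are placeholder values; and the packing of A and B into contiguous buffers, which a real implementation performs inside Loops 2 and 3, is omitted, so a plain triple loop stands in for the micro-kernel.

/*
 * Sketch of the five-loop gemm organization, with OpenMP parallelism
 * extracted from Loop 4 only, as mentioned in the quotation above.
 * Column-major storage; C(m x n) += A(m x k) * B(k x n).
 */
#include <stddef.h>

enum { NC = 1024, KC = 256, MC = 128, NR = 4, MR = 4 };  /* placeholder blocking */

#define MIN(a, b) ((a) < (b) ? (a) : (b))

void gemm_5loops(size_t m, size_t n, size_t k,
                 const double *A, size_t lda,
                 const double *B, size_t ldb,
                 double *C, size_t ldc)
{
    for (size_t jc = 0; jc < n; jc += NC) {            /* Loop 1: column panels of C/B */
        size_t nc = MIN(NC, n - jc);
        for (size_t pc = 0; pc < k; pc += KC) {        /* Loop 2: pack B panel here    */
            size_t kc = MIN(KC, k - pc);
            for (size_t ic = 0; ic < m; ic += MC) {    /* Loop 3: pack A block here    */
                size_t mc = MIN(MC, m - ic);
                /* Loop 4: iterations touch disjoint column blocks of C, so this
                   is the level chosen for multi-threading in the quotation. */
                #pragma omp parallel for schedule(dynamic)
                for (size_t jr = 0; jr < nc; jr += NR) {
                    size_t nr = MIN(NR, nc - jr);
                    for (size_t ir = 0; ir < mc; ir += MR) {   /* Loop 5 */
                        size_t mr = MIN(MR, mc - ir);
                        /* stand-in for the mr x nr micro-kernel: kc rank-1 updates */
                        for (size_t p = 0; p < kc; ++p)
                            for (size_t j = 0; j < nr; ++j)
                                for (size_t i = 0; i < mr; ++i)
                                    C[(ic + ir + i) + (jc + jr + j) * ldc] +=
                                        A[(ic + ir + i) + (pc + p) * lda] *
                                        B[(pc + p) + (jc + jr + j) * ldb];
                    }
                }
            }
        }
    }
}

Because each Loop 4 iteration updates a disjoint nr-wide column block of the current mc x nc block of C, the parallel region introduces no race conditions, which is consistent with the considerations (cache organization, granularity, races) listed in the quotation.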