2022
DOI: 10.1177/10943420221077107

Performance portability in a real world application: PHAST applied to Caffe

Abstract: This work covers the application of the PHAST Library, a hardware-agnostic programming library, to a real-world application: the Caffe framework. The original implementation of Caffe consists of two different versions of the source code: one to run on CPU platforms and another to run on GPUs. With PHAST, we aim to develop a single-source implementation capable of running efficiently on both CPU and GPU. In this paper, we start by carrying out a performance analysis of a basic Caffe implementation using PH…
Cited by 2 publications (1 citation statement)
References: 28 publications (47 reference statements)
“…Where possible, nowadays the trend is to use GPUs, FPGAs, NPUs, and other ad-hoc accelerators for seeking higher performance/efficiency than CPUs [15]. GPUs' massively parallel hardware has been successfully employed in the im2col+gemm convolution implementation, but also in direct convolution [16], [17] recently. State-of-the-art convolutional accelerators (e.g., [18], [19]) use specific dataflow structures that can be seen as portions of direct convolution algorithm mapped in hardware, since tensors are processed spatially without applying any transformations.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
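
As an aside on the two convolution schemes contrasted in this citing statement, the short NumPy sketch below illustrates how the im2col+GEMM formulation turns convolution into a single matrix multiplication, while direct convolution processes the tensors spatially without any data transformation. It is not taken from the cited works: the stride-1, no-padding setting and the helper names im2col, conv_im2col_gemm, and conv_direct are illustrative assumptions.

import numpy as np

def im2col(x, kh, kw):
    # Unfold a (C, H, W) input into a (C*kh*kw, out_h*out_w) matrix so that
    # convolution becomes one GEMM (stride 1, no padding assumed).
    c, h, w = x.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = np.empty((c * kh * kw, out_h * out_w), dtype=x.dtype)
    idx = 0
    for ci in range(c):
        for i in range(kh):
            for j in range(kw):
                cols[idx] = x[ci, i:i + out_h, j:j + out_w].reshape(-1)
                idx += 1
    return cols

def conv_im2col_gemm(x, weights):
    # im2col + GEMM convolution; weights has shape (K, C, kh, kw).
    k, _, kh, kw = weights.shape
    out_h, out_w = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    cols = im2col(x, kh, kw)            # (C*kh*kw, out_h*out_w)
    w_mat = weights.reshape(k, -1)      # (K, C*kh*kw)
    return (w_mat @ cols).reshape(k, out_h, out_w)

def conv_direct(x, weights):
    # Direct convolution: loop over output positions, no layout transformation.
    k, _, kh, kw = weights.shape
    out_h, out_w = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    out = np.zeros((k, out_h, out_w), dtype=x.dtype)
    for ko in range(k):
        for i in range(out_h):
            for j in range(out_w):
                out[ko, i, j] = np.sum(x[:, i:i + kh, j:j + kw] * weights[ko])
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((3, 8, 8))      # (C, H, W) input
    w = rng.standard_normal((4, 3, 3, 3))   # (K, C, kh, kw) filters
    assert np.allclose(conv_im2col_gemm(x, w), conv_direct(x, w))

Both functions compute the same cross-correlation; the im2col variant trades extra memory for a single large matrix product, which is the formulation commonly mapped onto GPU GEMM routines, whereas the direct variant is the loop structure that dataflow accelerators implement spatially.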