2015
DOI: 10.1007/s11554-015-0544-0

Optimizing memory bandwidth exploitation for OpenVX applications on embedded many-core accelerators

Cited by 12 publications
(6 citation statements)
References 37 publications
“…There are no publicly available benchmarks for OpenVX. In the literature, previous work on optimizing OpenVX graphs uses a set of relatively small graphs to evaluate the proposed techniques [14] [15], making it difficult to generalize results. In order to evaluate our approach over a large set of OpenVX graphs, we developed a tool to randomly generate synthetic graphs having a given number of kernels.…”
Section: Automated Kernel Fusing and Tile Size Selection (mentioning)
confidence: 99%
“…In the OpenCL abstract model, each instance of the execution kernel is called a work-item, which is represented by its coordinates in the NDRange. The corresponding hardware is the processing element [34]. Multiple work-items are organized as a work-group, providing a coarser division of the NDRange, where work-items in a given work-group are executed concurrently on the processing elements of a compute unit.…”
Section: A OpenCL Parallel Computing Platform (mentioning)
confidence: 99%
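The work-item and work-group relationship described in the statement above can be illustrated with a small sketch. This is not code from the cited paper or from any OpenCL binding; it is plain Python that mimics the index arithmetic, assuming a 2-D NDRange, a zero global offset, and a global size that is an exact multiple of the work-group size. The function names `decompose` and `work_groups` are hypothetical.

```python
from collections import defaultdict


def decompose(global_id, local_size):
    """Split a global work-item coordinate into (group_id, local_id).

    Assumes a zero global offset, so per dimension:
    global_id = group_id * local_size + local_id.
    """
    group_id = tuple(g // l for g, l in zip(global_id, local_size))
    local_id = tuple(g % l for g, l in zip(global_id, local_size))
    return group_id, local_id


def work_groups(global_size, local_size):
    """Map each work-group id to the work-items (global coordinates) it holds.

    Illustrative only: handles a 2-D NDRange whose size is an exact
    multiple of the work-group size in both dimensions.
    """
    groups = defaultdict(list)
    for y in range(global_size[1]):
        for x in range(global_size[0]):
            group_id, _ = decompose((x, y), local_size)
            groups[group_id].append((x, y))
    return dict(groups)


# A 4x4 NDRange split into 2x2 work-groups gives four groups of four work-items.
groups = work_groups((4, 4), (2, 2))
```

In a real OpenCL kernel the same quantities are obtained from the built-in functions `get_global_id`, `get_group_id`, and `get_local_id`; the sketch only shows how the coarser work-group division follows from the work-item coordinates.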
“…It has been implemented by a few major vendors, including Nvidia, Intel, AMD, and Synopsys [28]. The authors of [5,9,25,31,32] focus on graph scheduling and design space exploration for heterogeneous systems consisting of GPUs, CPUs, and custom instruction-set architectures. Unlike the prior work, [24] suggests static OpenVX compilation for low-power embedded systems instead of runtime-library implementations.…”
Section: Related Work (mentioning)
confidence: 99%
“…3, where redundant computations are eliminated, and nodes are aggregated for better exploitation of locality. Memory access patterns of our abstractions entail system-level optimization strategies motivated by the OpenVX standard, such as image tiling [25] and hardware-software partitioning [26]. An abstraction-based implementation allows expressing aggregated computations as part of the reconstructed graph.…”
Section: Computational Abstractions (mentioning)
confidence: 99%