2016
DOI: 10.1016/j.sysarc.2016.04.013

Customization methodology for implementation of streaming aggregation in embedded systems

citations
Cited by 5 publications
(4 citation statements)
references
References 6 publications
“…Efficient local memory utilization will reduce the number of DMA transfers which will benefit the energy consumption. Additionally, extensive experimentation with Myriad2 has shown that heavy DMA usage reduces the performance of the DMA engine, with negative impact on application's execution time [25]. Furthermore, the management of the available processing units, such as the efficient partitioning of the algorithm between them and the memory alignment issues should be carefully examined to effectively exploit the provided SIMD features.…”
Section: Edge Devices and CNN Implementation Challenges (mentioning)
confidence: 99%
“…Furthermore, in ported memories, such as in Myriad2, stalls may appear under heavy data sharing. Finally, extensive experimentation in Myriad2 has shown that the DMA engine performance is reduced under heavy utilization [12]. Software techniques that optimize data management, improve the utilization of local memories and exploit parallelism can improve both performance and energy efficiency.…”
Section: Edge Devices and CNN Implementation Challenges (mentioning)
confidence: 99%
“…However, this scheme imposes frequent DMA data transfers between the local and the global memory during the execution of a CNN inference, due to the limited local memory space available for data. Reducing the frequency of DMA transfers can benefit performance significantly: The effects of DMA data transfers to the performance and the energy consumption in Myriad have been extensively examined in [12]. Allocating the vector processing units' instruction code in the global memory and fetching instructions through the cache subsystem increases the local memory space available for CNN data.…”
Section: Data Transfer and Management Optimizations (mentioning)
confidence: 99%
“…Inefficient software can drive low-power hardware to waste the system's energy budget, regardless of the application performance [8]. Relevant works in the literature focus on source-to-source optimizations at application level for increasing application performance and energy efficiency by improving the data flow and memory utilization [10,15].…”
Section: Introduction (mentioning)
confidence: 99%