Memory-Access-Driven Context Partitioning for Window-Based Image Processing on Heterogeneous Multicore Processors

Waidyasooriya, Hasitha Muthumala; Ohbayashi, Y.; Hariyama, Masanori; Kameyama, Michitaka

doi:10.1587/transinf.e95.d.354

Cited by 1 publication

(5 citation statements)

References 13 publications

(32 reference statements)

“…(15), we measured the values $\alpha$, $\beta$ and $t_{\mathrm{ctrl}}$ in Eqs. (6), (10) and (12) respectively. We measured data-transfer times between the external memory and memory modules of accelerator cores.…”

Section: Discussion (mentioning)

confidence: 99%

“…It is very easy to implement and easily scalable by changing the number of PEs and memories. Moreover, it has been studied extensively for memory allocation [10], data transfers [11], context partitioning [12], etc and many efficient techniques are proposed. It is already been used to implement various applications in many prior works, such as audio encoding [1], feature extraction [13], optical-flow extraction [14], etc.…”

Section: Heterogeneous Multicore Architecture Model (mentioning)

confidence: 99%

“…However, considering the benefits of power-efficiency and high performance, it is worth spending resources on AGUs. In this work, the address function proposed in previous works [7], [10] is used. This address function is simple, and the resource usage of AGUs is small.…”

Section: Heterogeneous Multicore Architecture Model (mentioning)

confidence: 99%

“…11 is defined by Eq. (12): $t_{\mathrm{trans}} = t_{AC} + t_{CA2} + t_{\mathrm{ctrl}}$ (12), where $t_{AC}$ is the data-transfer time from accelerator cores to CPU core, $t_{CA2}$ is the data-transfer time from CPU cores to accelerator cores and $t_{\mathrm{ctrl}}$ is the control overhead due to starting and stopping the accelerator cores. To estimate $t_{\mathrm{mid}}$, we have to consider the overlap between the data-transfers and the computations.…”

Section: Estimation Of the Total Processing Time (mentioning)

confidence: 99%
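
As a minimal sketch of Eq. (12) from the citation statement above: the transfer time is the sum of the accelerator-to-CPU transfer time, the CPU-to-accelerator transfer time, and the control overhead. The function name `transfer_time`, the parameter names, and the sample values are hypothetical and chosen only for illustration; the quoted text gives no measured values or units.

```python
def transfer_time(t_ac: float, t_ca2: float, t_ctrl: float) -> float:
    """Return t_trans = t_AC + t_CA2 + t_ctrl, as in Eq. (12).

    t_ac   -- data-transfer time from accelerator cores to the CPU core
    t_ca2  -- data-transfer time from CPU cores to accelerator cores
    t_ctrl -- control overhead of starting and stopping the accelerator cores

    All three arguments must use the same time unit (assumed here; the
    quoted statement does not specify one).
    """
    for name, value in (("t_ac", t_ac), ("t_ca2", t_ca2), ("t_ctrl", t_ctrl)):
        if value < 0:
            raise ValueError(f"{name} must be non-negative, got {value}")
    return t_ac + t_ca2 + t_ctrl


if __name__ == "__main__":
    # Hypothetical values, not measurements from either paper.
    print(transfer_time(t_ac=2.0, t_ca2=3.0, t_ctrl=0.5))
```

The sketch covers only the transfer term. As the statement notes, estimating $t_{\mathrm{mid}}$ additionally requires modeling the overlap between data transfers and computations, which is not reproduced here.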

“…We propose a parameterized architecture model and introduce an evaluation methodology to find the optimal architecture for the design parameters. In the optimization problem, we focus on window-based processing which has many applications such as stereo matching [6], feature detection [7], scale-invariant feature transformation (SIFT) [8], histogram of oriented gradients (HOG) [9], matrix processing, filtering, etc. The evaluation using filter computation as an example demonstrates that the processing time estimated by the proposed design methodology has sufficient accuracy compared to the actual measurement of the FPGA architecture.…”

Section: Introduction (mentioning)

confidence: 99%

Data-Transfer-Aware Design of an FPGA-Based Heterogeneous Multicore Platform with Custom Accelerators

Takei, Waidyasooriya, Hariyama, et al. 2015

IEICE Trans. Fundamentals

For an FPGA-based heterogeneous multicore platform, we present the design methodology to reduce the total processing time considering data-transfer. The reconfigurability of recent FPGAs with hard CPU cores allows us to realize a single-chip heterogeneous processor optimized for a given application. The major problem in designing such heterogeneous processors is data-transfer between CPU cores and accelerator cores. The total processing time with data-transfers is modeled considering the overlap of computation time and data-transfer time, and optimal design parameters are searched for.

Section: Discussion (mentioning)

confidence: 99%