Mixed-length SIMD code generation for VLIW architectures with multiple native vector-widths

Diken, Erkan; O’Riordan, Martin J.; Jordans, Roel; Jóźwiak, L.; Corporaal, Henk; Moloney, David

doi:10.1109/asap.2015.7245732

Cited by 2 publications

(3 citation statements)

References 19 publications


“…SHAVE is a VLIW processor containing a set of functional units which are fed with operands from three different register files [21]. The processor contains optimized functional units such as a branch and repeat unit (BRU), a compare and move unit (CMU), arithmetic units, and Fig.…”

Section: Myriad 2 Architecture (mentioning)

confidence: 99%

“…One such effort is Myriad 2 platform from Movidius [20]. It is a low-power multi-processor system on chip (MPSoC) that uses an array of very long instruction word (VLIW) processors with vector and single instruction multiple data (SIMD) execution capabilities [21]. Each processor supports two load and store units (LSUs) to overlap latency of memory operations.…”

Section: Introduction (mentioning)

confidence: 99%


Exploiting architectural features of a computer vision platform towards reducing memory stalls

Mustafa, O’Riordan, Rogers, et al. 2018. J Real-Time Image Proc


Computer vision applications are becoming increasingly popular in embedded systems such as drones, robots, tablets, and mobile devices. These applications are both compute and memory intensive, with memory-bound stalls (MBS) accounting for a significant part of their execution time. To maximize the reduction in memory stalls, compilers need to consider the architectural details of a platform and use its hardware components efficiently. In this paper, we propose a compiler optimization for a vision-processing system that classifies memory references to reduce MBS. As the proposed optimization is based on the architectural features of a specific platform, i.e., Myriad 2, it can only be applied to other platforms having similar architectural features. The optimization consists of two steps: affinity analysis and affinity-aware instruction scheduling. We suggest two different approaches for affinity analysis, i.e., source code annotation and automated analysis. We use the LLVM compiler infrastructure to implement the proposed optimization. Applying the annotation-based approach to a memory-intensive program reduces stall cycles by 67.44%, leading to a 25.61% improvement in execution time. We use 11 different image-processing benchmarks to evaluate the automated analysis approach. Experimental results show that classification of memory references reduces stall cycles, on average, by 69.83%. As all benchmarks are both compute and memory intensive, we achieve an improvement in execution time of up to 30%, with a modest average of 5.79%.


“…For video applications, the frame rate can be increased by customizing the architecture with parallel execution of the input data set [14,16]. To perceive motion in the video, refreshing of the frames should take place very quickly.…”

Section: Image Quality Enhancement (mentioning)

confidence: 99%

Reconfigurable FPGA Based Soft-Core Processor for SIMD Applications

Maheswari, Pattabiraman, Sharmila 2017. Asian J Pharm Clin Res


Objective: The prospective need for SIMD (Single Instruction and Multiple Data) applications like video and image processing in a single system requires greater flexibility in computation to deliver high-quality real-time data. This paper performs an analysis of an FPGA (Field Programmable Gate Array) based high-performance Reconfigurable OpenRISC1200 (ROR) soft-core processor for SIMD. Methods: The ROR1200 ensures performance improvement through data-level parallelism, executing SIMD instructions simultaneously in HPRC (High Performance Reconfigurable Computing) at reduced resource utilization through an RRF (Reconfigurable Register File) with multiple core functionalities. This work aims at analyzing the functionality of the reconfigurable architecture by illustrating the implementation of two different image processing operations, namely image convolution and image quality improvement. The MAC (Multiply-Accumulate) unit of the ROR1200 is used to perform image convolution, and the execution unit with HPRC is used for image quality improvement. Result: With parallel execution in multi-core, the proposed processor improves image quality by doubling the frame rate up to 60 fps (frames per second) with a peak power consumption of 400 mW. Thus the processor gives a significant computational cost of 12 ms with a refresh rate of 60 Hz and 1.29 ns of MAC critical path delay. Conclusion: This FPGA-based processor becomes a feasible solution for portable embedded SIMD-based applications which need high performance at reduced power consumption.
