An efficient filter structure for multiplierless Sobel edge detection

Pradabpet, Chusit; Ravinu, N.; Chivapreecha, Sorawat; Knobnob, Boonying; Dejhan, Kobchai

doi:10.1109/citisia.2009.5224243

Cited by 13 publications

(2 citation statements)

References 2 publications

“…Data compression and Decompression: zip files, JPEG and PNG image compression [80,59]; 3. Image processing: filters for real-time image processing, such as Sobel filtering, image sharpening, image blurring [82,71,88,91]; 4. Pathfinding: Breadth First Search (BFS) and Dijkstra's Algorithm [118]; 5.…”

Section: Applications (mentioning)

confidence: 99%

SMASH: Sparse Matrix Atomic Scratchpad Hashing

Shivdikar

2021

Preprint

Sparse matrices, more specifically Sparse Matrix-Matrix Multiply (SpGEMM) kernels, are commonly found in a wide range of applications, spanning graph-based path-finding to machine learning algorithms (e.g., neural networks). A particular challenge in implementing SpGEMM kernels has been the pressure placed on DRAM memory. One approach to tackle this problem is to use an inner product method for the SpGEMM kernel implementation. While the inner product produces fewer intermediate results, it can end up saturating the memory bandwidth, given the high number of redundant fetches of the input matrix elements. Using an outer product-based SpGEMM kernel can reduce redundant fetches, but at the cost of increased overhead due to extra computation and memory accesses for producing/managing partial products.

In this thesis, we introduce a novel SpGEMM kernel implementation based on the row-wise product approach. We leverage atomic instructions to merge intermediate partial products as they are generated. The use of atomic instructions eliminates the need to create partial product matrices, thus eliminating redundant DRAM fetches.

To evaluate our row-wise product approach, we map an optimized SpGEMM kernel to a custom accelerator designed to accelerate graph-based applications. The targeted accelerator is an experimental system named PIUMA, being developed by Intel. PIUMA provides several attractive features, including fast context switching, user-configurable caches, globally addressable memory, non-coherent caches, and asynchronous pipelines. We tailor our SpGEMM kernel to exploit many of the features of the PIUMA fabric.

This thesis compares our SpGEMM implementation against prior solutions, all mapped to the PIUMA framework. We briefly describe some of the PIUMA architecture features and then delve into the details of our optimized SpGEMM kernel. Our SpGEMM kernel can achieve 9.4× speedup as compared to competing approaches.
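The row-wise product idea described in this abstract can be illustrated with a minimal single-threaded sketch. This is not the thesis's PIUMA kernel: the function name `spgemm_rowwise` and the dictionary-of-rows storage format are assumptions made for illustration, and an ordinary Python dictionary stands in for the atomically updated scratchpad hash table. The point it shows is only that partial products for one output row are merged as they are generated, so no intermediate partial-product matrix is ever materialized.

```python
from collections import defaultdict


def spgemm_rowwise(a_rows, b_rows):
    """Row-wise sparse matrix product C = A * B.

    Both inputs are assumed to be stored as {row: {col: value}} dictionaries
    (a hypothetical format chosen for readability, not taken from the thesis).
    """
    c_rows = {}
    for i, a_row in a_rows.items():
        # Per-row accumulator: a stand-in for the hashed scratchpad that the
        # abstract describes being updated with atomic instructions.
        acc = defaultdict(float)
        for k, a_ik in a_row.items():
            # Each nonzero A[i, k] scales row k of B; products are merged
            # into the accumulator immediately instead of being stored.
            for j, b_kj in b_rows.get(k, {}).items():
                acc[j] += a_ik * b_kj
        if acc:
            c_rows[i] = dict(acc)
    return c_rows


# Small example: A is 2x3, B is 3x2.
A = {0: {0: 1.0, 2: 2.0}, 1: {1: 3.0}}
B = {0: {1: 4.0}, 1: {0: 5.0}, 2: {1: 6.0}}
C = spgemm_rowwise(A, B)
```

In a parallel setting, several workers would update the same accumulator entries concurrently, which is where the atomic merge mentioned in the abstract would come in; the sequential sketch sidesteps that concern.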

“…Thus, the resource consumptions are too high. [116] proposed that they could operate the Sobel operation efficiently. However, their design can only run at a very low frequency, and they still need to use two storage spaces to save the data.…”

Section: Sobel Operator Design (mentioning)

confidence: 99%

System-on-a-Chip (SoC) based Hardware Acceleration in Register Transfer Level (RTL) Design

Niu

Today, modern System-on-a-Chip (SoC) systems have grown rapidly due to increased processing power, while maintaining the size of the hardware circuit. The number of transistors on a chip continues to increase, but current SoC designs may not be able to exploit the potential performance, especially with energy consumption and chip area becoming two major concerns. Traditional SoC designs usually separate software and hardware. Thus, the process of improving the system performance is a complicated task for both software and hardware designers. The aim of this research is to develop a hardware acceleration workflow for software applications, so that system performance can be improved under constraints on energy consumption and on-chip resource costs. The characteristics of software applications can be identified by using profiling tools.

Hardware acceleration can yield significant performance improvement for highly mathematical calculations or repeated functions. The performance of SoC systems can then be improved if the hardware acceleration method is used to accelerate the element that incurs performance overheads. The concepts mentioned in this study can be easily applied to a variety of sophisticated software applications.

The contributions of SoC-based hardware acceleration in the hardware-software co-design platform include the following:

(1) Software profiling methods are applied to an H.264 Coder-Decoder (CODEC) core. The hotspot function of the target application is identified by using critical attributes such as cycles per loop, loop rounds, etc.

(2) A hardware acceleration method based on Field-Programmable Gate Array (FPGA) is used to resolve system bottlenecks and improve system performance. The identified hotspot function is then converted to a hardware accelerator and mapped onto the hardware platform. Two types of hardware acceleration methods, central bus design and coprocessor design, are implemented for comparison in the proposed architecture.

(3) System specifications, such as performance, energy consumption, and resource costs, are measured and analyzed. The trade-off among these three factors is compared and balanced. Different hardware accelerators are implemented and evaluated based on system requirements.

(4) The system verification platform is designed based on the Integrated Circuit (IC) workflow. Hardware optimization techniques are used for higher performance and lower resource costs.

Experimental results show that the proposed hardware acceleration workflow for software applications is an efficient technique. The system can reach a 2.8X performance improvement and save 31.84% energy consumption by applying the Bus-IP design. The Co-processor design can reach a 7.9X performance improvement and save 75.85% energy consumption.

Motivation: Since the invention of modern computers in the middle of the twentieth century, semiconductor manufacturers have focused mainly on processing as much information as possible while maintaining or minimizing the execution time. In addition, as technology evolves, the size of the tran...
