Taxonomy of Vectorization Patterns of Programming for FIR Image Filters Using Kernel Subsampling and New One

Maeda, Yuji; Fukushima, Norishige; Matsuo, Hiroshi

doi:10.3390/app8081235

Cited by 23 publications

(18 citation statements)

References 40 publications


“…For referring LUTs, the set or gather SIMD instructions were employed. The outermost loop was parallelized by multi-core threading, and we had pixel-loop vectorization [50]. This implementation was found to be the most effective [50].…”

Section: Results (mentioning)

confidence: 99%


Effective Implementation of Edge-Preserving Filtering on CPU Microarchitectures

2018

Self Cite


In this paper, we propose acceleration methods for edge-preserving filtering. The filters natively include denormalized numbers, which are defined in IEEE Standard 754. Processing denormalized numbers has a higher computational cost than processing normal numbers; thus, the computational performance of edge-preserving filtering is severely diminished. We propose approaches to prevent the occurrence of denormalized numbers for acceleration. Moreover, we verify an effective vectorization of edge-preserving filtering based on changes in the microarchitectures of central processing units by carefully treating kernel weights. The experimental results show that the proposed methods are up to five times faster than the straightforward implementation of bilateral filtering and non-local means filtering, while the filters maintain high accuracy. In addition, we show effective vectorization for each central processing unit microarchitecture. The implementation of the bilateral filter is up to 14 times faster than that of OpenCV. The proposed methods and the vectorization are practical for real-time tasks such as image editing.
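To illustrate why edge-preserving filters run into denormalized numbers, the sketch below evaluates a Gaussian range weight for a large intensity difference and a small sigma, which underflows into the subnormal range. The clamping function shows one generic way to keep such values out of later arithmetic. It is not taken from the cited paper: the function names, the sigma value, and the use of Python double precision (rather than the single precision a SIMD implementation would more likely use) are all assumptions made for illustration.

```python
import math
import sys

# Smallest positive *normal* double; anything nonzero below it is subnormal.
MIN_NORMAL = sys.float_info.min


def range_weight(diff, sigma):
    """Gaussian range weight exp(-diff^2 / (2 * sigma^2)), as used in bilateral-style filters."""
    return math.exp(-(diff * diff) / (2.0 * sigma * sigma))


def is_subnormal(x):
    """True when x is nonzero but smaller in magnitude than the smallest normal double."""
    return x != 0.0 and abs(x) < MIN_NORMAL


def clamped_range_weight(diff, sigma):
    """Hypothetical workaround: flush weights below the normal range to exactly zero."""
    w = range_weight(diff, sigma)
    return w if w >= MIN_NORMAL else 0.0


if __name__ == "__main__":
    w = range_weight(255.0, 6.7)  # the exponent is roughly -724, below the normal range
    print(is_subnormal(w))
    print(clamped_range_weight(255.0, 6.7))
```

Flushing to zero changes the result by less than the smallest normal value per weight, so the effect on the filtered image is negligible in this sketch. Whether the cited paper uses this or a different strategy is not stated in the abstract.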


“…The outermost loop was parallelized by multi-core threading, and we had pixel-loop vectorization [50]. This implementation was found to be the most effective [50]. Notably, a vectorized exponential operation is not implemented in these CPUs.…”

Section: Results (mentioning)

confidence: 99%

Effective Implementation of Edge-Preserving Filtering on CPU Microarchitectures

2018

Self Cite


“…In addition, the search window and kernel sizes are closely related not only to image characteristics, but also to the calculation time (i.e., time resolution) [46, 47]. Figure 6 shows the time resolution results of various search window and kernel sizes.…”

Section: Discussion (mentioning)

confidence: 99%

Application of Fast Non-Local Means Algorithm for Noise Reduction Using Separable Color Channels in Light Microscopy Images

Kang; Kim

2021

IJERPH


The purpose of this study is to evaluate the various control parameters of a modeled fast non-local means (FNLM) noise reduction algorithm which can separate color channels in light microscopy (LM) images. To achieve this objective, the tendency of image characteristics with changes in parameters, such as smoothing factors and kernel and search window sizes for the FNLM algorithm, was analyzed. To quantitatively assess image characteristics, the coefficient of variation (COV), blind/referenceless image spatial quality evaluator (BRISQUE), and natural image quality evaluator (NIQE) were employed. When high smoothing factors and large search window sizes were applied, excellent COV and unsatisfactory BRISQUE and NIQE results were obtained. In addition, all three evaluation parameters improved as the kernel size increased. However, the kernel and search window sizes of the FNLM algorithm were shown to be dependent on the image processing time (time resolution). In conclusion, this work has demonstrated that the FNLM algorithm can effectively reduce noise in LM images, and parameter optimization is important to achieve the algorithm’s appropriate application.


“…In [18], the Harris operator is optimized using a number of optimizations such as vectorization, data interleaving and parallelization, on both x86/x64 and Arm processors. In [19], different ways of vectorizing the 3D convolution are shown. In [20], a white paper for a Gaussian Blur implementation on Intel processors is proposed, using FP computations.…”

Section: Related Work (mentioning)

confidence: 99%

Design and Implementation of 2D Convolution on x86/x64 Processors

Kelefouras; Keramidas

2022

IEEE Trans. Parallel Distrib. Syst.


In this paper, a new method for accelerating the 2D direct Convolution operation on x86/x64 processors is presented. It includes efficient vectorization by using SIMD intrinsics, bit-twiddling optimizations, the optimization of the division operation, multi-threading using OpenMP, register blocking and the shortest possible bit-width value of the intermediate results. The proposed method, which is provided as open-source, is general and can be applied to other processor families too, e.g., Arm. The proposed method has been evaluated on two different multi-core Intel CPUs, by using twenty different image sizes, 8-bit integer computations and the most commonly used kernel sizes (3x3, 5x5, 7x7, 9x9). It achieves from 2.8× to 40× speedup over the Intel IPP library (OpenCV GaussianBlur and Filter2D routines), from 105× to 400× speedup over the gemm-based convolution method (by using Intel MKL int8 matrix multiplication routine), and from 8.5× to 618× speedup over the vslsConvExec Intel MKL direct convolution routine. The proposed method is superior as it achieves far fewer arithmetical and load/store instructions.
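For orientation, the direct 2D convolution that the abstract refers to can be written as a scalar reference in a few lines. The sketch below is only that baseline, with integer inputs and an integer accumulator. It does not include any of the optimizations the abstract lists (SIMD intrinsics, register blocking, multi-threading), and the function name and the no-padding ("valid") output convention are assumptions made here for illustration.

```python
def convolve2d_valid(image, kernel):
    """Direct 2D convolution without padding.

    image and kernel are lists of lists of ints; the output has size
    (H - kh + 1) x (W - kw + 1).
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for y in range(ih - kh + 1):
        row = []
        for x in range(iw - kw + 1):
            acc = 0  # integer accumulator, no floating point involved
            for j in range(kh):
                for i in range(kw):
                    # The kernel is flipped in both axes, as convolution
                    # (rather than correlation) requires.
                    acc += image[y + j][x + i] * kernel[kh - 1 - j][kw - 1 - i]
            row.append(acc)
        out.append(row)
    return out


if __name__ == "__main__":
    img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
    print(convolve2d_valid(img, box))
```

The four nested loops make the cost proportional to the image area times the kernel area, which is the work that vectorized and blocked implementations aim to execute with fewer instructions.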

