Median filtering is a well-known method used in a wide range of application frameworks as well as a standalone filter, especially for salt-and-pepper denoising. It is able to highly reduce the power of noise while minimizing edge blurring. Currently, existing algorithms and implementations are quite efficient but may be improved as far as processing speed is concerned, which has led us to further investigate the specificities of modern GPUs. In this paper, we propose the GPU implementation of fixed-size kernel median filters, able to output up to 1.85 billion pixels per second on C2070 Tesla cards. Based on a Branchless Vectorized Median class algorithm and implemented through memory fine tuning and the use of GPU registers, our median drastically outperforms existing implementations, resulting, as far as we know, in the fastest median filter to date.
This paper presents some optimizations based on communications/computations overlap for the ScaLAPACK LU factorization. First a theoretical computation of the optimal block size is given for the block scattered decomposition of the matrix. Two optimizations of this routine are presented that use asynchronous communications to hide the communication overhead and to obtain optimal speed-ups. 1
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.