Images are contaminated by noise during acquisition, which severely degrades their quality, so captured images must be filtered. As the volume of image data grows, traditional single-processor and multiprocessor computing equipment can no longer meet the requirements of real-time processing. This paper studies the computational model of weighted mean filtering and the architectural characteristics of high-performance computers. An efficient hierarchical parallel algorithm for image weighted mean filtering is designed and implemented in Open Computing Language (OpenCL), so that the parallelism of the computational model is fully expressed. The algorithm takes into account both the characteristics of discrete image convolution and the multilayer logical architecture of high-performance computers, exploits the parallelism of the computing platform and of the computational model, and maps tasks efficiently from the computational model to the computing resources. The model is parallelized at two levels: work-group and work-item. Experimental results show that, on an NVIDIA GPU computing platform based on the OpenCL architecture, the parallel weighted mean filtering algorithm achieves speedups of 20.88, 18.52, and 1.26 times over the CPU-based serial algorithm, the parallel algorithm based on Open Multi-Processing (OpenMP), and the parallel algorithm based on Compute Unified Device Architecture (CUDA), respectively. The algorithm also runs on different Graphics Processing Unit (GPU) computing platforms, showing good portability and scalability.

INDEX TERMS weighted mean filtering; Gaussian noise; Graphics Processing Unit (GPU); Open Computing Language (OpenCL); parallel algorithm.
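
The weighted mean filtering computation summarized above, together with its two-level decomposition, can be illustrated with a minimal serial sketch in plain Python. This is not the OpenCL implementation from the paper. The 3x3 mask `WEIGHTS`, the clamped border handling, the `tile` size, and the function names `filter_pixel` and `weighted_mean_filter` are all assumptions made for illustration. The outer tile loops stand in for work-groups and the inner pixel loops stand in for work-items.

```python
# Hypothetical 3x3 mask (a common Gaussian-like choice); the weights
# actually used in the paper are not stated in this section.
WEIGHTS = [[1, 2, 1],
           [2, 4, 2],
           [1, 2, 1]]


def filter_pixel(image, row, col, weights=WEIGHTS):
    """Return the weighted mean of the neighbourhood centred on (row, col).

    This plays the role of a single work-item. Neighbours outside the
    image are replaced by the nearest border pixel (an assumption).
    """
    height, width = len(image), len(image[0])
    radius = len(weights) // 2
    total = 0.0
    weight_sum = 0.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y = min(max(row + dy, 0), height - 1)  # clamp row index
            x = min(max(col + dx, 0), width - 1)   # clamp column index
            w = weights[dy + radius][dx + radius]
            total += w * image[y][x]
            weight_sum += w
    return total / weight_sum


def weighted_mean_filter(image, tile=2, weights=WEIGHTS):
    """Filter a 2D list of pixel values tile by tile.

    The two outer loops enumerate tiles (the work-group level) and the
    two inner loops enumerate pixels inside a tile (the work-item level).
    Each output pixel depends only on the input image, so all iterations
    are independent and could run in parallel on a GPU.
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for tile_row in range(0, height, tile):
        for tile_col in range(0, width, tile):
            for row in range(tile_row, min(tile_row + tile, height)):
                for col in range(tile_col, min(tile_col + tile, width)):
                    out[row][col] = filter_pixel(image, row, col, weights)
    return out
```

Because every output pixel reads only from the input image, the iteration order does not affect the result. That independence is what allows each pixel to be assigned to its own work-item in a parallel version.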