Traditional methods for processing large images are extremely time-consuming. Moreover, conventional image processing methods do not take advantage of available computing resources such as multicore central processing units (CPUs) and manycore general-purpose graphics processing units (GP-GPUs). Studies suggest that applying parallel programming techniques to image filters can improve overall performance without compromising existing resources. Recent studies also suggest that a parallel implementation of image processing on a compute unified device architecture (CUDA)-accelerated CPU/GPU system has the potential to process images very quickly. In this paper, we introduce a CUDA-accelerated image processing method suitable for multicore/manycore systems. Using a bitmap (BMP) file, we implement image processing and filtering both as a traditional sequential C program and as a newly introduced parallel CUDA/C program. A key step of the proposed algorithm is to load the pixel bytes into a one-dimensional array whose length equals matrix width × matrix height × bytes per pixel, so that the pixels can be processed concurrently. According to experimental results, the proposed CUDA-accelerated parallel image processing algorithm achieves a speedup factor of up to 365 for an image of 8,192 × 8,192 pixels.
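The one-dimensional layout described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes 24-bit pixels stored in BMP's blue, green, red order, rows already stripped of BMP's 4-byte row padding when loaded, and a grayscale filter with integer luma weights (77, 150, 29) chosen only for illustration. The helper names `pixel_offset`, `gray_pixel`, and `gray_sequential` are hypothetical. The point is that each pixel's bytes sit at a directly computable offset, so the per-pixel body can run independently, either in a sequential loop or in one CUDA thread per pixel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offset of byte `channel` of pixel (row, col) in the flattened buffer of
   length width * height * bytes_per_pixel.
   Assumption: rows are tightly packed (BMP row padding removed on load). */
static size_t pixel_offset(size_t row, size_t col, size_t channel,
                           size_t width, size_t bytes_per_pixel)
{
    return (row * width + col) * bytes_per_pixel + channel;
}

/* Per-pixel work for linear pixel index idx. It touches only its own bytes,
   so every pixel can be processed concurrently.
   Assumption: 24-bit BMP byte order B, G, R; illustrative integer luma weights. */
static void gray_pixel(uint8_t *buf, size_t idx, size_t bytes_per_pixel)
{
    assert(bytes_per_pixel >= 3);
    uint8_t *p = buf + idx * bytes_per_pixel;
    unsigned b = p[0], g = p[1], r = p[2];
    /* Weights sum to 256, so the shift keeps the result in 0..255. */
    uint8_t y = (uint8_t)((77u * r + 150u * g + 29u * b) >> 8);
    p[0] = p[1] = p[2] = y;
}

/* Sequential C version: a single loop over all width * height pixels.
   In a CUDA version, this loop would be replaced by a kernel where each
   thread computes idx = blockIdx.x * blockDim.x + threadIdx.x and, if
   idx < width * height, performs the same per-pixel work. */
static void gray_sequential(uint8_t *buf, size_t width, size_t height,
                            size_t bytes_per_pixel)
{
    size_t n = width * height;
    for (size_t idx = 0; idx < n; ++idx)
        gray_pixel(buf, idx, bytes_per_pixel);
}
```

For a 2 × 2 image with 3 bytes per pixel, the buffer holds 12 bytes, and the last byte of pixel (1, 1) is at offset (1 × 2 + 1) × 3 + 2 = 11. Because no pixel's result depends on any other pixel, the work divides cleanly across threads; filters that read neighboring pixels (for example, blurs) would need additional care at image borders.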