Abstract-This paper presents the design and performance evaluation of several algorithms for analysing medical image volumes on a massively parallel graphics processing unit (GPU) using the Compute Unified Device Architecture (CUDA). These algorithms are selected from a general framework devised for a computer-aided diagnostic (CAD) system. CAD systems that analyse large medical image datasets are usually organised as a processing pipeline comprising a variety of image processing operations. An MRI scanner captures the 3D human head as a series of 2D images, and considerable time is spent in pre- and post-processing these images; noise filtering, segmentation, image diffusion and enhancement are examples of such operations. The algorithms chosen for study require either local information, available in a few neighbouring pixels, or global information, available in the entire image. These problems are well suited to GPU implementation, since the proposed Per-Pixel Threading (PPT) and Per-Slice Threading (PST) operations naturally expose their parallelism. In this paper, we implement adaptive filtering, anisotropic diffusion, bilateral filtering, non-local means (NLM) filtering, K-Means segmentation and feature extraction on a 1536-core NVIDIA GPU and estimate the speedup gained. Our experiments show that the GPU-based implementation achieves speedups typically in the range of 3 to 338 times over a conventional central processing unit (CPU) in the PPT model, and up to 30 times in the PST model.
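The abstract names the PPT and PST models but does not define them. As a minimal sketch, assuming PPT assigns one independent work item (one CUDA thread on the GPU) to each output pixel of each slice, and PST assigns one work item to each whole 2D slice, the Python code below emulates both decompositions for a hypothetical 3x3 mean filter, used here as a stand-in for the local-neighbourhood filters studied. The names `filter_pixel`, `ppt_filter`, `filter_slice` and `pst_filter` are illustrative and not taken from the paper, and the work items run sequentially on the CPU rather than on a GPU.

```python
from typing import List

Slice = List[List[float]]   # one 2D image (rows x cols)
Volume = List[Slice]        # a stack of 2D slices, e.g. from an MRI scan


def filter_pixel(img: Slice, r: int, c: int) -> float:
    """Work done by one hypothetical PPT 'thread': the 3x3 mean around
    (r, c), clamping neighbour coordinates at the image border."""
    rows, cols = len(img), len(img[0])
    total = 0.0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr = min(max(r + dr, 0), rows - 1)
            cc = min(max(c + dc, 0), cols - 1)
            total += img[rr][cc]
    return total / 9.0


def ppt_filter(volume: Volume) -> Volume:
    """Per-Pixel Threading (assumed): one independent work item per
    (slice, row, col). On a GPU each global index would be one CUDA thread;
    here the flattened thread grid is walked sequentially."""
    depth, rows, cols = len(volume), len(volume[0]), len(volume[0][0])
    out = [[[0.0] * cols for _ in range(rows)] for _ in range(depth)]
    for tid in range(depth * rows * cols):  # emulated global thread index
        s, rem = divmod(tid, rows * cols)   # which slice
        r, c = divmod(rem, cols)            # which pixel in that slice
        out[s][r][c] = filter_pixel(volume[s], r, c)
    return out


def filter_slice(img: Slice) -> Slice:
    """Work done by one hypothetical PST 'thread': a whole 2D slice."""
    return [[filter_pixel(img, r, c) for c in range(len(img[0]))]
            for r in range(len(img))]


def pst_filter(volume: Volume) -> Volume:
    """Per-Slice Threading (assumed): one independent work item per slice."""
    return [filter_slice(img) for img in volume]
```

Both decompositions produce identical output, for example `ppt_filter(vol) == pst_filter(vol)` for any rectangular volume `vol`. They differ in how much independent work they expose: depth x rows x cols work items for PPT versus only depth work items for PST, so PPT can keep far more of a 1536-core GPU busy.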