Abstract-The parallel programming come a long way with the advances in the HPC. The high performance computing landscape is shifting from collections of homogeneous nodes towards heterogeneous systems, in which nodes consist of a combination of traditional out-of-order execution cores and accelerator devices. Accelerators, built around GPUs, many-core chips, FPGAs or DSPs, are used to offload computeintensive tasks. Large-scale GPU clusters are gaining popularity in the scientific computing community and having massive range of applications. However, their deployment and production use are associated with a number of new challenges including CUDA. In this paper, we present our efforts to address some of the issues related to HPC and also introduced some performance modelling techniques along with GPU clustering.