In this paper, four beamforming algorithms implemented on a high-performance graphics processing unit (GPU) are presented: interpolation beamforming (IBF) and phase-rotation beamforming (PRBF), each with pre- and post-filtering (IBF-PRE, IBF-POST, PRBF-PRE, and PRBF-POST). Each beamforming method was divided into two kernels consisting of various beamforming and mid-processing blocks and was efficiently implemented on NVIDIA's Compute Unified Device Architecture (CUDA) platform (GeForce GTX560 Ti, NVIDIA, Santa Clara, CA, USA). To evaluate the performance of each method, pre-beamformed radio-frequency (RF) data were captured from a tissue-mimicking phantom by a commercial ultrasound machine equipped with a research package (G40, Siemens Healthcare, Mountain View, CA, USA). The execution time of each beamforming algorithm was measured using time stamps produced by a CUDA timer. IBF-PRE outperformed the other methods (IBF-POST, PRBF-PRE, and PRBF-POST) in execution time: 7.89 ms vs. 16.19 ms, 21.89 ms, and 10.62 ms, respectively. This result indicates that the IBF-PRE method is suitable for a fully software-based ultrasound imaging system.
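
As a rough illustration of the fine-delay step that interpolation-based delay-and-sum beamforming relies on, the following Python sketch delays each channel by a fractional number of samples and sums the results. It is not the GPU implementation evaluated here: the function name `delay_and_sum`, the use of two-point linear interpolation in place of a proper interpolation filter, and the zero padding outside each trace are all assumptions made for illustration.

```python
import math


def delay_and_sum(channels, delays):
    """Sum channel traces after applying per-channel fractional delays.

    channels: list of equal-length lists of samples (one list per channel).
    delays:   per-channel delay in samples; may be fractional or negative.

    Assumption: linear interpolation between the two nearest samples stands
    in for the interpolation filter; samples outside a trace count as zero.
    """
    n = len(channels[0])
    out = [0.0] * n
    for trace, d in zip(channels, delays):
        for k in range(n):
            pos = k + d                # fractional read position in this channel
            i = math.floor(pos)        # index of the sample at or below pos
            frac = pos - i             # distance past that sample, in [0, 1)
            a = trace[i] if 0 <= i < n else 0.0
            b = trace[i + 1] if 0 <= i + 1 < n else 0.0
            out[k] += (1.0 - frac) * a + frac * b
    return out


if __name__ == "__main__":
    # Two synthetic channels; the second is read half a sample later.
    rf = [[0, 1, 2, 3, 4], [0, 2, 4, 6, 8]]
    print(delay_and_sum(rf, [0.0, 0.5]))
```

In this sketch every output sample is independent of the others, which is the property that makes this kind of computation a natural fit for one-thread-per-sample GPU kernels.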