In this paper, we explore the feasibility of achieving gigabit baseband throughput using the vast computational power offered by the graphics processors (GPUs). One of the most computationally intensive functions commonly used in baseband communications, the Fast Fourier Transform (FFT) algorithm, is implemented on an NVIDIA GPU using their general-purpose computing platform called the Compute Unified Device Architecture (CUDA). The paper, first, investigates the implementation of an FFT algorithm using the GPU hardware and exploiting the computational capability available. It then outlines the limitations discovered and the methods used to overcome these challenges. Finally a new algorithm to compute FFT is proposed, which reduces interprocessor communication, and it is further optimized by improving memory access, enabling the processing rate to exceed 4 Gbps, achieving a processing time of a 512-point FFT in less than 200 ns.