This paper presents the architecture of a complex 128-point FFT processor that uses a parallel, rolling-back structure. The VLSI implementation is suitable for ultra-wideband (UWB) communication because it sustains data rates greater than 538 Mbps. Implemented with new VLSI technology, the design also achieves high speed and low power. On an FPGA, the FFT can operate at clock frequencies up to 200 MHz, while power consumption remains very low (only 109 mW at 132 MHz). With four radix-4 FFT processors, we employ 16 parallel channels to lower the operating clock to 132 MHz for input data at 538 MHz. A CORDIC-based twiddle-factor processor replaces the conventional multiplier to address its area and speed costs, so the FFT requires only 53K equivalent gates without sacrificing processing speed or throughput.
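To illustrate why a CORDIC unit can stand in for a complex multiplier, the following is a minimal floating-point sketch of CORDIC rotation applied to a twiddle factor of a 128-point FFT. It is not the processor's actual datapath: the function names `cordic_rotate` and `twiddle_mul`, the 16-iteration depth, and the quadrant pre-rotation are assumptions made for illustration, and a hardware version would use fixed-point shifts and adds with a precomputed arctangent table and gain constant.

```python
import math

def cordic_rotate(x, y, angle, iterations=16):
    """Rotate the vector (x, y) by `angle` radians using CORDIC micro-rotations.

    Hypothetical sketch: the iteration count and float arithmetic are assumptions.
    """
    # Wrap the angle into [-pi, pi], then pre-rotate by +/-90 degrees so the
    # residual angle lies within the CORDIC convergence range (about 1.74 rad).
    angle = math.remainder(angle, 2.0 * math.pi)
    if angle > math.pi / 2:
        x, y = -y, x            # rotate by +90 degrees
        angle -= math.pi / 2
    elif angle < -math.pi / 2:
        x, y = y, -x            # rotate by -90 degrees
        angle += math.pi / 2

    # Accumulated gain of the micro-rotations (a constant in hardware).
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    z = angle
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        shift = 2.0 ** -i       # a right shift by i bits in fixed-point hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * math.atan(shift)

    return x / gain, y / gain

def twiddle_mul(sample, k, n=128):
    """Multiply a complex sample by W_n^k = exp(-2*pi*j*k/n) via rotation."""
    re, im = cordic_rotate(sample.real, sample.imag, -2.0 * math.pi * k / n)
    return complex(re, im)
```

Each iteration needs only shifts, additions, and one table lookup, which is the usual argument for CORDIC costing less area than a full complex multiplier; the accuracy of the result grows by roughly one bit per iteration.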