Bit-reversal is an essential part of the fast Fourier transform (FFT). However, compared to the large number of works on FFT architectures, far fewer were dedicated to bit-reversal circuits until recent years. In this brief, the minimum latency and memory required for calculating the bit-reversal of continuous-flow parallel data are formulated. The formulas are generic for any power-of-2 parallelism, including serial bit-reversal. Furthermore, an efficient circuit for calculating the parallel bit-reversal is proposed. The circuit not only achieves the lowest latency but also uses the minimum memory. This is achieved by breaking the bit-reversal permutation into two sub-permutations, which are implemented by a sub-bit-reversal module and a group of P buffer banks. In addition, two commutators are adopted to access the P buffer banks efficiently. The proposed circuit is simple and efficient for reordering the output samples of parallel pipelined FFT processors.

Index Terms: Bit-reversal circuit, fast Fourier transform (FFT), parallel pipelined architecture.
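
For readers unfamiliar with the permutation itself, the following is a minimal software sketch of bit-reversal reordering on a block of N = 2^n samples. It illustrates only the index mapping that the circuit must realize; it is not the proposed hardware architecture, and the function names `bit_reverse` and `bit_reverse_permute` are hypothetical helpers introduced here for illustration.

```python
def bit_reverse(index: int, num_bits: int) -> int:
    """Return `index` with its lowest `num_bits` bits in reversed order."""
    result = 0
    for _ in range(num_bits):
        # Shift the accumulated result left and append the current LSB.
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reverse_permute(samples):
    """Reorder a power-of-two-length sequence into bit-reversed order."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    num_bits = n.bit_length() - 1  # log2(n) for a power of two
    return [samples[bit_reverse(i, num_bits)] for i in range(n)]


if __name__ == "__main__":
    # For N = 8, index 1 (001) maps to 4 (100), index 3 (011) to 6 (110).
    print(bit_reverse_permute(list(range(8))))
```

In a software setting the whole block is available at once, so the reordering is a simple table lookup. A continuous-flow circuit instead receives P samples per clock cycle and must buffer them until their output slot arrives, which is why latency and memory become the quantities of interest.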