Self-timed pipelines (STPs) are becoming attractive because of their power-performance efficiency. A circular STP, which realizes a looped data flow, is necessary to directly implement not only iterative or recursive operations but also circular data paths for program execution. To facilitate product development and prototyping of STP circuits on commercial field-programmable gate arrays (FPGAs), several research efforts have already made it possible to use industry-standard electronic design automation (EDA) tools. However, how to properly implement a circular STP whose data transfer uses the so-called four-phase bundled-data protocol remains an open problem. In this paper, we point out that conventional circuits lead to design failure or even unacceptably degraded throughput because EDA tools misinterpret their configuration, especially in the realization of functions such as pipeline branching and data copy and erasure. We propose a circular STP design method consisting of a low-latency handshake circuit configuration and an accompanying design procedure. The proposed method guides the EDA tools to exploit the FPGA's intrinsic low-latency paths. We evaluate a circular STP implementing a data-driven processor under corner conditions and show that our method can extract the maximum throughput of the target pipelined circuits, which indicates the wider applicability of circular STPs.