A reinforcement learning-based neuro-fuzzy gait synthesizer, based on the GARIC (Generalized Approximate Reasoning for Intelligent Control) architecture, is proposed for the problem of biped dynamic balance. We modify the GARIC architecture so that it generates the trunk trajectory in both the sagittal and frontal planes. The proposed gait synthesizer is trained by reinforcement learning using a multi-valued scalar signal that evaluates the degree of failure or success of biped locomotion by means of the ZMP (Zero Moment Point). The synthesizer forms an initial dynamic balancing gait from linguistic rules derived from human intuitive balancing knowledge and biomechanics studies, accumulates dynamic balancing knowledge through reinforcement learning, and thus continually improves its gait during walking. The feasibility of the proposed method is verified through a 5-link biped robot simulation.
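To illustrate the idea of a multi-valued (graded) reinforcement signal derived from the ZMP, the following is a minimal sketch, not the paper's actual signal shaping: it assumes the support polygon is approximated by an axis-aligned rectangle, and the hypothetical function `zmp_reinforcement` and its bounds parameters are illustrative names introduced here.

```python
import numpy as np

def zmp_reinforcement(zmp_x, zmp_y, x_bounds, y_bounds):
    """Hypothetical graded reinforcement signal based on the ZMP position.

    Returns a value near +1 when the ZMP lies close to the centre of the
    support polygon (approximated by a rectangle) and -1 when it reaches
    or leaves the boundary, i.e. when dynamic balance is lost.
    """
    x_min, x_max = x_bounds
    y_min, y_max = y_bounds
    # Normalised distance of the ZMP from the rectangle centre
    # (0 = centre, 1 = on the boundary, >1 = outside the support polygon).
    dx = abs(zmp_x - 0.5 * (x_min + x_max)) / (0.5 * (x_max - x_min))
    dy = abs(zmp_y - 0.5 * (y_min + y_max)) / (0.5 * (y_max - y_min))
    margin = max(dx, dy)
    if margin >= 1.0:          # ZMP outside the support polygon: failure
        return -1.0
    return 1.0 - 2.0 * margin  # graded signal between success (+1) and failure (-1)

# Example: ZMP slightly forward of the support-foot centre
print(zmp_reinforcement(0.03, 0.0, x_bounds=(-0.05, 0.10), y_bounds=(-0.04, 0.04)))
```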