Achieving omnidirectional walking for bipedal robots is considered one of the most challenging tasks in robotics. Reinforcement learning (RL) methods have proven effective for bipedal walking tasks. However, most existing methods use state machines to switch between multiple policies to achieve an omnidirectional gait, which causes the robot to shake during policy switches. To achieve seamless transitions between omnidirectional gaits and transient motions for full-size bipedal robots, we propose a novel multi-agent RL method. Firstly, we design a multi-agent RL algorithm based on the actor–critic framework and introduce policy entropy to improve exploration efficiency. By training agents with parallel initial state distributions, we reduce reliance on the effectiveness of the gait planner in the Robot Operating System (ROS). Additionally, we design a novel heterogeneous policy experience replay mechanism based on Euclidean distance. Secondly, considering the periodicity of bipedal walking, we develop a new periodic gait function; incorporating periodic objectives into the policy accelerates the convergence of training. Finally, to enhance the robustness of the policy, we construct a novel curriculum learning method by discretizing a Gaussian distribution and incorporate it into the robot's training task. Our method is validated in a simulation environment, and the results show that it achieves multiple gaits with a single policy network and transitions smoothly between them.
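The abstract does not give the exact form of the entropy-regularized objective; one standard form (a sketch under that assumption, with generic notation: \(\alpha\) is the entropy temperature, \(\mathcal{H}\) the policy entropy, and \(\pi_i\) the policy of agent \(i\), none of which are taken from the paper) is

\[
J(\pi_i) \;=\; \mathbb{E}_{\tau \sim \pi_i}\!\left[\sum_{t} \gamma^{t}\,\big(r_t + \alpha\,\mathcal{H}\big(\pi_i(\cdot \mid s_t)\big)\big)\right],
\]

where the entropy bonus discourages premature collapse onto a single gait and thereby improves exploration.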
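The paper's exact Euclidean-distance replay rule is not specified here; the sketch below shows one plausible reading, in which a transition collected by one (heterogeneous) agent is admitted into another agent's buffer only if its state lies within a Euclidean-distance threshold of states the receiving agent has itself visited. The class name, threshold, and gating rule are all illustrative assumptions, not the paper's mechanism.

```python
import numpy as np
from collections import deque

class DistanceGatedReplay:
    """Illustrative cross-agent replay gated by Euclidean distance.

    Foreign transitions are kept only when their state is close (in
    Euclidean norm) to the receiving agent's recent states. All names
    and defaults are assumptions for illustration.
    """

    def __init__(self, capacity=100_000, threshold=1.0, ref_size=512):
        self.buffer = deque(maxlen=capacity)
        self.reference = deque(maxlen=ref_size)  # receiver's recent states
        self.threshold = threshold

    def observe_own(self, state):
        # Track the receiving agent's own visited states as the reference set.
        self.reference.append(np.asarray(state))

    def add_foreign(self, state, action, reward, next_state, done):
        # Admit a transition from another agent only if its state lies within
        # `threshold` of some reference state (minimum Euclidean distance).
        ref = np.asarray(self.reference)
        if len(ref) and np.linalg.norm(ref - np.asarray(state), axis=1).min() < self.threshold:
            self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        idx = np.random.choice(len(self.buffer), size=batch_size, replace=False)
        return [self.buffer[i] for i in idx]
```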
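The periodic gait function itself is not given in the abstract; the following is a minimal sketch of one common form of periodic objective for bipedal gaits, assuming a sinusoidal phase clock that alternately rewards stance (ground contact force) and swing (foot motion) for each leg, with the two legs half a cycle apart. Every name and constant here is illustrative rather than the paper's definition.

```python
import numpy as np

def periodic_gait_reward(phase, left_force, right_force,
                         left_foot_speed, right_foot_speed, kappa=20.0):
    """Phase-clock reward sketch: stance legs are rewarded for contact
    force, swing legs for foot speed. `phase` is the normalized gait
    phase in [0, 1); the 0.5 offset makes the legs anti-phase."""
    def clock(p):
        # Smooth square-like wave in [0, 1]: ~1 during stance, ~0 during swing.
        return 1.0 / (1.0 + np.exp(-kappa * np.sin(2.0 * np.pi * p)))

    c_left, c_right = clock(phase), clock(phase + 0.5)
    r_left = c_left * np.tanh(left_force) + (1.0 - c_left) * np.tanh(left_foot_speed)
    r_right = c_right * np.tanh(right_force) + (1.0 - c_right) * np.tanh(right_foot_speed)
    return 0.5 * (r_left + r_right)
```

Because the reward is an explicit function of a shared phase variable, the policy receives a consistent periodic target at every step, which is the intuition behind the claimed faster convergence.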
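For the curriculum built by discretizing a Gaussian distribution, the abstract gives no concrete schedule; a minimal sketch of one way to realize the idea is shown below, where task difficulty levels form a discrete grid, a Gaussian centered on the current curriculum mean is evaluated on that grid and normalized into a categorical distribution, and the mean advances with training progress. The grid, sigma, and linear schedule are assumptions for illustration.

```python
import numpy as np

def sample_difficulty(progress, levels=np.linspace(0.0, 1.0, 11), sigma=0.15):
    """Sample a discrete difficulty level from a discretized Gaussian whose
    mean tracks training progress in [0, 1] (easy -> hard)."""
    mu = progress
    # Gaussian density at each discrete level, normalized to a categorical
    # distribution over difficulty bins.
    weights = np.exp(-0.5 * ((levels - mu) / sigma) ** 2)
    return np.random.choice(levels, p=weights / weights.sum())

# Example usage: the sampled level could scale disturbances such as
# external push forces or terrain randomization during training.
push_scale = sample_difficulty(progress=0.3)
```

Sampling from a distribution over difficulty levels, rather than stepping through them deterministically, keeps some easy and some hard episodes in every training batch, which is a common way such curricula improve policy robustness.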