This paper presents a novel approach to developing control strategies for mobile robots, specifically the Pegasus, a bionic wheel-legged quadruped robot whose unique chassis mechanics enable four-wheel independent steering and diverse gaits. A multi-agent (MA) reinforcement learning (RL) controller is proposed that treats each leg as an independent agent with the goal of autonomous learning. The framework uses a multi-agent setup to model torso and leg dynamics, incorporating a motion-guidance optimization signal into policy training and the reward function. In doing so, we address leg scheduling patterns for the complex configuration of the Pegasus, the requirement for various gaits, and the design of reward functions for MA-RL agents. Agents were trained using two variations of policy networks based on the framework, and real-world tests show promising results, with straightforward policy transfer from simulation to the actual hardware. Models trained under the proposed framework acquired higher rewards and converged faster than the other variants. Experiments on the robot running the deployed framework showed a fast response (0.8 s) under disturbance and low linear-velocity, angular-velocity, and heading errors of 2.5 cm/s, 0.06 rad/s, and 4°, respectively. Overall, the study demonstrates the feasibility of the proposed MA-RL control framework.
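The per-leg agent structure described above can be illustrated with a minimal sketch. This is a hypothetical toy, not the paper's actual interfaces: each leg-agent observes a shared torso state concatenated with its own joint state, and its reward combines a tracking term against an assumed motion-guidance reference with an effort penalty (all names, dimensions, and weights are illustrative assumptions).

```python
import numpy as np

# The four leg-agents of a quadruped; each is trained as an independent
# agent in the multi-agent setup (illustrative labels, not the paper's).
LEGS = ["front_left", "front_right", "rear_left", "rear_right"]

class LegAgentEnv:
    """Toy per-leg view of the multi-agent control problem (assumed shapes)."""

    def __init__(self, n_joint=3, torso_dim=6):
        self.n_joint = n_joint      # joints per leg (assumption)
        self.torso_dim = torso_dim  # shared torso state size (assumption)

    def observe(self, torso_state, joint_state):
        # Each agent sees the shared torso state plus its own joint state.
        return np.concatenate([torso_state, joint_state])

    def reward(self, velocity, guidance_velocity, torque,
               w_track=1.0, w_effort=0.01):
        # Tracking term against a motion-guidance reference signal,
        # plus a quadratic effort penalty (hypothetical weighting).
        track = -w_track * np.sum((velocity - guidance_velocity) ** 2)
        effort = -w_effort * np.sum(torque ** 2)
        return track + effort

env = LegAgentEnv()
torso = np.zeros(6)
obs = {leg: env.observe(torso, np.zeros(3)) for leg in LEGS}
r = env.reward(np.array([0.5, 0.0]), np.array([0.5, 0.0]), np.zeros(3))
```

In this sketch the agents share the torso observation, which is one common way to couple otherwise independent leg policies; the paper's concrete observation and reward designs may differ.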