To address the limited range of control methods available for badminton serving devices, this paper proposes a vision-based multimodal control system and method for badminton serving. The system combines computer vision recognition with the conventional control methods of badminton serving devices. A vision capture device mounted on the serving device identifies human body postures, and the recognized posture information determines the control signals sent to adjust parameters such as launch angle and speed, enabling multiple serving modes. First, the hardware design of the serving device is presented, including an actuator module designed through 3D modeling and an embedded development board circuit designed to meet the requirements of multimodal control. Second, for visual perception of the human body, an improved BlazePose candidate-region posture recognition algorithm is proposed, building on existing posture recognition algorithms. Mappings between posture information and hand information are then established to support parameter conversion for the serving device under different postures. Finally, extensive experiments validate the feasibility and stability of the developed system and method.
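
As an illustration of the posture-to-parameter conversion described above, the following minimal Python sketch maps a recognized posture label to a launch angle and speed. It is not the implementation used in this work: the posture labels, the parameter values, and the names `ServeParams`, `POSTURE_TO_PARAMS`, and `control_signal` are all hypothetical, and the posture label is assumed to come from an upstream recognizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServeParams:
    """Serving parameters sent to the device (hypothetical structure)."""
    launch_angle_deg: float
    speed_m_s: float


# Hypothetical posture labels and parameter values, chosen only for
# illustration; the actual mapping is not specified in the abstract.
POSTURE_TO_PARAMS = {
    "arms_raised": ServeParams(launch_angle_deg=60.0, speed_m_s=12.0),
    "left_arm_out": ServeParams(launch_angle_deg=30.0, speed_m_s=18.0),
    "right_arm_out": ServeParams(launch_angle_deg=45.0, speed_m_s=15.0),
}


def control_signal(posture: str, current: ServeParams) -> ServeParams:
    """Return the parameters for a recognized posture.

    An unrecognized posture leaves the current parameters unchanged,
    so a misdetection does not alter the serving mode.
    """
    return POSTURE_TO_PARAMS.get(posture, current)
```

Keeping the current parameters when the posture is not recognized is one possible design choice; a real device might instead require the same posture over several consecutive frames before changing mode.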