Owing to their high stability and adaptability, quadruped robots are currently a widely discussed topic in robotics. To cope with complex indoor and outdoor environments, a quadruped robot should be equipped with an environment perception system, which typically contains LiDAR or a vision sensor and on which SLAM (Simultaneous Localization and Mapping) is deployed. In this paper, comparative experimental platforms, comprising a quadruped robot and a vehicle, each with LiDAR and a vision sensor, are first established. Second, single-sensor SLAM, including LiDAR SLAM and visual SLAM, is investigated separately to highlight the advantages and disadvantages of each. Then, multi-sensor SLAM based on LiDAR and vision is addressed to improve environmental perception performance. Third, an improved YOLOv5 (You Only Look Once) network, augmented with ASFF (adaptive spatial feature fusion), is employed for gesture recognition from images to achieve human–machine interaction. Finally, the challenges facing environment perception systems for mobile robots are discussed on the basis of a comparison between wheeled and legged robots. This research provides insight into environment perception for legged robots.