Object detection and depth perception are key foundations of object tracking and machine navigation, enabling a thorough perception and understanding of the surrounding environment. Currently, autonomous vehicles rely on complex and bulky systems with high cost and energy consumption to achieve the demanding multimodal vision they require. There is a pressing need for compact and reliable technology that improves the cost-effectiveness and efficiency of autonomous driving systems. The meta-lens, a novel flat optical device, uses an artificial nanoantenna array to manipulate the properties of light. It is lightweight, ultrathin, and easy to integrate, making it suitable for various applications. We developed a stereo vision meta-lens imaging system for assisted driving vision, providing comprehensive perception that includes imaging, object detection, instance segmentation, and depth information. The compact system comprises a band-pass filter, a stereo vision meta-lens, and a complementary metal-oxide-semiconductor (CMOS) sensor. Compared with traditional two-camera stereo vision systems, the meta-lens stereo vision imaging system eliminates the need for distortion correction and camera calibration. A tailored data-processing pipeline is proposed, combining an intensity and depth-gradient cross-validation optimization mechanism with three deep-learning modules for object detection, instance segmentation, and stereo matching. The final assisted driving vision provides multimodal perception by integrating the raw image, instance labels, bounding boxes, segmentation masks rendered in depth pseudo-color, and depth information for each detected object. Our assisted driving vision based on a stereo meta-lens system offers comprehensive perception for machine scene understanding, benefiting applications such as human-computer interaction, machine navigation, autonomous driving, and augmented reality.
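As a rough illustration of how per-object depth can be recovered from a stereo pair and attached to a detection, the minimal sketch below is offered under stated assumptions rather than as the authors' implementation: the focal length (FOCAL_PX) and baseline (BASELINE_M) are hypothetical values, OpenCV's semi-global block matching stands in for the paper's deep-learning stereo-matching module, and the segmentation mask is a hand-placed placeholder rather than the output of the instance-segmentation module.

```python
import numpy as np
import cv2

# --- Hypothetical optical parameters (illustrative only, not from the paper) ---
FOCAL_PX = 700.0    # assumed focal length in pixels
BASELINE_M = 0.05   # assumed baseline between the two stereo apertures, in metres

def disparity_map(left_gray, right_gray):
    """Stereo matching via OpenCV SGBM (a stand-in for a learned stereo-matching module)."""
    sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=7)
    # SGBM returns disparities in 16x fixed-point format; convert to float pixels.
    return sgbm.compute(left_gray, right_gray).astype(np.float32) / 16.0

def object_depth(disp, mask):
    """Median metric depth of one detected object, given its segmentation mask."""
    valid = (disp > 0) & mask
    if not valid.any():
        return None
    return float(np.median(FOCAL_PX * BASELINE_M / disp[valid]))

if __name__ == "__main__":
    # Synthetic stand-in for the two sub-images formed side by side on the sensor:
    # a randomly textured scene shifted by 20 px between views (uniform 20 px disparity).
    rng = np.random.default_rng(0)
    left = rng.integers(0, 50, (240, 320), np.uint8)
    right = np.roll(left, -20, axis=1)

    disp = disparity_map(left, right)

    # Placeholder "instance segmentation" mask for one detected object in the left view.
    mask = np.zeros(left.shape, bool)
    mask[100:160, 150:210] = True

    d = object_depth(disp, mask)
    print("estimated object depth:", "n/a" if d is None else f"{d:.2f} m")
```

With the assumed parameters the 20 px disparity maps to roughly 1.75 m; in the actual system the disparity and masks would come from the learned stereo-matching and instance-segmentation modules operating on the meta-lens image pair.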