People left behind in cars or buses have been involved in many accidents, so detecting humans inside vehicles is essential. Current human detection methods rely on wearable devices, oxygen sensors, or special seat designs, but these sensors cannot adapt to ever-changing environments. To address these problems, and especially to improve passengers’ safety on buses, we propose a method that detects humans in a variety of in-vehicle environments by fusing vision and microwave radar information. For the vision information, we use separate networks to extract human and human face features, and we fuse the detection results of the two models to improve human detection accuracy. The human detection model is MobileNet-V2, and the human face detection model is MTCNN. We design a new matching scheme and a tracked-object management rule, both based on the Kernelized Correlation Filter (KCF) tracker, to track the human and human face detection boxes. The microwave radar information is used to detect moving objects. Finally, the vision and microwave radar detection results are fused. Experiments show that our method improves human detection accuracy in vehicles and that it can be used to detect children left behind on school buses.
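As an illustration of the decision-level fusion described above, the following is a minimal sketch in Python and not the implementation used in this work. The abstract does not state the fusion rule, so everything here is an assumption: the `Box` type, the containment test `face_inside_body`, the thresholds `min_overlap` and `score_thr`, and the rule that a human is reported when either the vision branch or the radar branch fires are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Axis-aligned detection box (hypothetical type, for illustration only)."""
    x1: float
    y1: float
    x2: float
    y2: float
    score: float  # detector confidence in [0, 1]


def face_inside_body(face: Box, body: Box, min_overlap: float = 0.8) -> bool:
    """True if at least min_overlap of the face area lies inside the body box."""
    iw = max(0.0, min(face.x2, body.x2) - max(face.x1, body.x1))
    ih = max(0.0, min(face.y2, body.y2) - max(face.y1, body.y1))
    face_area = (face.x2 - face.x1) * (face.y2 - face.y1)
    return face_area > 0 and (iw * ih) / face_area >= min_overlap


def count_vision_humans(bodies: List[Box], faces: List[Box],
                        score_thr: float = 0.5) -> int:
    """Merge body and face detections so one person is not counted twice.

    Assumed rule: a face that falls inside a body box belongs to that body;
    a face with no enclosing body counts as an additional person
    (for example, someone mostly hidden behind a seat).
    """
    bodies = [b for b in bodies if b.score >= score_thr]
    faces = [f for f in faces if f.score >= score_thr]
    unmatched_faces = [
        f for f in faces
        if not any(face_inside_body(f, b) for b in bodies)
    ]
    return len(bodies) + len(unmatched_faces)


def human_present(bodies: List[Box], faces: List[Box],
                  radar_motion: bool, score_thr: float = 0.5) -> bool:
    """Assumed OR fusion: report a human if either modality detects one."""
    return count_vision_humans(bodies, faces, score_thr) > 0 or radar_motion
```

An OR rule of this kind favors recall, which may suit a safety application in which a missed person costs more than a false alarm; a weighted or temporally filtered rule would be an equally plausible choice.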