Decision-making is a critical component of an autonomous driving system: it keeps the unmanned vehicle informed of, and updated on, the movements of surrounding objects. However, end-to-end driving decision-making remains a major challenge because traffic targets appear at widely different scales in dynamic, in-the-wild traffic scenes. To address this problem, this paper proposes a novel model that combines an attention mechanism with spatiotemporal feature extraction. Specifically, to capture the important spatial information of traffic targets that differ in scale, the model treats the height H, width W and channel C dimensions of the feature map independently of one another to build a sparse spatial attention map. Moreover, different sparse masks are trained for the spatial network by pruning feature-map elements at the end of each backbone block, which improves the accuracy of the spatial network's two subnetworks by 2.3% and 3.9%, respectively. The extracted spatial features, together with the previous speed, are then fed jointly into a time-sequence network that predicts the vehicle's steering angle and speed. Experiments on public virtual datasets show that the model reaches a prediction accuracy of 85.8%. Compared with two other state-of-the-art models, our model's accuracy is higher by 4.8% and 2.2%, respectively.
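To make the idea of treating H, W and C independently more concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes that each axis gets its own gate vector, computed by mean-pooling the feature map over the other two axes followed by a sigmoid; that sparsity is imposed by keeping only the top-k gates per axis (a simple stand-in for the trained sparse masks, which the abstract does not specify); and that the three gate vectors combine as an outer product into the attention map. The function names `axis_gates`, `top_k_mask` and `sparse_separable_attention` and the parameters `k_c`, `k_h`, `k_w` are hypothetical.

```python
import math
from typing import List, Tuple

# Feature map stored as nested lists indexed [c][h][w] (channel, height, width).
Tensor3 = List[List[List[float]]]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def axis_gates(x: Tensor3) -> Tuple[List[float], List[float], List[float]]:
    """Compute one gate per index of each axis, independently of the other axes.

    Assumption: each gate is the sigmoid of the mean over the other two axes.
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    g_c = [sigmoid(sum(x[c][h][w] for h in range(H) for w in range(W)) / (H * W))
           for c in range(C)]
    g_h = [sigmoid(sum(x[c][h][w] for c in range(C) for w in range(W)) / (C * W))
           for h in range(H)]
    g_w = [sigmoid(sum(x[c][h][w] for c in range(C) for h in range(H)) / (C * H))
           for w in range(W)]
    return g_c, g_h, g_w


def top_k_mask(gates: List[float], k: int) -> List[float]:
    """Keep the k largest gates and zero the rest.

    Hypothetical stand-in for the paper's trained sparse masks.
    """
    keep = set(sorted(range(len(gates)), key=lambda i: gates[i], reverse=True)[:k])
    return [g if i in keep else 0.0 for i, g in enumerate(gates)]


def sparse_separable_attention(x: Tensor3, k_c: int, k_h: int, k_w: int) -> Tensor3:
    """Apply the separable map A[c][h][w] = g_c[c] * g_h[h] * g_w[w] element-wise."""
    g_c, g_h, g_w = axis_gates(x)
    # Sparsify each axis on its own, so the axes stay independent.
    g_c = top_k_mask(g_c, k_c)
    g_h = top_k_mask(g_h, k_h)
    g_w = top_k_mask(g_w, k_w)
    return [[[x[c][h][w] * g_c[c] * g_h[h] * g_w[w] for w in range(len(g_w))]
             for h in range(len(g_h))]
            for c in range(len(g_c))]


# Usage: a toy feature map with C=2, H=2, W=3.
x = [
    [[1.0, 0.0, -1.0], [2.0, 0.0, -2.0]],
    [[1.0, 1.0, 1.0], [-0.5, -0.5, -0.5]],
]
y = sparse_separable_attention(x, k_c=1, k_h=2, k_w=2)
print(y)
```

Because the map is the product of three independent vectors, it needs only C + H + W values rather than C·H·W, and zeroing a single gate removes an entire channel, row or column of the feature map. In the toy example, channel 0 and width column 2 are suppressed.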