Radar target recognition methods usually use only a single type of high-resolution radar signal, e.g., the high-resolution range profile (HRRP) or the synthetic aperture radar (SAR) image. In fact, both the HRRP data and the corresponding SAR image can be obtained simultaneously during the SAR imaging procedure. Although the information contained in the HRRP data and in the SAR image is not exactly the same, both are important for radar target recognition. Therefore, in this paper, we propose a novel end-to-end two-stream fusion network for SAR target recognition that makes full use of the different characteristics obtained by modeling the HRRP data and the SAR images, respectively. The proposed fusion network contains two separate streams in the feature extraction stage: one uses a variational auto-encoder (VAE) network to acquire the latent probabilistic distribution characteristic of the HRRP data, and the other uses a lightweight convolutional neural network, LightNet, to extract the 2D visual structure characteristic of the SAR images. Following the feature extraction stage, a fusion module integrates the latent probabilistic distribution characteristic and the structure characteristic so that the target information is reflected more comprehensively and sufficiently. The main contributions of the proposed method are twofold: (1) the different characteristics of the HRRP data and the SAR image are used effectively for SAR target recognition, and (2) an attention weight vector is used in the fusion module to adaptively integrate the different characteristics from the two sub-networks. Experimental results on the HRRP data and SAR images of the MSTAR and civilian vehicle datasets show that our method improves the recognition rate by at least 0.96% and 2.16%, respectively, compared with current SAR target recognition methods.
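The abstract does not state how the attention weight vector in the fusion module is computed. As an illustration only, the following minimal NumPy sketch shows one common way to fuse two stream features adaptively. It assumes an additive (tanh) scoring scheme with hypothetical learned parameters `W` and `v`, and assumes that the HRRP (VAE) and SAR (LightNet) features have already been projected to a common dimension `d`. It is not the authors' fusion module.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)


def attention_fusion(h_hrrp, h_sar, W, v):
    """Fuse two stream features with a per-sample attention weight vector.

    h_hrrp, h_sar: (batch, d) features from the two streams, assumed to be
        already projected to the same dimension d.
    W: (d, k) hypothetical learned projection matrix.
    v: (k,) hypothetical learned scoring vector.
    Returns the fused (batch, d) features and the (batch, 2) attention weights.
    """
    stacked = np.stack([h_hrrp, h_sar], axis=1)         # (batch, 2, d)
    scores = np.tanh(stacked @ W) @ v                   # (batch, 2): one score per stream
    alpha = softmax(scores, axis=1)                     # weights over the two streams sum to 1
    fused = np.sum(alpha[..., None] * stacked, axis=1)  # weighted sum -> (batch, d)
    return fused, alpha


# Usage with random stand-ins for the two streams' features (hypothetical sizes).
rng = np.random.default_rng(0)
batch, d, k = 4, 16, 8
h_hrrp = rng.normal(size=(batch, d))
h_sar = rng.normal(size=(batch, d))
W = rng.normal(size=(d, k))
v = rng.normal(size=(k,))
fused, alpha = attention_fusion(h_hrrp, h_sar, W, v)
print(fused.shape, alpha.shape)  # (4, 16) (4, 2)
```

Because the weights are produced per sample by a softmax over the two streams, the fused feature can lean toward the HRRP or the SAR characteristic depending on the input. In a trained network, `W` and `v` would be learned jointly with both streams rather than drawn at random.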