Several metrics have been reported in the literature to assess stereo image quality, mostly based on visual attention or human visual sensitivity based distortion prediction with the help of disparity information, which do not consider the combined aspects of human visual processing. In this paper, visual attention and depth assisted stereo image quality assessment model (VAD-SIQAM) is devised that consists of three main components, i.e., stereo attention predictor (SAP), depth variation (DV), and stereo distortion predictor (SDP). Visual attention is modeled based on entropy and inverse contrast to detect regions or objects of interest/attention. Depth variation is fused into the attention probability to account for the amount of changed depth in distorted stereo images. Finally, the stereo distortion predictor is designed by integrating distortion probability, which is based on low-level human visual system (HVS), responses into actual attention probabilities. The results show that regions of attention are detected among the visually significant distortions in the stereo image pair. Drawbacks of human visual sensitivity based picture quality metrics are alleviated by integrating visual attention and depth information. We also show that positive correlation with ground-truth attention and depth maps are increased by up to 0.949 and 0.936 in terms of the Pearson and the Spearman correlation coefficients, respectively.