The prediction of visual quality is crucial in image and video systems. Image quality metrics based on the mean squared error prevail in the field due to their mathematical simplicity, even though they do not correlate well with human visual perception. Recent advances in the area indicate that the use of convolutional neural networks (CNNs) to assess perceptual visual quality is a clear trend. Results in other applications, such as blur detection and de-raining, indicate that combining different receptive fields (i.e., convolutional kernels of different sizes) improves CNN performance. However, to the best of our knowledge, the role of different receptive fields in visual quality characterization remains an open issue. Thus, in this paper, we investigate the influence of different receptive fields on the prediction of image distortion. Specifically, we propose a multi-stream dense network that estimates a spatially varying quality metric parameter from either reference or distorted images. The performance of the proposed method is compared with that of a competing state-of-the-art approach on a public image database. Results show that the proposed strategy outperforms the competing technique when the quality metric parameter is estimated from distorted images.
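To make the multi-stream idea concrete, the sketch below shows parallel densely connected convolutional streams, each with a different kernel size (i.e., receptive field), whose features are fused to predict a per-pixel parameter map. This is a minimal illustrative sketch in PyTorch; the number of streams, kernel sizes (3, 5, 7), growth rate, depth, and single-channel output head are assumptions for illustration, not the exact architecture proposed in the paper.

```python
import torch
import torch.nn as nn

class DenseStream(nn.Module):
    """One stream: a small densely connected stack of convolutions
    sharing a single receptive-field (kernel) size."""
    def __init__(self, in_ch, growth, n_layers, kernel_size):
        super().__init__()
        pad = kernel_size // 2  # keep spatial resolution for odd kernels
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(ch, growth, kernel_size, padding=pad),
                nn.ReLU(inplace=True),
            ))
            ch += growth  # dense connectivity: each layer sees all previous features
        self.out_channels = ch

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return torch.cat(feats, dim=1)

class MultiStreamDenseNet(nn.Module):
    """Parallel dense streams with different kernel sizes; their features are
    concatenated and mapped to a spatially varying parameter map.
    Hyperparameters here are illustrative assumptions."""
    def __init__(self, in_ch=3, kernel_sizes=(3, 5, 7), growth=16, n_layers=4):
        super().__init__()
        self.streams = nn.ModuleList(
            DenseStream(in_ch, growth, n_layers, k) for k in kernel_sizes
        )
        fused = sum(s.out_channels for s in self.streams)
        self.head = nn.Conv2d(fused, 1, kernel_size=1)  # per-pixel parameter estimate

    def forward(self, x):
        fused = torch.cat([s(x) for s in self.streams], dim=1)
        return self.head(fused)

# Usage: estimate a parameter map from a (reference or distorted) RGB image.
model = MultiStreamDenseNet()
img = torch.randn(1, 3, 128, 128)
param_map = model(img)  # shape: (1, 1, 128, 128)
```

The design choice illustrated here is that each stream specializes in one receptive-field size, so distortions visible at different spatial scales are captured by different streams before fusion.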