Low visibility is a frequent cause of serious traffic accidents worldwide. Although visibility estimation has been studied extensively in meteorology, it remains a difficult problem. Deep learning-based visibility estimation methods suffer from low accuracy because they lack "specific features" of foggy images. Meanwhile, physical model-based visibility estimation methods apply only to specific scenes because they require extra auxiliary parameters. Therefore, this paper proposes a novel end-to-end framework named STCN-Net for visibility estimation, which combines "engineered features" and "learned features" to achieve higher accuracy. Specifically, a novel 3D multi-feature stream matrix, named DDT, is designed for visibility estimation; it consists of a transmittance matrix, a dark channel matrix, and a depth matrix. Unlike traditional deep learning methods, which use only convolutional neural networks (CNNs) to process the input data or images, our method combines a CNN and a Transformer. In STCN-Net, the Swin-Transformer (Swin-T) module takes the original image as input, while the CNN module takes the DDT matrix as input. Moreover, to integrate the different feature information from the CNN and Swin-T, we embed a Coordinate Attention (CA) module in STCN-Net. Finally, two visibility datasets, Visibility Image Dataset I (VID I) and Visibility Image Dataset II (VID II), were constructed for evaluation; VID I is a real-scene visibility dataset and VID II is a synthetic visibility dataset. The experimental results show that our method outperforms classical methods on both datasets. Compared with the runner-up, its accuracy is 2.1% higher on VID I and 0.5% higher on VID II.
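
The abstract names the three components of the DDT matrix but does not say how they are computed. The following is a minimal sketch of one plausible construction, assuming the dark channel prior for the dark channel and transmittance, and the atmospheric scattering relation $t = e^{-\beta d}$ for depth. The function names (`min_filter`, `dark_channel`, `build_ddt`), the parameters (`patch`, `omega`, `beta`, `t_min`), the atmospheric-light heuristic, and the channel order are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def min_filter(img, patch):
    """Sliding-window minimum over a patch x patch neighbourhood (edge-padded)."""
    if patch % 2 == 0:
        raise ValueError("patch must be odd so the output keeps the input size")
    pad = patch // 2
    padded = np.pad(img, pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (patch, patch))
    return windows.min(axis=(-2, -1))


def dark_channel(image, patch=15):
    """Per-pixel minimum over colour channels, then over a local patch."""
    return min_filter(image.min(axis=2), patch)


def build_ddt(image, patch=15, omega=0.95, beta=1.0, t_min=0.1):
    """Stack transmittance, dark channel, and depth into an (H, W, 3) array.

    image: float RGB array in [0, 1] with shape (H, W, 3).
    beta:  assumed scattering coefficient; the depth is only relative to it.
    """
    dark = dark_channel(image, patch)

    # Assumed heuristic: atmospheric light is the mean colour of the
    # brightest 0.1% of dark-channel pixels.
    n_top = max(1, dark.size // 1000)
    top_idx = np.argsort(dark.ravel())[-n_top:]
    airlight = np.maximum(image.reshape(-1, 3)[top_idx].mean(axis=0), 1e-6)

    # Dark channel prior: t = 1 - omega * dark_channel(I / A), clipped for stability.
    trans = 1.0 - omega * dark_channel(image / airlight, patch)
    trans = np.clip(trans, t_min, 1.0)

    # Invert t = exp(-beta * d) to obtain a relative depth map.
    depth = -np.log(trans) / beta

    # Channel order follows the order listed in the abstract (an assumption).
    return np.stack([trans, dark, depth], axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    foggy = rng.uniform(0.3, 1.0, size=(64, 96, 3))  # stand-in for a foggy image
    ddt = build_ddt(foggy, patch=7)
    print(ddt.shape)
```

In a two-branch design like the one described, an array of this shape could be fed to the CNN branch while the original RGB image goes to the Swin-T branch; how the authors normalise or resize either input is not stated here.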