In convolution-based stereo matching methods, 2D convolution is computationally cheaper and faster than 3D convolution. However, the initial cost volume produced by 2D convolution in the correlation layer carries less information than one produced with 3D convolution, so regions of the disparity map affected by illumination are less robust and the overall accuracy suffers. To address the limited information in 2D-convolution cost volumes, this paper proposes a multi-scale adaptive cost attention and adaptive fusion stereo matching network (MCAFNet) based on AANet+. First, the extracted features are used to compute the initial cost volume, which is fed into a multi-scale adaptive cost attention module to generate attention weights; these weights are combined with the initial cost volume to suppress irrelevant information and enrich the cost representation. Second, the cost aggregation stage is improved by adding a multi-scale adaptive fusion module that raises the fusion efficiency of cross-scale cost aggregation. On the Scene Flow dataset, the end-point error (EPE) is reduced to 0.66, and the mismatch rates on the KITTI 2012 and KITTI 2015 datasets are 1.60% and 2.22%, respectively.
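To make the re-weighting idea concrete, the following is a minimal PyTorch sketch of a correlation-style (2D-convolution) cost volume and an attention branch that scales it. The module name, layer sizes, and kernel choices are illustrative assumptions and do not reproduce the exact MCAFNet or AANet+ architecture.

```python
import torch
import torch.nn as nn


def correlation_cost_volume(left_feat, right_feat, max_disp):
    """Correlation-based cost volume of shape (B, D, H, W).

    Each disparity slice is the mean channel-wise correlation between the
    left feature map and the right feature map shifted by d pixels.
    """
    b, c, h, w = left_feat.shape
    cost = left_feat.new_zeros(b, max_disp, h, w)
    for d in range(max_disp):
        if d == 0:
            cost[:, d] = (left_feat * right_feat).mean(dim=1)
        else:
            cost[:, d, :, d:] = (
                left_feat[:, :, :, d:] * right_feat[:, :, :, :-d]
            ).mean(dim=1)
    return cost


class CostAttention(nn.Module):
    """Illustrative attention over a cost volume (hypothetical design).

    A small convolutional branch predicts per-location, per-disparity weights
    in (0, 1); multiplying them with the initial cost volume suppresses
    irrelevant responses, e.g. in illumination-affected regions.
    """

    def __init__(self, max_disp):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(max_disp, max_disp, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(max_disp, max_disp, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, cost_volume):
        weights = self.attn(cost_volume)
        return cost_volume * weights
```

In a full pipeline, the re-weighted cost volume would then be passed to the cross-scale cost aggregation stage; the multi-scale variant described in the paper would apply such weighting at several feature resolutions before fusion.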