Synthetic Aperture Radar (SAR) imaging and visible-light imaging are the two most commonly used imaging methods for remote sensing satellites. Because the information they capture is highly complementary, many data fusion scenarios rely on both kinds of heterogeneous data. Before fusion, however, the data of the two modalities must be aligned, and the performance of the heterogeneous data matching algorithm directly affects how well ground control points can be obtained during alignment. Many one-stage and two-stage methods currently exist for heterogeneous remote sensing image matching. Existing one-stage methods suffer from a large average prediction offset, low matching accuracy, and an imbalance between features of different levels when features are combined. Two-stage methods cannot meet practical requirements for speed and accuracy. To address these issues, this paper proposes HB3CF, an end-to-end framework for heterogeneous remote sensing image matching. The framework uses a classic image feature extraction network to build a pair of pseudo-twin (pseudo-Siamese) networks, which extract features from the two heterogeneous images separately. A convolution layer then samples the features of each level uniformly along the channel dimension, which reduces the weight of the high-dimensional features in the joint features and effectively improves the model's ability to express features at every level. Finally, the matching result is obtained by applying convolution, cross-correlation, and up-sampling operations to the high-dimensional features of the SAR image and the optical image. Experiments show that the model reduces the average offset error by about 25% compared with state-of-the-art methods.
For average offset errors of at most 0, 1, 2, and 3 pixels, the accuracy increases by 8.53%, 9.54%, 4.16%, and 1.12%, respectively, reaching 25.90%, 65.03%, 86.65%, and 92.15%. These results substantially improve the accuracy of heterogeneous image matching and explore the application of deep learning methods to large-scale heterogeneous remote sensing image matching tasks.
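As a rough illustration of the matching and evaluation steps described above, the following minimal sketch cross-correlates a template feature map with a search feature map, takes the peak of the score map as the predicted offset, and computes accuracy at pixel thresholds. It is not the HB3CF implementation: the function names are hypothetical, the correlation is a plain unnormalized dot product computed with NumPy, the feature maps are assumed to be already extracted, and the offset error is assumed to be the Euclidean distance between predicted and true positions.

```python
import numpy as np


def cross_correlate(search, template):
    """Slide `template` (C, h, w) over `search` (C, H, W).

    Returns a score map of shape (H - h + 1, W - w + 1), where each entry is
    the dot product between the template and the corresponding search window.
    """
    _, h, w = template.shape
    # windows has shape (C, H - h + 1, W - w + 1, h, w)
    windows = np.lib.stride_tricks.sliding_window_view(
        search, (h, w), axis=(1, 2)
    )
    return np.einsum("cijhw,chw->ij", windows, template)


def predict_offset(score_map):
    """Return the (row, col) position of the highest correlation score."""
    row, col = np.unravel_index(np.argmax(score_map), score_map.shape)
    return int(row), int(col)


def offset_errors(predicted, truth):
    """Euclidean distance in pixels between predicted and true positions."""
    predicted = np.asarray(predicted, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return np.linalg.norm(predicted - truth, axis=1)


def accuracy_at(errors, thresholds=(0, 1, 2, 3)):
    """Fraction of samples whose offset error is at most each threshold."""
    errors = np.asarray(errors, dtype=float)
    return {t: float(np.mean(errors <= t)) for t in thresholds}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical feature maps: a template embedded in an empty search map.
    template = rng.standard_normal((4, 8, 8))
    search = np.zeros((4, 20, 24))
    search[:, 5:13, 7:15] = template

    position = predict_offset(cross_correlate(search, template))
    errors = offset_errors([position], [(5, 7)])
    print(position, accuracy_at(errors))
```

In a learned pipeline the two feature maps would come from the SAR and optical branches of the network, and the score map would typically be up-sampled before the peak is located; those steps are omitted here.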