Mountainous forests are pivotal in the global carbon cycle, serving as substantial carbon reservoirs and sinks. However, reliably estimating their carbon stock remains a considerable challenge, primarily because of the lack of representative in situ measurements and of methods capable of capturing their complex spatial variation. Here, we propose a deep learning-based method that combines residual convolutional neural networks (ResNet) with in situ measurements, microwave data (Sentinel-1 and vegetation optical depth, VOD), and optical data (Sentinel-2 and Landsat) to estimate forest biomass and track its change across mountainous regions. By integrating in situ measurements from representative elevations with multi-source remote sensing images, our approach significantly improves the accuracy of biomass estimation in Tibet's complex mountainous forests (R² = 0.80, root mean squared error = 15.8 MgC ha−1). Moreover, ResNet, which mitigates the vanishing gradient problem in deep neural networks by introducing skip connections, enables the extraction of complex spatial patterns from limited datasets and outperforms traditional optical-based or pixel-based methods. The mean forest biomass was estimated at 162.8 ± 21.3 MgC ha−1, notably higher than that of forests at comparable latitudes or in flat regions of China. Additionally, our findings reveal a substantial forest biomass carbon sink of 3.35 TgC year−1 during 2015–2020, which previous studies have largely underestimated, mainly because they underestimated the mountainous carbon stock. The high carbon density, combined with the underestimated carbon sink in mountainous regions, underscores the urgent need to reassess mountain forests to improve estimates of the global carbon budget.
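To illustrate the skip-connection idea behind ResNet, the following is a minimal NumPy sketch of a single residual block. It is a simplification, assuming dense (matrix) layers in place of convolutions and omitting batch normalization; the function names, weight shapes, and inputs are hypothetical and do not describe the network used in this study. Because the block computes relu(F(x) + x), the input has a direct path to the output: when F is zero the block passes non-negative inputs through unchanged, and during backpropagation the gradient of F(x) + x with respect to x always contains an identity term, which helps counter vanishing gradients.

```python
import numpy as np


def relu(z: np.ndarray) -> np.ndarray:
    """Element-wise rectified linear unit."""
    return np.maximum(z, 0.0)


def residual_block(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Hypothetical dense residual block: y = relu(F(x) + x).

    F(x) = w2 @ relu(w1 @ x) is the learned residual mapping. Dense layers
    stand in for the convolutions of a real ResNet (simplifying assumption).
    w1 has shape (hidden, d) and w2 has shape (d, hidden), so F(x) has the
    same shape as x and the skip connection can add them directly.
    """
    residual = w2 @ relu(w1 @ x)   # learned branch F(x)
    return relu(residual + x)      # skip connection adds the input back


# Usage: with zero weights, F(x) = 0 and the block reduces to relu(x).
x = np.array([1.0, 2.0, 3.0])
w1_zero = np.zeros((4, 3))
w2_zero = np.zeros((3, 4))
y_identity = residual_block(x, w1_zero, w2_zero)

# With random weights the output keeps the input's shape.
rng = np.random.default_rng(0)
y_random = residual_block(x, rng.normal(size=(4, 3)), rng.normal(size=(3, 4)))
```

The identity path is what lets very deep stacks of such blocks train: each block only needs to learn a correction F(x) to its input rather than a full transformation.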