Forest height is a key forest parameter of great significance for monitoring forest resources, calculating forest biomass, and observing the global carbon cycle. Because a PolInSAR system can provide a range of information about observed objects, including their height, shape, directional sensitivity, and spatial distribution, it has become a powerful means of measuring forest height. The proposed framework uses deep learning and builds upon the traditional DEM differencing and coherence amplitude inversion algorithms. Using L-band PolInSAR data, a new CNN model is established in which the estimates produced by DEM differencing and coherence amplitude inversion serve as labels. Furthermore, the PCGrad optimization strategy is used to update the gradients automatically during training. The model not only builds a relationship between the complex coherences and forest height but also makes full use of spatial context information through its CNN layers. Experiments are carried out on simulated data and on real data from the Lope forest site, collected by UAVSAR during the NASA AfriSAR campaign. Compared with the classic forest height inversion algorithms, the proposed framework achieves higher accuracy, with an RMSE of 10.15 m and an R² of 0.87. Overall, the proposed framework does not require LiDAR data as prior knowledge and can be applied to various forest scenes. Consequently, it will hopefully serve as a useful approach for improving forest height inversion based on PolInSAR data. The code, trained model, and data will be made publicly available.
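To illustrate the kind of gradient update that PCGrad performs, the following is a minimal sketch in plain Python, not the implementation used in this work. It assumes that the losses against the two label sources (DEM differencing and coherence amplitude inversion) are treated as two separate tasks, which is an interpretation rather than something stated above. The names `pcgrad`, `g_dem`, and `g_coh` are hypothetical, the gradients are toy flattened vectors, and the other tasks are visited in a fixed order, whereas the original PCGrad formulation visits them in random order.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def pcgrad(task_grads: Sequence[Sequence[float]]) -> Vector:
    """Combine per-task gradients with PCGrad-style projection.

    For each task gradient g_i, whenever it conflicts with another task
    gradient g_j (negative inner product), the component of g_i along g_j
    is removed. The projected gradients are then summed.
    """
    projected = []
    for i, g_i in enumerate(task_grads):
        g = list(g_i)  # work on a copy; the originals are used for projection
        for j, g_j in enumerate(task_grads):
            if i == j:
                continue
            inner = dot(g, g_j)
            norm_sq = dot(g_j, g_j)
            if inner < 0 and norm_sq > 0:
                scale = inner / norm_sq
                g = [x - scale * y for x, y in zip(g, g_j)]
        projected.append(g)
    # Sum the projected gradients coordinate by coordinate.
    return [sum(column) for column in zip(*projected)]


# Hypothetical flattened gradients of the two label-specific losses.
g_dem = [1.0, 0.0]   # loss against DEM-differencing labels (assumed task 1)
g_coh = [-1.0, 1.0]  # loss against coherence-amplitude labels (assumed task 2)

combined = pcgrad([g_dem, g_coh])
print(combined)  # [0.5, 1.5]
```

In this toy case the two gradients conflict, so each is projected onto the plane normal to the other before summation; gradients that do not conflict pass through unchanged and are simply added.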