Monocular depth estimation is a widely studied task. Bronchial depth estimation, however, poses particular challenges: ground-truth depth labels for the bronchus are difficult to obtain, and bronchial images have scarce texture, smooth surfaces, and many holes. We therefore use a ray-tracing algorithm to generate virtual images together with their corresponding depth maps, and use these to train an asymmetric encoder-decoder transformer network for bronchial depth estimation. Because the bronchus has few texture features but many edges and holes, we propose an edge-aware unit that strengthens the network's awareness of the internal bronchial structure. We also propose an asymmetric encoder-decoder for multi-layer feature fusion. Experimental results on virtual bronchial data show that our method achieves the best results on several metrics, including an MAE of 0.915 ± 0.596 and an RMSE of 1.471 ± 1.097.
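The MAE and RMSE figures above are reported as mean ± standard deviation. The sketch below shows one common way such numbers are produced: MAE and RMSE are computed per image over all pixels, then averaged across the test set. It assumes that the ± term is the population standard deviation over per-image scores, that no pixel masking is applied, and that depth units are whatever the ground-truth maps use. None of these details is stated in the abstract, and the helper names `per_image_errors` and `summarize` are hypothetical.

```python
import math
import statistics
from typing import Dict, Iterable, Sequence, Tuple

# A depth map is represented here as a row-major 2D list of floats (assumption).
DepthMap = Sequence[Sequence[float]]


def per_image_errors(pred: DepthMap, gt: DepthMap) -> Tuple[float, float]:
    """Return (MAE, RMSE) between a predicted and a ground-truth depth map."""
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("predicted and ground-truth maps must have the same shape")
    # Signed per-pixel errors, flattened over the whole image.
    diffs = [p - g for prow, grow in zip(pred, gt) for p, g in zip(prow, grow)]
    if not diffs:
        raise ValueError("empty depth maps")
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, rmse


def summarize(pairs: Iterable[Tuple[DepthMap, DepthMap]]) -> Dict[str, Tuple[float, float]]:
    """Aggregate per-image scores into (mean, population std) for each metric."""
    maes, rmses = zip(*(per_image_errors(p, g) for p, g in pairs))
    return {
        "MAE": (statistics.mean(maes), statistics.pstdev(maes)),
        "RMSE": (statistics.mean(rmses), statistics.pstdev(rmses)),
    }


if __name__ == "__main__":
    # Two tiny illustrative 2x2 "depth maps"; values are made up for the demo.
    pairs = [
        ([[1.0, 2.0], [3.0, 4.0]], [[1.0, 2.0], [3.0, 6.0]]),
        ([[2.0, 2.0], [2.0, 2.0]], [[1.0, 1.0], [1.0, 1.0]]),
    ]
    for name, (mean, std) in summarize(pairs).items():
        print(f"{name}: {mean:.3f} ± {std:.3f}")
```

Because RMSE squares each error before averaging, it weights large depth errors more heavily than MAE does, which is why RMSE is at least as large as MAE for any single image.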