In this research, we present a depth prediction model designed for a range of applications, moving beyond the traditional scope of assisted and autonomous driving systems. Our model emphasizes absolute accuracy over relative accuracy, tackling the challenge of performance deterioration at extended ranges.
To support this design, we used the AirSim Unreal Engine simulator to build a tailored dataset spanning multiple scene locations, which helps mitigate overfitting to nuances such as textures and colors. With over 2.7 million images captured across diverse scenes and environmental conditions, the dataset provides a rich variety of perspectives and distances for training. We further enriched it with images from 14 RGB and depth sensor pairs mounted on a drone at varied pitch and yaw angles, enhancing the model's adaptability. Notably, although the dataset is entirely simulated, its diversity keeps the trained model closely aligned with real-world scenarios.
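As a rough illustration of how such paired RGB and depth frames can be pulled from the simulator, the sketch below uses the AirSim Python client. The camera names, the choice of DepthPerspective depth, and the assumption that uncompressed Scene images decode to three-channel BGR are illustrative only; in our setup the per-camera pitch and yaw offsets are defined in the simulator's settings.json rather than in the capture script.

```python
# Minimal sketch: capture paired RGB + depth frames from several drone cameras
# via the AirSim Python client. Camera names are hypothetical placeholders.
import numpy as np
import airsim

client = airsim.MultirotorClient()
client.confirmConnection()

CAMERA_NAMES = ["front_0", "front_45", "down_90"]  # assumed names from settings.json

def capture_pairs(camera_names):
    """Request one RGB + one depth image per camera in a single simGetImages call."""
    requests = []
    for cam in camera_names:
        requests.append(airsim.ImageRequest(cam, airsim.ImageType.Scene, False, False))
        requests.append(airsim.ImageRequest(cam, airsim.ImageType.DepthPerspective, True, False))
    responses = client.simGetImages(requests)

    pairs = []
    for rgb, depth in zip(responses[0::2], responses[1::2]):
        # Uncompressed Scene images are raw 3-channel (BGR) bytes in recent AirSim builds.
        rgb_img = np.frombuffer(rgb.image_data_uint8, dtype=np.uint8)
        rgb_img = rgb_img.reshape(rgb.height, rgb.width, 3)
        # Float depth responses arrive as a flat list of per-pixel distances in meters.
        depth_img = airsim.list_to_2d_float_array(depth.image_data_float,
                                                  depth.width, depth.height)
        pairs.append((rgb_img, depth_img))
    return pairs

for rgb_img, depth_img in capture_pairs(CAMERA_NAMES):
    print(rgb_img.shape, depth_img.shape, float(depth_img.max()))
```

In a full collection pipeline, a loop over drone poses, scenes, and weather settings would repeat this capture step and write each pair to disk.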
At the core of our model are an overlap patch embedding block, an optimized self-attention mechanism, and a Mixed Feed-Forward Network. Together, these components enable improved depth prediction even at considerable distances. Empirical evaluations show consistent performance across a broad depth range, with a Mean Absolute Percent Error (MAPE) of 5-10% maintained up to 1900 meters; performance degrades beyond this range, signaling opportunities for future enhancement.
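To make the three named components concrete, the following is a minimal PyTorch sketch in the spirit of SegFormer-style encoders. The channel dimensions, head count, and spatial-reduction ratio are illustrative assumptions, not our model's exact configuration.

```python
# Sketch of the three building blocks: overlapping patch embedding, self-attention
# with spatially reduced keys/values, and a feed-forward block with a depth-wise conv.
import torch
import torch.nn as nn


class OverlapPatchEmbed(nn.Module):
    """Embeds overlapping patches via a strided convolution (stride < kernel size)."""
    def __init__(self, in_ch=3, embed_dim=64, patch_size=7, stride=4):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, embed_dim, kernel_size=patch_size,
                              stride=stride, padding=patch_size // 2)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x):
        x = self.proj(x)                      # (B, C, H', W')
        B, C, H, W = x.shape
        x = x.flatten(2).transpose(1, 2)      # (B, H'*W', C) token sequence
        return self.norm(x), H, W


class EfficientSelfAttention(nn.Module):
    """Self-attention whose keys/values are spatially reduced to cut quadratic cost."""
    def __init__(self, dim=64, heads=1, sr_ratio=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.sr = nn.Conv2d(dim, dim, kernel_size=sr_ratio, stride=sr_ratio)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x, H, W):
        B, N, C = x.shape
        kv = x.transpose(1, 2).reshape(B, C, H, W)
        kv = self.sr(kv).flatten(2).transpose(1, 2)   # fewer key/value tokens
        kv = self.norm(kv)
        out, _ = self.attn(x, kv, kv)
        return out


class MixFFN(nn.Module):
    """Feed-forward block with a depth-wise conv between the two linear layers."""
    def __init__(self, dim=64, hidden=256):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden)
        self.dwconv = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden, dim)

    def forward(self, x, H, W):
        x = self.fc1(x)
        B, N, C = x.shape
        x = x.transpose(1, 2).reshape(B, C, H, W)
        x = self.dwconv(x).flatten(2).transpose(1, 2)
        return self.fc2(self.act(x))


if __name__ == "__main__":
    img = torch.randn(1, 3, 224, 224)
    tokens, H, W = OverlapPatchEmbed()(img)
    tokens = tokens + EfficientSelfAttention()(tokens, H, W)
    tokens = tokens + MixFFN()(tokens, H, W)
    print(tokens.shape)   # torch.Size([1, 3136, 64])
```

MAPE is used here in its standard sense: the mean of |predicted depth - ground-truth depth| / ground-truth depth, expressed as a percentage.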
For real-world data, ground-truth supervision was unavailable, so the results were evaluated qualitatively. Preliminary observations suggest the predictions are reasonable and align well with expectations, although quantitative validation remains a direction for future research.
Our research provides statistical evidence and visual illustrations of the model's depth prediction capabilities. Together, our approach and the insights gained from the simulation data point to further opportunities for advancement in the field.