“…Forecasting sequences in real-world settings, particularly from raw sensor measurements, is a complex problem due to the the exponential timespace space dimensionality, the probabilistic nature of the future and the complex dynamics of the scene. Whilst much effort from the research community has been devoted to video forecasting [18], [44], [50], [64] and semantic forecasting [3], [24], [53], [57], depth and ego-motion forecasting have not received the same interest despite their importance. The geometry of the scene is essential for applications such as planning the trajectory of an agent.…”