Monocular Depth Estimation (MDE) is a fundamental task in sports video understanding, supporting applications such as augmented graphics, scene understanding, and game state reconstruction. Despite remarkable progress in autonomous driving and indoor scene understanding, there is currently a lack of MDE datasets tailored to sports. Furthermore, most existing datasets focus only on single images, disregarding the temporal aspect. In this work, we introduce SoccerNet-Depth, the first video dataset for MDE in sports, covering football and basketball videos. In particular, we leverage the graphics engines of video games to automatically extract video sequences and their associated depth maps, which makes our dataset easily scalable. We also benchmark and fine-tune several state-of-the-art MDE methods on our dataset. Our analysis shows that MDE in sports is far from solved, making our dataset a well-suited testbed for future research. Dataset and code: https://github.com/SoccerNet/sn-depth.