“…Self-supervised learning on video collections. Learning from video [2,10,15,17,21,22,30,31,35,40,42,47,52,62,64] is a powerful paradigm because, unlike image collections, video provides additional temporal and sequential information. Self-supervised learning from video can aim to predict future frames [47] or to predict depth [12,14,62].…”