“…During the early days of self-supervised learning, much research went into handcrafting pre-training tasks, also known as pretext tasks. These handcrafted tasks include geometric transformation prediction [1,2,3], context prediction [4,5], jigsaw puzzle solving [6,7,8,9], temporal-order tasks for videos [10,11,12,13,14], pace prediction in videos [15], image colorization [16], etc. These pretext tasks aim to learn representations that are invariant to transformations, context, and similar nuisance factors.…”
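As an illustration of how such a pretext task is set up, the following is a minimal NumPy sketch of a geometric-transformation (rotation) prediction task in the spirit of the works cited above: each image is rotated by a random multiple of 90 degrees, and the rotation index becomes the self-supervised label a network would be trained to predict. The function name, batch layout, and square-image assumption are ours, not from any cited paper.

```python
import numpy as np

def make_rotation_pretext_batch(images, rng):
    """Build a rotation-prediction pretext batch.

    images: array of shape (N, H, W) with H == W (square images,
            so all four rotations keep the same shape).
    rng:    a numpy Generator used to sample rotations.

    Returns (rotated_images, labels), where labels[i] in {0,1,2,3}
    is the number of 90-degree rotations applied to images[i].
    A classifier trained to predict these labels must learn object
    orientation cues -- the representation is the useful by-product.
    """
    labels = rng.integers(0, 4, size=len(images))
    rotated = np.stack([np.rot90(img, k) for img, k in zip(images, labels)])
    return rotated, labels
```

In practice the rotated batch and labels would feed a standard classification loss (e.g. cross-entropy over the four rotation classes); after pre-training, the label head is discarded and the backbone's features are reused downstream.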