“…In real-world applications, it is infeasible to annotate a large amount of training data. Therefore, unsupervised representation learning has been widely studied in many computer vision tasks like image classification [1,13,14,20,38,26,48], image retrieval [21], and object detection [19,31,4,40]. Recently, self-supervised learning methods [2,15,18,30,36,41,54] were in favor for unsupervised pre-training tasks, where a contrastive loss was adopted to learn instance discriminative representations.…”