“…Several recent works have studied such challenges from various directions, including: (1) Inspired by the great success of self-supervised learning (SSL) on images and videos (e.g., [5,6,8,10,14,15,17,21,31,32,37,40,52,54,55,61,71]), some RL methods [1,42,46,59,63,69,81,88] take advantage of self-supervised learning. This is typically done by applying both a self-supervised loss and a reinforcement learning loss to the same batch.…”
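
The idea of applying a self-supervised loss and a reinforcement learning loss to one batch can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from any of the cited methods: the linear value function, the one-step TD loss standing in for the RL loss, the consistency-between-views loss standing in for the SSL loss, and the names `td_loss`, `consistency_loss`, `combined_loss`, and `ssl_weight`.

```python
# Hypothetical sketch: a single batch of transitions feeds both loss terms.
# The specific losses and all names below are illustrative assumptions.

def dot(w, x):
    """Inner product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))

def td_loss(w, batch, gamma=0.99):
    """RL term: mean squared one-step TD error for a linear value V(s) = w . s."""
    errors = []
    for state, reward, next_state, done in batch:
        target = reward + (0.0 if done else gamma * dot(w, next_state))
        errors.append((dot(w, state) - target) ** 2)
    return sum(errors) / len(errors)

def consistency_loss(w, batch, augment):
    """SSL term: the value of a state and of its augmented view should agree."""
    diffs = [(dot(w, s) - dot(w, augment(s))) ** 2 for s, _, _, _ in batch]
    return sum(diffs) / len(diffs)

def combined_loss(w, batch, augment, ssl_weight=0.5):
    """Both terms are computed on the same batch and summed with a weight."""
    return td_loss(w, batch) + ssl_weight * consistency_loss(w, batch, augment)

# Toy batch of (state, reward, next_state, done) tuples.
batch = [
    ([1.0, 0.0], 1.0, [0.0, 1.0], False),
    ([0.0, 1.0], 0.0, [0.0, 0.0], True),
]
weights = [0.5, 0.5]

def scale_augment(state):
    """Toy augmentation: shrink every feature by 10%."""
    return [0.9 * x for x in state]

print(combined_loss(weights, batch, scale_augment))
```

In practice the combined scalar would be minimized with a gradient-based optimizer, and `ssl_weight` controls how strongly the self-supervised term shapes the shared parameters relative to the RL term.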