“…One prominent example is the demonstration that GPT-3, a large transformer model [Vaswani et al., 2017] trained on a massive text corpus, achieves strong performance on many natural language processing (NLP) tasks and benchmarks in few-shot settings. On image recognition tasks, pretraining on Instagram images [Mahajan et al., 2018] and on JFT-300M [Sun et al., 2017] has proven highly effective in transfer and few-shot settings [Goyal et al., 2021; Pham et al., 2020; Dosovitskiy et al., 2020; Dumoulin et al., 2021]. Even when no labeled examples are provided (zero-shot), CLIP [Radford et al., 2021], which pairs an image encoder with a text encoder trained jointly using a contrastive loss on 400 million image-text pairs collected from the internet, achieves remarkable performance.…”
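The contrastive objective behind CLIP can be made concrete with a short sketch. The PyTorch snippet below is a minimal illustration under stated assumptions, not CLIP's actual implementation: the two encoders are replaced by precomputed feature tensors, and the function name `clip_contrastive_loss` and the fixed temperature value are hypothetical choices for this example (CLIP learns its temperature as a parameter).

```python
# Minimal sketch of a CLIP-style symmetric contrastive loss.
# Assumptions: encoders are abstracted away as feature tensors;
# the temperature is fixed here, whereas CLIP learns a logit scale.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_features: torch.Tensor,
                          text_features: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """Symmetric cross-entropy over cosine similarities for a batch of
    N matched image-text pairs: the i-th image and i-th text form the
    positive pair; all other pairings in the batch act as negatives."""
    # L2-normalize so the dot products below are cosine similarities.
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)

    # (N, N) similarity matrix, scaled by the temperature.
    logits = image_features @ text_features.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Cross-entropy in both directions: match each image to its caption
    # and each caption to its image, then average the two losses.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2

# Random features standing in for encoder outputs on a batch of 8 pairs.
imgs = torch.randn(8, 512)  # image-encoder embeddings
txts = torch.randn(8, 512)  # text-encoder embeddings
print(clip_contrastive_loss(imgs, txts))
```

Zero-shot classification reuses the same similarity at inference time: the class names are embedded as text prompts, and each image is assigned to the class whose text embedding it is closest to.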