“…Much as humans can transfer their visual understanding between classes whose names or categories are semantically similar, ZSS methods generalize visual semantic information to unseen classes using the textual semantic information provided by language models. A slightly more relaxed data-efficient setting is FSS [8,9,10,11,12,13,14], in which the model is still expected to generalize to unseen classes but is additionally given a few support images annotated with the unseen target classes. Typical FSS methods perform well with only one to five support examples per unseen category.…”