2022
DOI: 10.48550/arxiv.2204.08790
Preprint

ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models

Abstract: Learning visual representations from natural language supervision has recently shown great promise in a number of pioneering works. In general, these language-augmented visual models demonstrate strong transferability to a variety of datasets/tasks. However, it remains a challenge to evaluate the transferability of these foundation models due to the lack of easy-to-use toolkits for fair benchmarking. To tackle this, we build ELEVATER, the first benchmark to compare and evaluate pre-trained language-augmented…

Cited by 2 publications (6 citation statements)
References 34 publications
“…We focus on three types of downstream tasks for evaluation. Object Detection in the Wild: object detection in the wild tests a model's ability to adapt to a variety of domains with drastically different label sets. ELEVATER [24] is a new object detection benchmark composed of 35 diverse and challenging real-world domains with full-shot, few-shot, and zero-shot training settings. Note that there are two variations of data used in prior work [20].…”
Section: Downstream Tasks
confidence: 99%
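To make the benchmark structure described in that statement concrete, here is a minimal sketch of an evaluation loop over many domains and shot settings, with per-domain scores averaged at the end. The domain names, the adapt/evaluate stubs, and the scores below are assumptions for illustration only; they are not ELEVATER's actual API or data.

```python
# Sketch of an ODinW-style protocol: 35 domains x {zero, few, full}-shot settings.
# Everything except the loop structure is a placeholder.
import random
from statistics import mean

DOMAINS = [f"domain_{i:02d}" for i in range(35)]   # stand-ins for the 35 ODinW datasets
SETTINGS = ["zero-shot", "few-shot", "full-shot"]

def adapt(model, domain, setting):
    # Placeholder: fine-tune on the domain's train split unless zero-shot.
    return model if setting == "zero-shot" else f"{model}+{domain}"

def evaluate(model, domain):
    # Placeholder: a real run would compute detection mAP on the domain's test split.
    random.seed(hash((model, domain)) % (2**32))
    return random.uniform(0.0, 1.0)

results = {}
for setting in SETTINGS:
    per_domain = [evaluate(adapt("pretrained_detector", d, setting), d) for d in DOMAINS]
    results[setting] = mean(per_domain)            # average score across domains

for setting, score in results.items():
    print(f"{setting}: mean score over {len(DOMAINS)} domains = {score:.3f}")
```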
“…The proposed method is evaluated on three downstream tasks: object detection in the wild (ODinW) [24], open-vocabulary detection, and phrase grounding [25]. Results show that OmDet is able to outperform all prior art, including the powerful GLIP [20], which is pre-trained on much larger datasets.…”
Section: Introduction
confidence: 99%
“…For the few-shot benchmark, we conduct experiments on 20 image classification datasets from the ELEVATER benchmark (Li et al 2022b) on four Quadro RTX A6000 GPUs. Detailed dataset statistics are given in the supplementary material.…”
Section: Experiments: Datasets
confidence: 99%
“…For benchmark experiments, we use the SGD optimizer (Ruder 2016), with the learning rate and weight decay automatically searched for all methods so that these two hyperparameters reach their optimal combination. We borrow the automatic hyperparameter tuning toolkit from Li et al (2022b). Training epochs are set via grid search.…”
Section: Implementation Details
confidence: 99%
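The search over learning rate and weight decay mentioned in that statement can be illustrated with a small, self-contained grid search. The sketch below is not the toolkit from Li et al (2022b); the model, synthetic data, and grid values are assumptions chosen only to show the pattern of training SGD under each hyperparameter pair and keeping the best one by validation accuracy.

```python
# Minimal sketch: grid-search SGD learning rate and weight decay on a held-out split.
import itertools
import torch
from torch import nn

def make_data(n=512, d=64, k=10, seed=0):
    g = torch.Generator().manual_seed(seed)
    return torch.randn(n, d, generator=g), torch.randint(0, k, (n,), generator=g)

def train_and_score(lr, weight_decay, epochs=5):
    x_tr, y_tr = make_data(seed=0)
    x_va, y_va = make_data(seed=1)
    model = nn.Linear(64, 10)                      # stand-in for a linear probe
    opt = torch.optim.SGD(model.parameters(), lr=lr, weight_decay=weight_decay)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):                        # full-batch training for brevity
        opt.zero_grad()
        loss_fn(model(x_tr), y_tr).backward()
        opt.step()
    with torch.no_grad():
        return (model(x_va).argmax(dim=1) == y_va).float().mean().item()

# Exhaustive search over an assumed (lr, weight_decay) grid; keep the best pair.
lr_grid = [1e-3, 1e-2, 1e-1]
wd_grid = [0.0, 1e-4, 1e-2]
best = max(itertools.product(lr_grid, wd_grid), key=lambda hp: train_and_score(*hp))
print("best (lr, weight_decay):", best)
```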