2020
DOI: 10.1007/978-3-030-66096-3_2

Commands for Autonomous Vehicles by Progressively Stacking Visual-Linguistic Representations

Cited by 13 publications (18 citation statements)
References 24 publications
“…The CMTR method [17] applies the Transformer encoder-decoder model to model the command and the regions separately. Dai et al. [5] use the VL-BERT pre-trained model [18] to jointly learn cross-modal representations for the input. They propose an iterative stacking algorithm, Stack-VL-BERT, to train a deeper VL-BERT model.…”
Section: A. Language Grounding for Autonomous Vehicles (mentioning)
confidence: 99%
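The progressive-stacking idea referenced above can be sketched as follows: a shallow encoder is trained first, then its trained layers are duplicated to initialize a deeper model that continues training. The code below is a minimal PyTorch illustration; the class names and exact duplication order are assumptions, not the authors' released implementation.

```python
import copy
import torch.nn as nn

class Encoder(nn.Module):
    """Toy transformer encoder: a plain stack of layers."""
    def __init__(self, layers):
        super().__init__()
        self.layers = nn.ModuleList(layers)

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

def stack_encoder(encoder):
    """Double the depth by copying the trained layers on top of
    themselves -- a progressive-stacking initialization in the spirit
    of Stack-VL-BERT (details here are assumptions)."""
    doubled = [copy.deepcopy(l) for l in encoder.layers] \
            + [copy.deepcopy(l) for l in encoder.layers]
    return Encoder(doubled)

# Usage sketch: train a shallow model, stack it, keep training.
shallow = Encoder([nn.TransformerEncoderLayer(d_model=768, nhead=12)
                   for _ in range(6)])
# ... train `shallow` ...
deep = stack_encoder(shallow)  # 12 layers initialized from the 6 trained ones
```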
“…We apply the UNITER [4] model from the single-stream architecture family and LXMERT [10] from the dual-stream family because they achieve excellent performance on many V&L tasks [38]. On the other hand, the SOTA method [5] on this task uses another single-stream model, VL-BERT [18]. Their model obtains lower scores than UNITER and LXMERT (see Table I for the results).…”
Section: Our Layer Fusion Approach (mentioning)
confidence: 99%
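A layer-fusion module of the kind named in that section title can be sketched as a learned weighted sum over the hidden states of all encoder layers. The snippet below is a generic illustration under that assumption; the citing paper's actual fusion scheme may differ.

```python
import torch
import torch.nn as nn

class LayerFusion(nn.Module):
    """Fuse per-layer hidden states with learned softmax weights
    (a common layer-fusion recipe, not the paper's exact design)."""
    def __init__(self, num_layers):
        super().__init__()
        self.weights = nn.Parameter(torch.zeros(num_layers))

    def forward(self, hidden_states):
        # hidden_states: list of [batch, seq, dim] tensors, one per layer
        w = torch.softmax(self.weights, dim=0)
        stacked = torch.stack(hidden_states, dim=0)        # [L, B, S, D]
        return (w.view(-1, 1, 1, 1) * stacked).sum(dim=0)  # [B, S, D]

# Usage sketch with 12 encoder layers of a BERT-base-sized model.
fusion = LayerFusion(num_layers=12)
states = [torch.randn(2, 16, 768) for _ in range(12)]
fused = fusion(states)  # [2, 16, 768]
```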
“…Three-dimensional object detectors localize objects with tight 3D bounding boxes. Compared with monocular and stereo image-based approaches [5,19,26,27,41], LiDAR-based methods are more robust for autonomous driving [8,38]. Current 3D object detectors mainly represent the point cloud as raw points or as voxels.…”
Section: Introduction (mentioning)
confidence: 99%
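The two point-cloud representations mentioned, raw points versus voxels, differ mainly in a grouping step that buckets points into a regular 3D grid. Below is a minimal NumPy sketch of voxelization; the function name and parameters are illustrative, and real detectors (e.g., VoxelNet-style pipelines) use optimized GPU implementations.

```python
import numpy as np

def voxelize(points, voxel_size=(0.2, 0.2, 0.2), max_pts=32):
    """Group raw LiDAR points (x, y, z, intensity) into voxels,
    capping the number of points kept per voxel."""
    coords = np.floor(points[:, :3] / np.asarray(voxel_size)).astype(np.int64)
    voxels = {}
    for pt, c in zip(points, map(tuple, coords)):
        bucket = voxels.setdefault(c, [])
        if len(bucket) < max_pts:
            bucket.append(pt)
    return voxels  # {voxel grid index: list of points}

# Usage sketch on a random cloud of 1000 points.
pts = np.random.rand(1000, 4) * 50
vox = voxelize(pts)
print(len(vox), "non-empty voxels")
```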
“…The ongoing development of applications such as autonomous driving [5,11,32] and robotics [41] has led to increasing demand for 3D object detectors that predict the class label, 3D bounding box, and detection confidence score for each instance in a given 3D scene. In recent years, 2D object detection methods have made great breakthroughs [6,15,22,49,55,59], far ahead of 3D object detection methods, in particular the anchor-free detection methods.…”
Section: Introduction (mentioning)
confidence: 99%
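An anchor-free detection head of the kind contrasted above predicts, at every feature-map location, exactly the three quantities this statement lists: class scores, 3D box parameters, and a confidence score. The sketch below is a generic bird's-eye-view head and is not tied to any of the cited detectors.

```python
import torch
import torch.nn as nn

class AnchorFreeHead(nn.Module):
    """Per-location prediction head in the anchor-free style:
    class heatmap, 3D box regression, and detection confidence
    over a BEV feature map (a generic sketch)."""
    def __init__(self, in_ch, num_classes, box_dim=7):
        super().__init__()
        self.cls = nn.Conv2d(in_ch, num_classes, 1)  # class scores per cell
        self.box = nn.Conv2d(in_ch, box_dim, 1)      # x, y, z, w, l, h, yaw
        self.conf = nn.Conv2d(in_ch, 1, 1)           # detection confidence

    def forward(self, feat):  # feat: [B, C, H, W]
        return self.cls(feat), self.box(feat), self.conf(feat).sigmoid()

# Usage sketch on a 128x128 BEV feature map with 3 classes.
head = AnchorFreeHead(in_ch=64, num_classes=3)
scores, boxes, conf = head(torch.randn(1, 64, 128, 128))
```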