2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021
DOI: 10.1109/iccv48922.2021.00290

An End-to-End Transformer Model for 3D Object Detection

Cited by 377 publications (222 citation statements)
References 59 publications
“…Given positions and colors as input, our PointGroup detector (second row) clearly outperforms VoteNet. Using multiview features and normals instead of RGB colors, our PointGroup-based detector gives improved detection results of 50.7 mAP@0.5, which outperforms the current state-of-the-art detectors [41,63] on the validation set of ScanNet v2 with gains of 3.7 and 2.6 respectively. Also, our PointGroup generates notably better detections for small and thin objects than VoteNet, such as picture ("pic.")…”
Section: Pointgroup Implementation Details
type: mentioning
confidence: 94%
“…As such, we have reached a stage where researchers are moving on to improve performance on even complex computer vision problems, viz. 3D object detection [32], action detection and localization [3], tracking objects across videos, event recognition and scene understanding [33][4], etc. In this chapter, we focus on the task of event/activity recognition in videos, done with the assistance of frame-wise object detection, which enables inter-frame tracking of objects.…”
Section: Video Activity Recognition Assisted By Object Detection
type: mentioning
confidence: 99%
“…To handle the second issue, we adopt the set-to-set loss to match the predicted sequences with the ground truths. Unlike existing approaches [3,21,36] that utilize the sum of the classification and regression loss as the cost function for bipartite matching, in this paper, we propose a novel metric to measure the similarity between two sequences. Then we perform bipartite matching by maximizing the global similarity of the prediction and the ground truth set using the proposed metric.…”
Section: Introduction
type: mentioning
confidence: 99%
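
The bipartite matching step described in the statement above can be illustrated with a small sketch. The citing paper's similarity metric is not given in the excerpt, so the `similarity` matrix below is made-up illustrative data, and `match_by_similarity` is a hypothetical helper name. The sketch searches all one-to-one assignments by brute force, which is only practical for tiny sets; set-prediction detectors typically use the Hungarian algorithm instead (for example, SciPy's `linear_sum_assignment` with `maximize=True`).

```python
from itertools import permutations


def match_by_similarity(similarity):
    """Return the one-to-one matching with the highest total similarity.

    similarity[i][j] is an assumed score between prediction i and
    ground truth j (higher is better). Assumes there are at least as
    many predictions as ground truths.
    """
    n_pred = len(similarity)
    n_gt = len(similarity[0]) if similarity else 0
    best_total, best_pairs = float("-inf"), []
    # Each permutation assigns a distinct prediction to every ground truth.
    for preds in permutations(range(n_pred), n_gt):
        total = sum(similarity[p][g] for g, p in enumerate(preds))
        if total > best_total:
            best_total = total
            best_pairs = [(p, g) for g, p in enumerate(preds)]
    return best_pairs, best_total


# Illustrative scores only: 3 predictions, 2 ground truths.
similarity = [
    [0.9, 0.1],
    [0.8, 0.7],
    [0.2, 0.3],
]
pairs, total = match_by_similarity(similarity)
print(pairs, total)
```

In this toy example predictions 0 and 1 both score highest against ground truth 0, so an independent per-prediction argmax would assign both to the same target. Maximizing the global similarity under a one-to-one constraint pairs prediction 1 with ground truth 1 instead, and prediction 2 is left unmatched.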