2020
DOI: 10.48550/arxiv.2011.13628
Preprint

Temporal-Channel Transformer for 3D Lidar-Based Video Object Detection in Autonomous Driving

Abstract: The strong demand for autonomous driving in industry has led to strong interest in 3D object detection and resulted in many excellent 3D object detection algorithms. However, the vast majority of algorithms only model single-frame data, ignoring the temporal information of the sequence of data. In this work, we propose a new transformer, called Temporal-Channel Transformer, to model the spatial-temporal domain and channel domain relationships for video object detection from Lidar data. As a special design o…

Citations: cited by 1 publication (1 citation statement)
References: 37 publications
“…An emerging thread of work aims at applying transformers to vision tasks such as object detection [5], semantic segmentation [115,99], 3D reconstruction [72], pose estimation [107], generative modeling [14], image retrieval [27], medical image segmentation [13,97,111], point clouds [40], video instance segmentation [103], object re-identification [47], video retrieval [33], video dialogue [64], video object detection [110] and multi-modal tasks [73,23,80,53,108]. A separate line of works attempts at modeling visual data with learnt discretized token sequences [104,83,14,109,18].…”
Section: Related Work (mentioning)
confidence: 99%