2021
DOI: 10.48550/arxiv.2105.04447
Preprint

SCTN: Sparse Convolution-Transformer Network for Scene Flow Estimation

Abstract: We propose a novel scene flow estimation approach to capture and infer 3D motions from point clouds. Estimating 3D motions for point clouds is challenging, since a point cloud is unordered and its density is significantly non-uniform. Such unstructured data poses difficulties in matching corresponding points between point clouds, leading to inaccurate flow estimation. We propose a novel architecture named Sparse Convolution-Transformer Network (SCTN) that equips the sparse convolution with the transformer. Spe…

Cited by 4 publications (4 citation statements)
References 52 publications
“…2D motion estimation is also referred to as optical flow estimation [14,27,57,58], which aims at finding pixel-wise motions between consecutive images. Recently, 2D motion estimation has been extended to the 3D domain with point convolution [30,32] and 3D convolution [21,31]. These works show the potential of 3D motion estimation and compensation for dynamic point cloud compression.…”
Section: Motion Estimation
confidence: 99%
“…Since its promise was first demonstrated in Vision Transformer (ViT) [23], we have witnessed a flourishing of full-Transformer models for image classification [57,63,67,44,80,59], object detection [9,91,84,20], and semantic segmentation [61,65]. Beyond these static image tasks, it has also been applied to various temporal understanding tasks, such as action recognition [41,83,11], object tracking [15,62], and scene flow estimation [39].…”
Section: Introduction
confidence: 99%
“…Transformers [60], originally proposed for natural language processing (NLP), have become a prevalent architecture in computer vision since the seminal work of Vision Transformer (ViT) [16]. Its promise has been demonstrated in various vision tasks including image classification [56,63,67,41,78,59], object detection [3,87,83,12], segmentation [61,65,9], and beyond [35,81,4,7,62,33].…”
Section: Introduction
confidence: 99%