2022
DOI: 10.48550/arxiv.2202.04942
Preprint

Spherical Transformer

Abstract: Using convolutional neural networks for 360° images can induce sub-optimal performance due to distortions entailed by a planar projection. The distortion worsens when a rotation is applied to the 360° image. Thus, many convolution-based studies attempt to reduce the distortions to learn accurate representations. In contrast, we leverage the transformer architecture to solve image classification problems for 360° images. Using the proposed transformer for 360° images has two advantages. Firs…


Cited by 2 publications (5 citation statements)
References 17 publications
“…To exclude the style information, we conduct such measurement in grayscale image format, shown in Tab. 6. We indicate such FID measurement as FID_c.…”
Section: A5 Evaluation Details
confidence: 98%
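The quoted evaluation computes FID on grayscale images so that color and style cues do not dominate the score. A minimal sketch of that preprocessing step, assuming standard ITU-R BT.601 luma weights; the function name and batch layout are illustrative, not taken from the cited work:

```python
import numpy as np

def to_grayscale(batch_rgb: np.ndarray) -> np.ndarray:
    """Collapse a batch of RGB images (N, H, W, 3) in [0, 1] to
    grayscale (N, H, W) using ITU-R BT.601 luma weights, discarding
    the color information that carries most of the style signal."""
    weights = np.array([0.299, 0.587, 0.114])
    return batch_rgb @ weights

# FID_c would then be the usual FID, but with features extracted
# from these grayscale images (replicated to 3 channels if the
# feature extractor expects RGB input).
```

Because the three weights sum to 1, a pure white image maps to 1.0 and the output stays in the input's value range.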
“…To deal with such discontinuity and distortion, recent works introduce modeling in the spherical domain [13,9], projecting an image onto local tangent patches with minimal geometric error. It has been shown that leveraging the transformer architecture in 360° image modeling reduces distortions caused by projection and rotation [6]. For this reason, recent approaches [44,45], including PAVER [60], PanoFormer [48], and Text2Light [4], used the transformer to achieve global structural consistency.…”
Section: Related Work
confidence: 99%
“…We use Base to denote the fusion module with direct addition, and Fusion to denote the fusion module with our CAF. The Base fusion module consists only of two simple residual convolution units from work [5], using the basic spherical convolution method of previous works [14], [25].…”
Section: B Cross Attention Fusion
confidence: 99%
“…FFN denotes the feed-forward operation, and LN denotes the layer normalization operation [25]; the two operations form a residual-like structure. Q_i, K_i and V_i used in Equation 2 are generated from inputs X_0 and X_1 using the corresponding learnable parameters W:…”
Section: B Cross Attention Fusion
confidence: 99%
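The quote describes queries, keys, and values produced from two inputs X_0 and X_1 by learnable matrices W, with FFN and LN forming a residual-like structure. A minimal single-head numpy sketch under those assumptions; the shapes, the ReLU activation, and the post-norm placement are guesses for illustration, not the cited paper's exact design:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # LN: normalize each token's feature vector to zero mean, unit variance
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def cross_attention_fusion(x0, x1, Wq, Wk, Wv, W1, W2):
    """One cross-attention branch: queries from x0, keys/values from x1,
    followed by the residual-like LN + FFN structure the quote outlines."""
    q, k, v = x0 @ Wq, x1 @ Wk, x1 @ Wv
    d = q.shape[-1]
    attn = softmax(q @ k.T / np.sqrt(d)) @ v   # scaled dot-product attention
    h = layer_norm(x0 + attn)                  # residual connection + LN
    ffn = np.maximum(h @ W1, 0.0) @ W2         # two-layer FFN with ReLU
    return layer_norm(h + ffn)                 # residual connection + LN
```

Swapping the roles of x0 and x1 gives the second branch, so each input can attend to the other before the fused features are combined.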