Ideally, 360° imagery could inherit the deep convolutional neural networks (CNNs) already trained with great success on perspective projection images. However, existing methods for transferring CNNs from perspective to spherical images introduce significant computational costs and/or degradations in accuracy. We present the Kernel Transformer Network (KTN) to efficiently transfer convolution kernels from perspective images to the equirectangular projection of 360° images. Given a source CNN for perspective images as input, the KTN produces as output a function parameterized by a polar angle and a kernel. Given a novel 360° image, that function in turn can compute convolutions for arbitrary layers and kernels as the source CNN would on the corresponding tangent plane projections. Distinct from all existing methods, KTNs allow model transfer: the same model can be applied to different source CNNs with the same base architecture. This enables application to multiple recognition tasks without re-training the KTN. Validating our approach with multiple source CNNs and datasets, we show that KTNs improve the state of the art for spherical convolution. KTNs successfully preserve the source CNN's accuracy, while offering transferability, scalability to typical image resolutions, and, in many cases, a substantially lower memory footprint.

One existing approach applies the source CNN repeatedly to perspective projections of many tangent planes, which preserves accuracy but is computationally expensive. A second approach instead adapts the kernels to the distortion of the equirectangular projection; while this avoids repeated projections, it degrades accuracy, especially for deeper networks, due to an implicit interpolation assumption, as we will explain below. The third approach defines convolution in the spectral domain [11, 15], which has significant memory overhead and has thus far seen limited applicability to real-world data. All of the above require retraining to handle a new recognition task.

In light of these shortcomings, we propose the Kernel Transformer Network (KTN). The KTN adapts source CNNs trained on perspective images to 360° images. Instead of learning a new CNN on 360° images for a specific task, the KTN learns a function that takes a kernel in the source CNN as input and transforms it to be applicable to a 360° image in its equirectangular projection. See Fig. 1 (C). The function accounts for the distortion in 360° images, returning different transformations depending on both the polar angle θ and the source kernel. The model is trained to reproduce the outputs of the source CNN on the perspective projection of each tangent plane of an arbitrary 360° image; we sketch this objective and a minimal implementation at the end of this section. Hence, the KTN learns to behave like the source CNN while avoiding repeated projection of the image.

Key highlights of the proposed KTN are its transferability and compactness, both of which owe to our function-based design. Once trained for a base architecture, the same KTN can transfer multiple source CNNs to 360° images. For example, having trained a KTN for VGG [36] on ImageNet classification, we can transfer the same KTN to run a VGG-based PASCAL object detector on 360° panoramas. This is possible because the KTN takes the source CNN as input rather than embedding the CNN kernels in its own parameters (unlike [11, 12, 15, 37, 46]). Further...
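To make the training objective above concrete, one way to write it, in illustrative notation not defined elsewhere in this section, is the following: let f denote the KTN, K a source kernel, x an equirectangular input (or feature map), and P_{θ,φ}(x) its projection onto the tangent plane at angles (θ, φ). Training then minimizes

\[
\min_{f}\;\sum_{\theta,\phi}\Big\|\big(f(K,\theta) * x\big)(\theta,\phi)\;-\;\big(K * P_{\theta,\phi}(x)\big)(0,0)\Big\|^{2},
\]

where the right-hand term is the source CNN's response at the tangent point. Because equirectangular distortion varies only with the polar angle, the transformed kernel f(K, θ) is shared by all azimuths φ within a row.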
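Below is a minimal code sketch of the same idea, assuming PyTorch: a small network transforms a source kernel conditioned on θ, and the transformed kernel is convolved row by row over the equirectangular feature map. The names (KTNSketch, transform) and the two-layer MLP used for f are placeholder choices for illustration, not the architecture used in the paper.

```python
# Illustrative KTN sketch (assumes PyTorch); the MLP for f is a placeholder.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class KTNSketch(nn.Module):
    def __init__(self, in_ch, out_ch, src_k=3, dst_k=5):
        super().__init__()
        self.in_ch, self.out_ch, self.dst_k = in_ch, out_ch, dst_k
        # f: (flattened source kernel, theta) -> flattened transformed kernel
        self.f = nn.Sequential(
            nn.Linear(in_ch * src_k * src_k + 1, 256),
            nn.ReLU(),
            nn.Linear(256, in_ch * dst_k * dst_k),
        )

    def transform(self, src_kernel, theta):
        # src_kernel: (out_ch, in_ch, src_k, src_k); theta: polar angle (float)
        flat = src_kernel.flatten(1)                        # (out_ch, in*k*k)
        t = torch.full((flat.size(0), 1), theta, device=flat.device)
        out = self.f(torch.cat([flat, t], dim=1))
        return out.view(self.out_ch, self.in_ch, self.dst_k, self.dst_k)

    def forward(self, feat, src_kernel):
        # feat: (B, in_ch, H, W) equirectangular feature map. Each output row
        # is convolved with the kernel transformed for that row's polar angle.
        _, _, H, W = feat.shape
        pad = self.dst_k // 2
        x = F.pad(feat, (pad, pad, 0, 0), mode="circular")  # wrap in azimuth
        x = F.pad(x, (0, 0, pad, pad))                      # zeros at poles
        rows = []
        for i in range(H):
            theta = (i + 0.5) / H * math.pi                 # row's polar angle
            k = self.transform(src_kernel, theta)
            rows.append(F.conv2d(x[:, :, i:i + self.dst_k, :], k))
        return torch.cat(rows, dim=2)                       # (B, out_ch, H, W)
```

Since the transformed kernels depend only on the row's polar angle and on the source kernel, not on the input image, a real implementation could precompute them once per source CNN and cache them across images.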