LDMIC: Learning-based Distributed Multi-view Image Coding

Zhang, Xinjie; Shao, Jinjun; Zhang, Jun

doi:10.48550/arxiv.2301.09799

Cited by 1 publication

(6 citation statements)

References 38 publications


“…3.4) on any dataset of stereo image pairs. Unlike other recent methods [11,45], the proposed method does not include any autoregressive components, which allows for fast encoding and decoding (see Sec. 4.4).…”

Section: Methods (mentioning)

confidence: 99%

“…The images have a resolution of 2048×1024 and are divided into 2975 training, 500 validation, and 1525 test image pairs. Following conventions [45,42] we crop 64, 256 and 128 pixels from the top, bottom and sides respectively to remove car parts and artefacts from the rectification process. The InStereo2k dataset contains 2060 stereo images of indoor scenes.…”

Section: Methods (mentioning)

confidence: 99%

“…We also report scores for the learned stereo compression methods DSIC [24], HESIC [11], and SASIC [42] from the respective papers. For LDMIC [45] we show the reported scores for the full LDMIC method, which includes an autoregressive context model, and a smaller version LDMIC (fast) without the autoregressive components.…”

Section: Methods (mentioning)

confidence: 99%

“…This is further enhanced by using stereo attention between the two images in the common decoder. Cross-attention in the decoder is also used in the recently proposed distributed multiview method LDMIC by Zhang et al. [45]. Contrary to our method, they employ global encoder-to-decoder cross-attention and a single image autoregressive entropy model.…”

Section: Stereo Image Compression (mentioning)

confidence: 99%

“…7 depicts the average encoding and decoding times of our method against other methods on the InStereo2k dataset. The conventional methods BPG, HEVC and MV-HEVC were evaluated on an Intel Xeon Gold 6230R processor with a single core (times are taken from Zhang et al. [45]). For LDMIC we show their reported encoding and decoding times [45] (measured on an NVIDIA RTX 3090 GPU).…”

Section: Coding Complexity (mentioning)

confidence: 99%


SASIC: Stereo Image Compression with Latent Shifts and Stereo Attention

Wödlinger, Kotera, Xu, et al. 2022

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)


In this paper, we present ECSIC, a novel learned method for stereo image compression. Our proposed method compresses the left and right images in a joint manner by exploiting the mutual information between the images of the stereo image pair using a novel stereo cross attention (SCA) module and two stereo context modules. The SCA module performs cross-attention restricted to the corresponding epipolar lines of the two images and processes them in parallel. The stereo context modules improve the entropy estimation of the second encoded image by using the first image as a context. We conduct an extensive ablation study demonstrating the effectiveness of the proposed modules and a comprehensive quantitative and qualitative comparison with existing methods. ECSIC achieves state-of-the-art performance among stereo image compression models on the two popular stereo image datasets Cityscapes and InStereo2k while allowing for fast encoding and decoding, making it highly practical for real-time applications.

