High-Efficiency Lossy Image Coding Through Adaptive Neighborhood Information Aggregation

Lu, M.; Ma, Zhan

doi:10.48550/arxiv.2204.11448

Cited by 4 publications

(9 citation statements)

References 48 publications

“…Note that for fair comparison, we implement the method in Cheng2020 [43] and increase its number of filters N from 192 to 256 at high rates, which leads to better performance than the original results in [43]. The results of He2021 [24] are based on the source code at [45].…”

Section: Results (mentioning)

confidence: 99%

Learned Image Compression with Inception Residual Blocks and Multi-Scale Attention Module

Liang et al. 2022

2022 Picture Coding Symposium (PCS)

Deep learning-based image compression has made great progress recently. However, many leading schemes use serial context-adaptive entropy models to improve rate-distortion (R-D) performance, which makes them very slow. In addition, the encoding and decoding networks are highly complex and unsuitable for many practical applications. In this paper, we introduce four techniques to balance the trade-off between complexity and performance. First, we introduce, for the first time, a deformable convolutional module into the compression framework, which can remove more redundancy from the input image and thereby improve compression performance. Second, we design an improved checkerboard context model with two separate distribution parameter estimation networks and different probability models, which enables parallel decoding without sacrificing performance compared with the sequential context-adaptive model. Third, we develop a three-step knowledge distillation and training scheme to achieve different trade-offs between the complexity and the performance of the decoder network; it transfers both the final and intermediate results of the teacher network to the student network to aid its training. Fourth, we introduce L1 regularization to make the latent representation sparser, and then encode and decode only the non-zero channels, which greatly reduces encoding and decoding time. Experiments show that, compared with the state-of-the-art learned image coding scheme, our method can be about 20 times faster in encoding and 70-90 times faster in decoding, while its R-D performance is also 2.3% better. Our method outperforms the traditional H.266/VVC-intra (4:4:4) codec and several leading learned schemes in terms of PSNR and MS-SSIM on the Kodak and Tecnick-40 datasets.
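The fourth technique in this abstract (L1 regularization, then coding only the non-zero latent channels) can be illustrated with a minimal NumPy sketch. It assumes a quantized latent of shape C x H x W; the function names and the `weight` hyperparameter are hypothetical, and the entropy coder and training loop are omitted entirely, so this is not the authors' implementation.

```python
import numpy as np

def l1_penalty(latent: np.ndarray, weight: float) -> float:
    """L1 term added to the training loss to push latent values toward zero.
    `weight` is a hypothetical hyperparameter, not a value from the paper."""
    return weight * float(np.abs(latent).sum())

def nonzero_channels(latent_q: np.ndarray) -> np.ndarray:
    """Indices of channels (axis 0 of a C x H x W quantized latent) holding any non-zero value."""
    return np.flatnonzero(np.any(latent_q != 0, axis=(1, 2)))

def encode_sparse(latent_q: np.ndarray):
    """Keep only the non-zero channels; the indices act as side information for the decoder."""
    idx = nonzero_channels(latent_q)
    return idx, latent_q[idx]

def decode_sparse(idx: np.ndarray, kept: np.ndarray, num_channels: int) -> np.ndarray:
    """Re-insert the kept channels and fill every skipped channel with zeros."""
    out = np.zeros((num_channels,) + kept.shape[1:], dtype=kept.dtype)
    out[idx] = kept
    return out

# Toy quantized latent with 8 channels, of which only 3 are non-zero.
y_hat = np.zeros((8, 4, 4), dtype=np.int32)
y_hat[0, 0, 0] = 3
y_hat[2, 1, 2] = -1
y_hat[5] = 1
idx, kept = encode_sparse(y_hat)
print(idx)  # [0 2 5]
assert np.array_equal(decode_sparse(idx, kept, y_hat.shape[0]), y_hat)
```

Because the decoder fills skipped channels with zeros, dropping all-zero channels loses nothing; how much time this saves in practice depends on how sparse the L1 term makes the latent, which the sketch does not model.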

Section: Results (mentioning)

confidence: 99%

Learned Image Compression with Inception Residual Blocks and Multi-Scale Attention Module

Liang et al. 2022

2022 Picture Coding Symposium (PCS)

“…The checkerboard model (He et al. 2021) is a typical tool, in which the anchor content is encoded independently while the non-anchor content is encoded at a lower cost depending on the anchor content priors. Later, a generalized checkerboard (Lu et al. 2022a) and a dual spatial prior model (Guo-Hua et al. 2023) are introduced.…”

Section: Context Modeling (mentioning)

confidence: 99%

Another Way to the Top: Exploit Contextual Clustering in Learned Image Coding

Zhang, Duan, et al. 2024

AAAI

While convolution and self-attention are extensively used in learned image compression (LIC) for transform coding, this paper proposes an alternative called Contextual Clustering based LIC (CLIC), which primarily relies on clustering operations and local attention for correlation characterization and compact representation of an image. Specifically, CLIC expands the receptive field to the entire image for intra-cluster feature aggregation. Afterward, features are reordered to their original spatial positions and passed through local attention units for inter-cluster embedding. Additionally, we introduce Guided Post-Quantization Filtering (GuidedPQF) into CLIC, effectively mitigating the propagation and accumulation of quantization errors at the initial decoding stage. Extensive experiments demonstrate the superior performance of CLIC over state-of-the-art works: when optimized using MSE, it outperforms VVC by about 10% BD-Rate on three widely used benchmark datasets; when optimized using MS-SSIM, it saves more than 50% BD-Rate over VVC. CLIC offers a new way to generate compact representations for image compression and provides a novel direction for LIC development.
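The checkerboard model described in the citation snippet above (He et al. 2021) splits the latent into anchor and non-anchor positions. The following NumPy sketch shows only that dependency structure: anchors receive no spatial context, while each non-anchor position reads only its anchor neighbors, so each of the two passes can run in parallel. The parity convention for anchors and the 4-neighbor average, which stands in for a learned context network, are assumptions made for illustration; the function names are hypothetical.

```python
import numpy as np

def checkerboard_masks(h: int, w: int):
    """Boolean anchor / non-anchor masks; anchors are assumed to sit where (i + j) is even."""
    i, j = np.indices((h, w))
    anchor = (i + j) % 2 == 0
    return anchor, ~anchor

def anchor_context(y_hat: np.ndarray, anchor: np.ndarray) -> np.ndarray:
    """Mean of the 4-neighbor anchor values at every position.

    A toy stand-in for the learned context network: only anchor values are read,
    so all non-anchor positions can be processed in parallel once the anchor pass
    is finished. Anchor positions have no anchor neighbors and get context 0.
    """
    visible = np.pad(np.where(anchor, y_hat, 0.0), 1)  # zero-padded anchor values
    count = np.pad(anchor.astype(float), 1)            # 1 where an anchor exists
    total = visible[:-2, 1:-1] + visible[2:, 1:-1] + visible[1:-1, :-2] + visible[1:-1, 2:]
    n = count[:-2, 1:-1] + count[2:, 1:-1] + count[1:-1, :-2] + count[1:-1, 2:]
    return np.where(n > 0, total / np.maximum(n, 1.0), 0.0)

# Pass 1 would decode anchors from the hyperprior alone;
# pass 2 would decode non-anchors using this context.
y = np.arange(16, dtype=float).reshape(4, 4)
anchor, non_anchor = checkerboard_masks(4, 4)
ctx = anchor_context(y, anchor)
```

In the cited works the spatial context comes from learned convolutions and is combined with hyperprior features; the averaging here is only a placeholder that makes the anchor-only dependency easy to check.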

“…(a) Similar TinyLIC architectures are used for both lossy and lossless RIC. More details about the lossy pipeline are in [13]. (b) The detailed architecture of the lossless decoder.…”

Section: Low-level Raw Image Compression (mentioning)

confidence: 99%

“…We then run a RAW-domain YOLOv3 to process RAW images, reporting better detection accuracy than the corresponding RGB-domain YOLOv3 used in several applications [12]. We also extend a variational autoencoder (VAE) based lossy/lossless RAW Image Compressor (RIC) from the TinyLIC [13] for RAW image compression. The resulting model shows superior performance to commercial approaches in both lossy and lossless modes.…”

Section: Introduction (mentioning)

confidence: 99%

Efficient Visual Computing With Camera RAW Snapshots

Li, Lu, Zhang, et al. 2024

IEEE Trans. Pattern Anal. Mach. Intell.

Conventional cameras capture image irradiance (RAW) on a sensor and convert it to RGB images using an image signal processor (ISP). The images can then be used for photography or visual computing tasks in a variety of applications, such as public safety surveillance and autonomous driving. One can argue that since RAW images contain all the captured information, the conversion of RAW to RGB using an ISP is not necessary for visual computing. In this paper, we propose a novel ρ-Vision framework to perform high-level semantic understanding and low-level compression using RAW images, without the ISP subsystem that has been in use for decades. Considering the scarcity of available RAW image datasets, we first develop an unpaired CycleR2R network based on unsupervised CycleGAN to train modular unrolled ISP and inverse ISP (invISP) models using unpaired RAW and RGB images. We can then flexibly generate simulated RAW images (simRAW) from any existing RGB image dataset and fine-tune models originally trained in the RGB domain to process real-world camera RAW images. We demonstrate RAW-domain object detection and image compression on camera snapshots using a RAW-domain YOLOv3 and a RAW image compressor (RIC). Quantitative results show that RAW-domain task inference provides better detection accuracy and compression efficiency than inference in the RGB domain. Furthermore, the proposed ρ-Vision generalizes across various camera sensors and different task-specific models. An added benefit of ρ-Vision is that it eliminates the need for an ISP, potentially reducing computation and processing time.
