Video Coding for Machines with Feature-Based Rate-Distortion Optimization

Fischer, Kristian; Brand, Fabian; Herglotz, Christian; Kaup, André

doi:10.1109/mmsp48831.2020.9287136

Cited by 42 publications

(25 citation statements)

References 14 publications

“…As shown in Table 1, our method on average saves 37.87% bitrate for the same task performance level on object detection and 32.90% on instance segmentation, in comparison with the Pareto front of VVC anchors. As a reference, although not directly comparable, for instance segmentation using the same task-NN architecture (Mask R-CNN) but trained on CityScapes train set, [3] reports up to 9.95% of bitrate saving for the following QPs: 12, 17, 22, 27, on the CityScapes val set. Our system contains only 1.5M parameters, and is also extremely fast: the average encoding time for a 2048 × 1024 image in the val set is around 0.15 seconds, with batch size of 1 on a single RTX 2080Ti GPU.…”

Section: Results (mentioning)

confidence: 99%

“…In order to directly improve the task performance, [3] and [4] propose standard-compliant methods that preserve the standard coded bitstream format by fine-tuning specific parts of the traditional codec for the targeted machines. Although the above methods manage to improve the task performance, they do not aim to completely replace the conventional pipeline for human-oriented coding, instead they only add an incremental capability to the system.…”

Section: Introduction (mentioning)

confidence: 99%

Image Coding For Machines: an End-To-End Learned Approach

Zhang, Cricri, et al. 2021

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Over recent years, deep learning-based computer vision systems have been applied to images at an ever-increasing pace, oftentimes representing the only type of consumption for those images. Given the dramatic explosion in the number of images generated per day, a question arises: how much better would an image codec targeting machine-consumption perform against state-of-the-art codecs targeting human-consumption? In this paper, we propose an image codec for machines which is neural network (NN) based and end-to-end learned. In particular, we propose a set of training strategies that address the delicate problem of balancing competing loss functions, such as computer vision task losses, image distortion losses, and rate loss. Our experimental results show that our NN-based codec outperforms the state-of-the-art Versatile Video Coding (VVC) standard on the object detection and instance segmentation tasks, achieving -37.87% and -32.90% of BD-rate gain, respectively, while being fast thanks to its compact size. To the best of our knowledge, this is the first end-to-end learned machine-targeted image codec.
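For readers unfamiliar with the BD-rate figures quoted in this abstract, the following is a minimal sketch of how a Bjøntegaard delta rate is commonly computed. It is not taken from the cited paper: the function name `bd_rate`, the cubic fit, and the use of a task metric (for example mAP) as the quality axis are assumptions made for illustration, and the sample numbers are invented.

```python
import numpy as np


def bd_rate(anchor_rate, anchor_quality, test_rate, test_quality):
    """Average bitrate difference (%) of `test` relative to `anchor`.

    Hypothetical helper: fits log-rate as a cubic polynomial of quality
    for each codec and averages the gap over the shared quality range.
    Negative values mean the test codec needs fewer bits.
    """
    qa = np.asarray(anchor_quality, dtype=float)
    qt = np.asarray(test_quality, dtype=float)
    log_a = np.log(np.asarray(anchor_rate, dtype=float))
    log_t = np.log(np.asarray(test_rate, dtype=float))

    # Cubic fit of log(rate) versus quality for each rate-quality curve.
    poly_a = np.polyfit(qa, log_a, 3)
    poly_t = np.polyfit(qt, log_t, 3)

    # Only compare the curves where both codecs have measurements.
    lo = max(qa.min(), qt.min())
    hi = min(qa.max(), qt.max())

    # Mean of each fitted curve over [lo, hi] via its antiderivative.
    int_a = np.polyint(poly_a)
    int_t = np.polyint(poly_t)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)

    # Convert the mean log-rate gap back to a percentage.
    return (np.exp(avg_t - avg_a) - 1.0) * 100.0


# Invented example: the test codec reaches each quality level with
# 70% of the anchor's bitrate, so the result should be close to -30.
quality = [0.20, 0.28, 0.34, 0.38]          # e.g. mAP (assumed metric)
anchor_bpp = [0.10, 0.20, 0.40, 0.80]       # anchor bits per pixel
test_bpp = [0.7 * r for r in anchor_bpp]    # test codec bits per pixel
saving = bd_rate(anchor_bpp, quality, test_bpp, quality)
```

Under this convention, a value such as -37.87% reads as "about 38% fewer bits for the same task performance, averaged over the overlapping quality range".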

“…Another emerging topic in the video coding space is known as video coding for machines (VCM) targeting nonhuman users in the context of machine vision. Duan et al [43] provide a comprehensive introduction to this topic, while Fischer et al [44] specifically address feature-based rate-distortion optimization in the context of VCM. Finally, MPEG has been recently started working on the coded representation of haptics that enable efficient representation and compression of time-dependent haptic signals and are suitable for the coding…”

Section: B. MPEG (mentioning)

confidence: 99%

Special issue on Open Media Compression: Overview, Design Criteria, and Outlook on Emerging Standards

Timmerer, Wien, Lei, et al. 2021

Proc. IEEE

“…In response to this emerging challenge, many studies have been actively conducted to explore alternative coding solutions for the new use cases. There exist mainly two categories of solutions: adapting the traditional image and video codecs for machine-consumption [4,5], and employing end-to-end learned codecs that are optimized directly for machines by taking advantage of the neural network (NN) based solutions [6,7]. Each approach has its own pros and cons.…”

Section: Introduction (mentioning)

confidence: 99%

“…Each approach has its own pros and cons. Traditional video codec-based solutions, built upon mature technologies and broadly adopted standards, are often compatible with existing systems [4,5]. However, it is difficult to optimize the overall performance of a system that consists of a traditional video codec and neural networks that perform machine tasks [6].…”

Section: Introduction (mentioning)

confidence: 99%

Learned Image Coding for Machines: A Content-Adaptive Approach

Zhang, Cricri, et al. 2021

2021 IEEE International Conference on Multimedia and Expo (ICME)

Today, according to the Cisco Annual Internet Report (2018–2023), the fastest-growing category of Internet traffic is machine-to-machine communication. In particular, machine-to-machine communication of images and videos represents a new challenge and opens up new perspectives in the context of data compression. One possible solution approach consists of adapting current human-targeted image and video coding standards to the use case of machine consumption. Another approach consists of developing completely new compression paradigms and architectures for machine-to-machine communications. In this paper, we focus on image compression and present an inference-time content-adaptive finetuning scheme that optimizes the latent representation of an end-to-end learned image codec, aimed at improving the compression efficiency for machine-consumption. The conducted experiments targeting instance segmentation task network show that our online finetuning brings an average bitrate saving (BD-rate) of -3.66% with respect to our pretrained image codec. In particular, at low bitrate points, our proposed method results in a significant bitrate saving of -9.85%. Overall, our pretrained-and-then-finetuned system achieves -30.54% BD-rate over the state-of-the-art image/video codec Versatile Video Coding (VVC) on instance segmentation.
