2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP) 2020
DOI: 10.1109/mmsp48831.2020.9287136

Video Coding for Machines with Feature-Based Rate-Distortion Optimization

Abstract: Common state-of-the-art video codecs are optimized to deliver a low bitrate while providing a certain quality for the final human observer, which is achieved by rate-distortion optimization (RDO). However, with the steady improvement of neural networks solving computer vision tasks, more and more multimedia data is no longer observed by humans but is analyzed directly by neural networks. In this paper, we propose a standard-compliant feature-based RDO (FRDO) that is designed to increase the coding performance when t…
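To illustrate the general idea behind RDO mentioned in the abstract, here is a minimal sketch of a Lagrangian mode decision, J = D + λ·R, with a pluggable distortion measure. It is not the paper's FRDO: the names `choose_mode`, `toy_features`, and `feature_sse` are hypothetical, the "features" are simple adjacent-sample differences standing in for whatever a neural network would compute, and the candidate reconstructions and rates are made-up numbers. The only point is that swapping the distortion term can change which coding mode wins.

```python
from typing import Callable, List, Sequence, Tuple

Block = Sequence[float]
Candidate = Tuple[str, Block, float]  # (mode name, reconstruction, rate in bits)


def sse(a: Block, b: Block) -> float:
    """Pixel-domain sum of squared errors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def toy_features(block: Block) -> List[float]:
    """Hypothetical feature extractor: differences between adjacent samples.

    Assumption for illustration only; a real system would use network features.
    """
    return [block[i + 1] - block[i] for i in range(len(block) - 1)]


def feature_sse(a: Block, b: Block) -> float:
    """Sum of squared errors measured in the toy feature space."""
    return sse(toy_features(a), toy_features(b))


def choose_mode(
    original: Block,
    candidates: Sequence[Candidate],
    lam: float,
    distortion: Callable[[Block, Block], float] = sse,
) -> str:
    """Return the mode name that minimizes the cost J = D + lam * R."""
    best = min(candidates, key=lambda c: distortion(original, c[1]) + lam * c[2])
    return best[0]


# Made-up example data: one block and two candidate reconstructions.
original = [10.0, 20.0, 30.0, 40.0]
candidates = [
    ("dc_shift", [12.0, 22.0, 32.0, 42.0], 4.0),  # constant offset, cheap
    ("noisy", [10.0, 21.0, 29.0, 40.0], 6.0),     # small pixel error, costlier
]

pixel_choice = choose_mode(original, candidates, lam=1.0, distortion=sse)
feature_choice = choose_mode(original, candidates, lam=1.0, distortion=feature_sse)
```

With these numbers the pixel-domain cost favors `noisy` (small sample-wise error), while the toy feature-domain cost favors `dc_shift`, because a constant offset leaves the adjacent differences unchanged and costs fewer bits.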

Citations: cited by 42 publications (25 citation statements)
References: 14 publications
“…As shown in Table 1, our method on average saves 37.87% bitrate for the same task performance level on object detection and 32.90% on instance segmentation, in comparison with the Pareto front of VVC anchors. As a reference, although not directly comparable, for instance segmentation using the same task-NN architecture (Mask R-CNN) but trained on CityScapes train set, [3] reports up to 9.95% of bitrate saving for the following QPs: 12, 17, 22, 27, on the CityScapes val set. Our system contains only 1.5M parameters, and is also extremely fast: the average encoding time for a 2048 × 1024 image in the val set is around 0.15 seconds, with batch size of 1 on a single RTX 2080Ti GPU.…”
Section: Results (mentioning)
confidence: 99%
“…In order to directly improve the task performance, [3] and [4] propose standard-compliant methods that preserve the standard coded bitstream format by fine-tuning specific parts of the traditional codec for the targeted machines. Although the above methods manage to improve the task performance, they do not aim to completely replace the conventional pipeline for human-oriented coding, instead they only add an incremental capability to the system.…”
Section: Introduction (mentioning)
confidence: 99%
“…Another emerging topic in the video coding space is known as video coding for machines (VCM) targeting nonhuman users in the context of machine vision. Duan et al [43] provide a comprehensive introduction to this topic, while Fischer et al [44] specifically address feature-based rate-distortion optimization in the context of VCM. Finally, MPEG has been recently started working on the coded representation of haptics that enable efficient representation and compression of time-dependent haptic signals and are suitable for the coding…”
Section: B. MPEG (mentioning)
confidence: 99%
“…In response to this emerging challenge, many studies have been actively conducted to explore alternative coding solutions for the new use cases. There exist mainly two categories of solutions: adapting the traditional image and video codecs for machine-consumption [4,5], and employing end-to-end learned codecs that are optimized directly for machines by taking advantage of the neural network (NN) based solutions [6,7]. Each approach has its own pros and cons.…”
Section: Introduction (mentioning)
confidence: 99%
“…Each approach has its own pros and cons. Traditional video codec-based solutions, built upon mature technologies and broadly adopted standards, are often compatible with existing systems [4,5]. However, it is difficult to optimize the overall performance of a system that consists of a traditional video codec and neural networks that perform machine tasks [6].…”
Section: Introduction (mentioning)
confidence: 99%