“…For the YOLOv3 model, we test our models on a subset of 5000 images from the COCO2014 [24] dataset, and use the mean average precision (mAP) measured at 50% intersection over union (IoU), denoted mAP@50, as our accuracy metric. Our results for YOLOv3 are compared with the best published settings of three previous scalable codecs [2,3,4], referred to as Choi2022, Harell2022, and Ozyilkan2023. For completeness, we also include two traditional codecs, VVC-intra [16] and HEVC-intra [14] (also known as BPG), and the learnable codec of [10], which we refer to as Cheng2020.…”
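As an aside, the mAP@50 criterion above counts a detection as correct when its IoU with a ground-truth box is at least 0.5. A minimal sketch of the IoU computation (illustrative only, not from the quoted paper; box format and function name are our own assumptions):

```python
# Illustrative sketch: intersection-over-union (IoU) of two axis-aligned
# boxes given as (x1, y1, x2, y2). Under mAP@50, a detection matches a
# ground-truth box when IoU >= 0.5.

def iou(box_a, box_b):
    """Return the IoU of two (x1, y1, x2, y2) boxes."""
    # Coordinates of the intersection rectangle (empty if boxes are disjoint).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two 10x10 boxes overlapping in a 5x10 strip: IoU = 50 / 150 = 1/3.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

Mean average precision then averages, over classes, the area under each class's precision–recall curve built from matches at this threshold.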