2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019
DOI: 10.1109/iccv.2019.00358

Deep Residual Learning in the JPEG Transform Domain

Abstract: We introduce a general method of performing Residual Network inference and learning in the JPEG transform domain that allows the network to consume compressed images as input. Our formulation leverages the linearity of the JPEG transform to redefine convolution and batch normalization with a tune-able numerical approximation for ReLu. The result is mathematically equivalent to the spatial domain network up to the ReLu approximation accuracy. A formulation for image classification and a model conversion algorit…
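The linearity argument in the abstract can be illustrated with a small numerical sketch. This is not the authors' implementation: it models only the 8×8 block DCT step of JPEG (no quantization, no ReLU approximation, no batch normalization), and the helper names `dct_matrix` and `to_transform_domain` are hypothetical. It shows that any linear map on pixel blocks can be rewritten as an equivalent linear map acting directly on DCT coefficients.

```python
import numpy as np


def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II matrix D, so that D @ x is the 1-D DCT of x."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)  # DC row uses a different scale factor
    return d


def to_transform_domain(spatial_op: np.ndarray, n: int = 8) -> np.ndarray:
    """Rewrite a linear map on flattened n x n pixel blocks so that it
    acts on flattened DCT coefficients instead (hypothetical helper)."""
    d = dct_matrix(n)
    # For row-major flattening, vec(D X D^T) = kron(D, D) @ vec(X).
    t = np.kron(d, d)
    # t is orthogonal, so its inverse is its transpose.
    return t @ spatial_op @ t.T


rng = np.random.default_rng(0)
block = rng.standard_normal((8, 8))        # one 8x8 pixel block
spatial_op = rng.standard_normal((64, 64))  # stand-in for any linear layer

d = dct_matrix()
coeffs = d @ block @ d.T                    # block in the DCT domain

# Path 1: apply the layer to pixels, then transform the result.
out_spatial = (spatial_op @ block.ravel()).reshape(8, 8)
# Path 2: apply the converted layer directly to the coefficients.
out_dct = (to_transform_domain(spatial_op) @ coeffs.ravel()).reshape(8, 8)

assert np.allclose(d @ out_spatial @ d.T, out_dct)
```

Both paths agree to floating-point precision because every step is linear. A nonlinearity such as ReLU does not commute with the transform in this way, which is consistent with the abstract's statement that equivalence holds only up to the ReLU approximation accuracy.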

Citations: cited by 122 publications (42 citation statements)
References: 26 publications
“…The resulting networks are 1.77× faster at inference and attain state-of-the-art classification performances. Another research stream designs dedicated networks to spectral input coefficients: harmonic networks [6] uses custom convolutions that produce high-level features by learning combinations of spectral filters defined by the 2D Discrete Cosine Transform; Ehrlich and Davis (2019) [7] introduce a ResNet able to operate on compressed JPEG images by including the compression transform into the network weights. From video side, two recent works on detection in compressed videos are [8], [9].…”
Section: Introduction (mentioning)
confidence: 99%
“…Generally, when an image is fully focused, the image is clearest and the high-frequency components are maximized. The frequency domain contains rich patterns that are useful for image-understanding tasks [28]-[30]. Accordingly, it follows that using frequency-domain information could enhance task performance.…”
Section: B. Spatial and Structural Flows (mentioning)
confidence: 99%
“…Chen et al. [24] pointed out that filtering the transform coefficients is a more direct way to compensate for the quantization loss, and it is helpful to consider the consistency with the human visual system. Studies in [25]-[27] show that it is feasible to use deep neural networks to process DCT-domain coefficients and may even accelerate convergence. Sun et al. [28] proposed a DCT-domain convolutional neural network in JPEG to learn the association between the reconstructed image and the original image, which effectively compensates for high-frequency information, thereby protecting the edge of the image.…”
Section: B. Quantization Distortion Compensation (mentioning)
confidence: 99%