Deep Residual Learning in the JPEG Transform Domain

Ehrlich, Max; Davis, Larry S.

doi:10.1109/iccv.2019.00358

Cited by 122 publications

(42 citation statements)

References 26 publications

“…The resulting networks are 1.77× faster at inference and attain state-of-the-art classification performances. Another research stream designs dedicated networks to spectral input coefficients: harmonic networks [6] uses custom convolutions that produce high-level features by learning combinations of spectral filters defined by the 2D Discrete Cosine Transform; Ehrlich and Davis (2019) [7] introduce a ResNet able to operate on compressed JPEG images by including the compression transform into the network weights. From video side, two recent works on detection in compressed videos are [8], [9].…”

Section: Introduction (mentioning)

confidence: 99%

Fast object detection in compressed JPEG Images

Deguerre, Chatelain, Gasso, 2019

2019 IEEE Intelligent Transportation Systems Conference (ITSC)

Object detection in still images has drawn a lot of attention over the past few years, and with the advent of Deep Learning impressive performances have been achieved with numerous industrial applications. Most of these deep learning models rely on RGB images to localize and identify objects in the image. However, in some application scenarios, images are compressed either for storage savings or fast transmission. Therefore, a time-consuming image decompression step is compulsory in order to apply the aforementioned deep models. To alleviate this drawback, we propose a fast deep architecture for object detection in JPEG images, one of the most widespread compression formats. We train a neural network to detect objects based on the blockwise DCT (discrete cosine transform) coefficients issued from the JPEG compression algorithm. We modify the well-known Single Shot multibox Detector (SSD) by replacing its first layers with one convolutional layer dedicated to process the DCT inputs. Experimental evaluations on PASCAL VOC and an industrial dataset comprising images of road traffic surveillance show that the model is about 2× faster than regular SSD with promising detection performances. To the best of our knowledge, this paper is the first to address detection in compressed JPEG images.
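As a rough illustration of the "blockwise DCT coefficients" this abstract refers to, the sketch below computes an orthonormal 8×8 DCT-II over each block of a grayscale array with NumPy. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names `dct_matrix` and `blockwise_dct` are hypothetical, and the JPEG steps of level shifting, quantization, chroma subsampling, and entropy coding are deliberately left out.

```python
import numpy as np


def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix of size n x n (hypothetical helper)."""
    k = np.arange(n).reshape(-1, 1)  # frequency index (rows)
    x = np.arange(n).reshape(1, -1)  # sample index (columns)
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)       # DC row uses a different scale factor
    return d


def blockwise_dct(image, block=8):
    """Return per-block DCT coefficients with shape (H/block, W/block, block*block).

    Assumes a 2-D float array whose height and width are multiples of `block`.
    """
    h, w = image.shape
    if h % block or w % block:
        raise ValueError("image dimensions must be multiples of the block size")
    d = dct_matrix(block)
    # Rearrange into a grid of blocks: (H/b, W/b, b, b).
    blocks = image.reshape(h // block, block, w // block, block).transpose(0, 2, 1, 3)
    # Separable 2-D DCT of every block at once: D @ X @ D^T (matmul broadcasts).
    coeffs = d @ blocks @ d.T
    # Flatten each block's 64 coefficients into a channel axis.
    return coeffs.reshape(h // block, w // block, block * block)
```

Calling `blockwise_dct` on a 16×24 array yields a 2×3 grid with 64 coefficient channels per position, which is the kind of low-resolution, many-channel tensor a first convolutional layer could consume in place of RGB pixels.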

“…Generally, when an image is fully focused, the image is clearest and the high-frequency components are maximized. The frequency domain contains rich patterns that are useful for image-understanding tasks [28]-[30]. Accordingly, it follows that using frequency-domain information could enhance task performance.…”

Section: B Spatial and Structural Flows (mentioning)

confidence: 99%

MGCAN: A Novel Method for Visual Power Monitoring Systems

et al. 2021

Imaging quality judgement provides useful meaning to benefit intelligent visual power systems. However, accurate identification of possible failures of the monitoring equipment remains challenging. This paper proposes a novel deep architecture to improve the assessment of abnormalities by considering a novel mutual-guided convolutional attention network (MGCAN) and a multi-region scheme. More specifically, the MGCAN exploits various adequate, low-level information to generate spatial and structural flows. Accordingly, it further extends attention mechanisms to model both inter-flow correlations and inter-channel relationships in the network design. The multi-region scheme features a spatial pyramid random-crop strategy and a region-fusion strategy to handle locally non-uniform characteristics among categories. In this way, the whole architecture provides an end-to-end and adaptive learning procedure relevant to quality perception that focuses on learning important features and mining discriminative regions. Experimental results demonstrate its superiority to prior methods for judging abnormalities. The proposed method can be easily extended to an entire surveillance system.

Index terms: Abnormal judgement, power systems, deep learning, attention networks, region fusion.

“…Chen et al [24] pointed out that filtering the transform coefficients is a more direct way to compensate for the quantization loss, and it is helpful to consider the consistency with the human visual system. Studies in [25]-[27] show that it is feasible to use deep neural networks to process DCT-domain coefficients and may even accelerate convergence. Sun et al [28] proposed a DCT-domain convolutional neural network in JPEG to learn the association between the reconstructed image and the original image, which effectively compensates for high-frequency information, thereby protecting the edge of the image.…”

Section: B Quantization Distortion Compensation (mentioning)

confidence: 99%

Context-Adaptive Inverse Quantization for Inter-Frame Coding

Liu et al. 2021

IEEE Open J. Circuits Syst.
