Deep Neural Networks trained as image auto-encoders have recently emerged as a promising direction for advancing the state-of-the-art in image compression. The key challenge in learning such networks is twofold: to deal with quantization, and to control the trade-off between reconstruction error (distortion) and entropy (rate) of the latent image representation. In this paper, we focus on the latter challenge and propose a new technique to navigate the rate-distortion trade-off for an image compression auto-encoder. The main idea is to directly model the entropy of the latent representation with a context model: a 3D-CNN which learns a conditional probability model of the latent distribution of the auto-encoder. During training, the auto-encoder uses the context model to estimate the entropy of its representation, and the context model is concurrently updated to learn the dependencies between the symbols in the latent representation. Our experiments show that this approach, when measured in MS-SSIM, yields a state-of-the-art image compression system based on a simple convolutional auto-encoder.
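To make the training objective concrete, here is a minimal PyTorch sketch of such a rate-distortion loss. All names, shapes, and the single-layer context model are illustrative assumptions; the paper's actual context model is a deeper, causally masked 3D-CNN:

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch only: a context model assigns each quantized latent symbol a
# distribution over `levels` quantization centers; the rate term is the
# cross-entropy (in bits) of the actual symbols under that model.

class ContextModel(nn.Module):
    """Stand-in for the paper's causal 3D-CNN (illustrative, unmasked)."""
    def __init__(self, levels=6):
        super().__init__()
        # A real implementation masks the kernels so that each symbol is
        # predicted only from symbols the decoder has already seen.
        self.net = nn.Conv3d(1, levels, kernel_size=3, padding=1)

    def forward(self, z_hat):
        # z_hat: (B, C, H, W) quantized latents -> logits (B, levels, C, H, W)
        return self.net(z_hat.unsqueeze(1))

def rate_distortion_loss(x, x_rec, z_hat, symbols, context_model, beta=0.1):
    """Distortion plus beta times the estimated rate in bits per symbol.

    symbols: (B, C, H, W) long tensor of quantization indices of z_hat.
    """
    log_probs = F.log_softmax(context_model(z_hat), dim=1)
    rate_bits = F.nll_loss(log_probs, symbols) / math.log(2.0)  # nats -> bits
    distortion = F.mse_loss(x_rec, x)
    return distortion + beta * rate_bits
```

During training, the same estimate supplies gradients to both the auto-encoder (to lower its rate) and the context model (to better fit the latent distribution).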
We present a learned image compression system based on GANs, operating at extremely low bitrates. Our proposed framework combines an encoder, decoder/generator and a multi-scale discriminator, which we train jointly for a generative learned compression objective. The model synthesizes details it cannot afford to store, obtaining visually pleasing results at bitrates where previous methods fail and show strong artifacts. Furthermore, if a semantic label map of the original image is available, our method can fully synthesize unimportant regions in the decoded image such as streets and trees from the label map, proportionally reducing the storage cost. A user study confirms that for low bitrates, our approach is preferred to state-of-the-art methods, even when they use more than double the bits.
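As a rough sketch of the kind of joint objective such a system optimizes, the snippet below pairs an L1 distortion term with a least-squares GAN term; the module names E, G, D, the LSGAN formulation, and the weighting lambda_d are assumptions for illustration, not the paper's exact setup:

```python
import torch
import torch.nn.functional as F

# Illustrative sketch: encoder E, decoder/generator G, and discriminator D
# are assumed to be nn.Modules; D returns a map of real/fake scores.

def generator_step(E, G, D, x, lambda_d=10.0):
    """Encoder/generator loss: distortion plus an adversarial term."""
    x_rec = G(E(x))                    # reconstruct from the compressed latent
    d_fake = D(x_rec)
    adv = F.mse_loss(d_fake, torch.ones_like(d_fake))  # push D(x_rec) -> real
    return lambda_d * F.l1_loss(x_rec, x) + adv

def discriminator_step(E, G, D, x):
    """Discriminator loss: separate real images from reconstructions."""
    with torch.no_grad():
        x_rec = G(E(x))
    d_real, d_fake = D(x), D(x_rec)
    return (F.mse_loss(d_real, torch.ones_like(d_real)) +
            F.mse_loss(d_fake, torch.zeros_like(d_fake)))
```

A multi-scale discriminator, as used in the paper, would apply such a loss at several image resolutions.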
We propose the first practical learned lossless image compression system, L3C, and show that it outperforms the popular engineered codecs PNG, WebP, and JPEG2000. At the core of our method is a fully parallelizable hierarchical probabilistic model for adaptive entropy coding which is optimized end-to-end for the compression task. In contrast to recent autoregressive discrete probabilistic models such as PixelCNN, our method i) models the image distribution jointly with learned auxiliary representations instead of exclusively modeling the image distribution in RGB space, and ii) requires only three forward passes to predict all pixel probabilities instead of one per pixel. As a result, L3C obtains sampling speedups of over two orders of magnitude compared to the fastest PixelCNN variant (Multiscale-PixelCNN). Furthermore, we find that learning the auxiliary representations is crucial: it significantly outperforms predefined alternatives such as an RGB pyramid.
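The parallel, per-level prediction can be illustrated with a toy three-level hierarchy; everything below (module structure, feature widths, rounding-based quantization) is a simplified assumption, not the L3C architecture itself:

```python
import torch
import torch.nn as nn

# Toy sketch: each level halves the resolution to produce a quantized
# auxiliary representation, then predicts a distribution (here, 256-way
# logits per channel) over the representation one level below it.

class Level(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        self.down = nn.Conv2d(c_in, c_out, 3, stride=2, padding=1)
        self.pred = nn.Sequential(
            nn.Upsample(scale_factor=2, mode='nearest'),
            nn.Conv2d(c_out, c_in * 256, 3, padding=1),
        )

    def encode(self, x):
        return torch.round(self.down(x))   # quantized auxiliary features

    def predict(self, z):
        return self.pred(z)                # logits for the level below

levels = nn.ModuleList([Level(3, 8), Level(8, 8), Level(8, 8)])

x = torch.rand(1, 3, 64, 64)
zs, h = [], x
for lvl in levels:                         # bottom-up: build the hierarchy
    h = lvl.encode(h)
    zs.append(h)
for lvl, z in zip(levels, zs):             # one forward pass per level
    logits = lvl.predict(z)                # all symbols of a level at once
```

The point is that the prediction loop runs once per level (three passes in total), with each pass producing the probabilities of an entire level jointly, whereas an autoregressive model in RGB space needs one pass per pixel.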
We propose two deep neural network architectures for classification of arbitrary-length electrocardiogram (ECG) recordings and evaluate them on the atrial fibrillation (AF) classification data set provided by the PhysioNet/CinC Challenge 2017. The first architecture is a deep convolutional neural network (CNN) with averaging-based feature aggregation across time. The second architecture combines convolutional layers for feature extraction with long short-term memory (LSTM) layers for temporal aggregation of features. As a key ingredient of our training procedure we introduce a simple data augmentation scheme for ECG data and demonstrate its effectiveness in the AF classification task at hand. The second architecture was found to outperform the first one, obtaining an F1 score of 82.1% on the hidden challenge testing set.

Introduction

We consider the task of atrial fibrillation (AF) classification from single-lead electrocardiogram (ECG) recordings, as proposed by the PhysioNet/CinC Challenge 2017 [1]. AF occurs in 1-2% of the population, with incidence increasing with age, and is associated with significant mortality and morbidity [2]. Unfortunately, existing AF classification methods fail to unlock the potential of automated AF classification, as they suffer from poor generalization caused by training and/or evaluation on small and/or carefully selected data sets.

In this paper, we propose two deep neural network architectures for classification of arbitrary-length ECG recordings and evaluate them on the AF classification data set provided by the PhysioNet/CinC Challenge 2017. The first architecture is a 24-layer convolutional neural network (CNN) with averaging-based feature aggregation across time. The second architecture is a convolutional recurrent neural network (CRNN) that combines a 24-layer CNN with a 3-layer long short-term memory (LSTM) network for temporal aggregation of features (see the sketch below). CNNs have the ability to extract features invariant to local spectral and spatial/temporal variations, and have led to many breakthrough results, most prominently in computer vision [3, Chap. 9]. LSTM networks, on the other hand, have been shown to effectively capture long-term temporal dependencies in time series [3, Chap. 10]. As a key ingredient of our training procedure we introduce a simple yet effective data augmentation scheme for the ECG data at hand.

Related work: Our network architectures are loosely inspired by [4][5][6]. More specifically, a CRNN for polyphonic sound detection was proposed in [6]. There, unlike in AF classification where one has to infer a single label per ECG, the input audio sequence is mapped to a sequence of labels, inferring the sound events as a function of time. The work in [5] employs a CRNN for mental state classification from electroencephalogram (EEG) data. In [4], LSTM networks are used for multilabel classification of diagnoses in electronic health recordings. Shortly before finalizing this work, we became aware of the preprint [7], which proposes a deep CNN architecture for arrh...
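To illustrate the second (CRNN) architecture referenced above, here is a heavily condensed PyTorch sketch; the real model is far deeper (a 24-layer CNN followed by a 3-layer LSTM), and the layer sizes and input format below are assumptions for illustration:

```python
import torch
import torch.nn as nn

# Simplified sketch of a CRNN for single-lead ECG: a small 1D-CNN extracts
# local features, an LSTM aggregates them over time, and the final hidden
# state is classified. Four classes are assumed, matching the challenge's
# normal / AF / other rhythm / noisy categories.

class CRNN(nn.Module):
    def __init__(self, n_classes=4):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=16, stride=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=16, stride=2), nn.ReLU(),
        )
        self.lstm = nn.LSTM(64, 128, num_layers=3, batch_first=True)
        self.fc = nn.Linear(128, n_classes)

    def forward(self, x):            # x: (B, 1, T), arbitrary length T
        f = self.cnn(x)              # (B, 64, T')
        f = f.transpose(1, 2)        # (B, T', 64)
        _, (h_n, _) = self.lstm(f)   # h_n: (num_layers, B, 128)
        return self.fc(h_n[-1])      # class logits from the last hidden state

logits = CRNN()(torch.randn(2, 1, 9000))  # e.g. 30 s at 300 Hz
```

Because the LSTM's final hidden state has a fixed size regardless of T, the same network handles recordings of arbitrary length, which is the property both architectures are designed around.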