On the information rate of speech communication

Kuyk, Steven Van; Kleijn, W. Bastiaan; Hendriks, Richard C.

doi:10.1109/icassp.2017.7953233

Cited by 12 publications

(16 citation statements)

References 19 publications


“…Numerous applications of IB exist in domains such as clustering [ 7 , 8 ], coding theory and quantization [ 9 , 10 , 11 , 12 ], speech and image recognition [ 13 , 14 , 15 , 16 , 17 ], and cognitive science [ 18 ]. Several recent papers have also drawn connections between IB and supervised learning, in particular, classification using neural networks [ 19 , 20 ].…”

Section: Introduction (mentioning)

confidence: 99%

Nonlinear Information Bottleneck

Kolchinsky, Tracey, Wolpert. 2019. Entropy

122

131


Information bottleneck [IB] is a technique for extracting information in some 'input' random variable that is relevant for predicting some different 'output' random variable. IB works by encoding the input in a compressed 'bottleneck variable' from which the output can then be accurately decoded. IB can be difficult to compute in practice, and has been mainly developed for two limited cases: (1) discrete random variables with small state spaces, and (2) continuous random variables that are jointly Gaussian distributed (in which case the encoding and decoding maps are linear). We propose a method to perform IB in more general domains. Our approach can be applied to discrete or continuous inputs and outputs, and allows for nonlinear encoding and decoding maps. The method uses a novel upper bound on the IB objective, derived using a non-parametric estimator of mutual information and a variational approximation. We show how to implement the method using neural networks and gradient-based optimization, and demonstrate its performance on the MNIST dataset.
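To make the discrete small-state-space case in the abstract above concrete, the following is a minimal sketch, not the authors' method. It evaluates one common form of the IB Lagrangian, I(X;T) - beta * I(T;Y), for a given stochastic encoder p(t|x). The function names, the choice of Lagrangian form, and the toy distribution are all assumptions made for illustration.

```python
import numpy as np


def mutual_information(p_ab):
    """Mutual information I(A;B) in bits from a joint probability table."""
    p_ab = np.asarray(p_ab, dtype=float)
    p_a = p_ab.sum(axis=1, keepdims=True)   # marginal of A, shape (|A|, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)   # marginal of B, shape (1, |B|)
    mask = p_ab > 0                          # skip zero cells (0 * log 0 = 0)
    ratio = p_ab[mask] / (p_a @ p_b)[mask]
    return float(np.sum(p_ab[mask] * np.log2(ratio)))


def ib_lagrangian(p_xy, p_t_given_x, beta):
    """Return (L, I(X;T), I(T;Y)) with L = I(X;T) - beta * I(T;Y).

    p_xy:        joint table of input X and output Y, shape (|X|, |Y|)
    p_t_given_x: hypothetical encoder, rows sum to 1, shape (|X|, |T|)
    """
    p_xy = np.asarray(p_xy, dtype=float)
    enc = np.asarray(p_t_given_x, dtype=float)
    p_x = p_xy.sum(axis=1)
    p_xt = p_x[:, None] * enc        # p(x, t) = p(x) p(t|x)
    p_ty = enc.T @ p_xy              # p(t, y) = sum_x p(t|x) p(x, y)
    i_xt = mutual_information(p_xt)
    i_ty = mutual_information(p_ty)
    return i_xt - beta * i_ty, i_xt, i_ty


# Toy example (assumed): X and Y are perfectly correlated fair bits.
p_xy = np.array([[0.5, 0.0],
                 [0.0, 0.5]])
identity_encoder = np.eye(2)                 # T copies X
constant_encoder = np.array([[1.0, 0.0],
                             [1.0, 0.0]])    # T ignores X

loss_id, i_xt_id, i_ty_id = ib_lagrangian(p_xy, identity_encoder, beta=2.0)
loss_c, i_xt_c, i_ty_c = ib_lagrangian(p_xy, constant_encoder, beta=2.0)
```

In this toy case the identity encoder keeps one bit about both X and Y, while the constant encoder keeps none, so with beta above 1 the identity encoder attains the lower Lagrangian. The nonlinear method described in the abstract instead optimizes an upper bound on the objective with neural networks, which this sketch does not attempt.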


“…The redundancy in rate of existing speech coders can be determined from estimates of the information rate in speech. A recent rate estimate [5] based on comparing signals with the same message is consistent with lexical information rates computed from phoneme statistics [6]. They suggest that the true information rate is less than 100 b/s.…”

Section: Introduction (mentioning)

confidence: 57%

Wavenet Based Low Rate Speech Coding

Kleijn, Lim, Luebs, et al. 2018

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Self Cite

123


Traditional parametric coding of speech facilitates low rate but provides poor reconstruction quality because of the inadequacy of the model used. We describe how a WaveNet generative speech model can be used to generate high quality speech from the bit stream of a standard parametric coder operating at 2.4 kb/s. We compare this parametric coder with a waveform coder based on the same generative model and show that approximating the signal waveform incurs a large rate penalty. Our experiments confirm the high performance of the WaveNet based coder and show that the speech produced by the system is able to additionally perform implicit bandwidth extension and does not significantly impair recognition of the original speaker for the human listener, even when that speaker has not been used during the training of the generative model.


“…The representation of speech used in this paper is based on a crude model of the human auditory system and was motivated using information theoretic arguments in [21] and [27]. Let {x_i} be a real-valued random process that represents the samples of an acoustic speech signal, where i is the sample index, and let {x_t} be the short-time Fourier transform (STFT) of {x_i}, where t is the frame index.…”

Section: A. The Communication Channel (mentioning)

confidence: 99%

“…To estimate (3), realisations of M_t and Y_t are needed. Estimating a realisation of M_t requires a chorus of speech signals (see [27]). In typical applications of intelligibility prediction, such a chorus is not available, so instead we use an upper bound on (3).…”

Section: B. Information Rate of the Communication Channel (mentioning)

confidence: 99%


An Instrumental Intelligibility Metric Based on Information Theory

Kuyk, Kleijn, Hendriks. 2018

IEEE Signal Process. Lett.

Self Cite


We propose a monaural intrusive instrumental intelligibility metric called SIIB (speech intelligibility in bits). SIIB is an estimate of the amount of information shared between a talker and a listener in bits per second. Unlike existing information theoretic intelligibility metrics, SIIB accounts for talker variability and statistical dependencies between time-frequency units. Our evaluation shows that relative to state-of-the-art intelligibility metrics, SIIB is highly correlated with the intelligibility of speech that has been degraded by noise and processed by speech enhancement algorithms.
