Non-uniform Speech/Audio Coding Exploiting Predictability of Temporal Evolution of Spectral Envelopes

Motlíček, Petr; Heřmanský, Hynek; Ganapathy, Sriram; Garudadri, Harinath

doi:10.1007/978-3-540-74628-7_46

Cited by 4 publications

(4 citation statements)

References 6 publications

“…In the decoder, the sub-band residuals were reconstructed and modulated with corresponding FDLP envelope. Individual DCT contributions from each critical sub-band were summed and inverse DCT was applied to reconstruct output signal [25].…”

Section: FDLP for Narrow-band Speech Coding; citation type: mentioning

confidence: 99%

“…In DPQ, graphically shown in Figure 5, phase spectral components corresponding to relatively low-magnitude spectral components are transmitted with lower resolution, that is, the codebook vector selected from the magnitude codebook is processed by "adaptive thresholding" in the encoder as well as in the decoder [25]. The threshold determines the resolution of quantization levels in uniform SQ.…”

Section: Dynamic Phase Quantization (DPQ); citation type: mentioning

confidence: 99%

Wide-Band Audio Coding Based on Frequency-Domain Linear Prediction

Motlíček, Ganapathy, Heřmanský et al. 2010

EURASIP Journal on Audio, Speech, and Music Processing

We revisit an original concept of speech coding in which the signal is separated into the carrier modulated by the signal envelope. A recently developed technique, called frequency-domain linear prediction (FDLP), is applied for the efficient estimation of the envelope. The processing in the temporal domain allows for a straightforward emulation of the forward temporal masking. This, combined with an efficient nonuniform sub-band decomposition and application of noise shaping in spectral domain instead of temporal domain (a technique to suppress artifacts in tonal audio signals), yields a codec that does not rely on the linear speech production model but rather uses well-accepted concept of frequency-selective auditory perception. As such, the codec is not only specific for coding speech but also well suited for coding other important acoustic signals such as music and mixed content. The quality of the proposed codec at 66 kbps is evaluated using objective and subjective quality assessments. The evaluation indicates competitive performance with the MPEG codecs operating at similar bit rates.

“…A new audio coding technique based on modeling the spectral dynamics has been proposed in [1], [2]. The input audio signal is first decomposed into frequency sub-bands using a Quadrature Mirror Filter (QMF) bank.…”

Section: Introduction; citation type: mentioning

confidence: 99%

Temporal masking for bit-rate reduction in audio codec based on Frequency Domain Linear Prediction

Ganapathy, Motlíček, Heřmanský et al. 2008

2008 IEEE International Conference on Acoustics, Speech and Signal Processing

Self Cite

Abstract. Audio coding based on Frequency Domain Linear Prediction (FDLP) uses autoregressive model to approximate Hilbert envelopes in frequency sub-bands for relatively long temporal segments. Although the basic technique achieves good quality of the reconstructed signal, there is a need for improving the coding efficiency. In this paper, we present a novel method for the application of temporal masking to reduce the bit-rate in a FDLP based codec. Temporal masking refers to the hearing phenomenon, where the exposure to a sound reduces response to following sounds for a certain period of time (up to 200 ms). In the proposed version of the codec, a first order forward masking model of the human ear is implemented and informal listening experiments using additive white noise are performed to obtain the exact noise masking thresholds. Subsequently, this masking model is employed in encoding the sub-band FDLP carrier signal. Application of the temporal masking in the FDLP codec results in a bit-rate reduction of about 10% without degrading the quality. Performance evaluation is done with Perceptual Evaluation of Audio Quality (PEAQ) scores and with subjective listening tests.
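
The abstract's core idea, an autoregressive model fitted in the frequency domain so that it approximates a temporal (Hilbert) envelope, can be illustrated with a short sketch. This is not the codec described in the paper: it handles a single full-band segment with no sub-band decomposition, quantization, or masking, and the function name `fdlp_envelope`, the DCT-II, the autocorrelation-method predictor, and the model order are all assumptions made for illustration.

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import solve_toeplitz
from scipy.signal import freqz


def fdlp_envelope(x, order=20):
    """Illustrative FDLP: all-pole fit to the DCT of x, read out along time.

    Hypothetical helper, not the authors' implementation.
    """
    c = dct(x, type=2, norm="ortho")           # frequency-domain view of x
    n = len(c)
    # Autocorrelation of the DCT sequence for lags 0..order.
    r = np.correlate(c, c, mode="full")[n - 1 : n + order]
    r[0] *= 1.0 + 1e-9                          # tiny ridge for numerical stability
    # Normal equations of linear prediction (autocorrelation method).
    a = solve_toeplitz(r[:order], -r[1 : order + 1])
    a = np.concatenate(([1.0], a))
    gain = r[0] + np.dot(a[1:], r[1 : order + 1])  # prediction error power
    # The model's "power spectrum" on [0, pi) maps to time samples 0..n-1.
    _, h = freqz([1.0], a, worN=n)
    return gain * np.abs(h) ** 2


# Toy input: a noise burst in samples 600..799 on a very quiet background.
rng = np.random.default_rng(0)
n = 1000
x = 1e-3 * rng.standard_normal(n)
x[600:800] += rng.standard_normal(200)
env = fdlp_envelope(x, order=20)
```

In this toy input the estimated envelope should be much larger inside the burst than outside it, which is the duality FDLP relies on: linear prediction over frequency yields a smooth model of energy over time.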

“…A new speech/audio coding technique based on modeling the temporal evolution of the spectral dynamics was proposed in [1,2]. The approach is based on representing Amplitude Modulating (AM) signal using Hilbert envelope estimate and Frequency Modulating (FM) signal using Hilbert carrier.…”

Section: Introduction; citation type: mentioning

confidence: 99%
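
The AM/FM split mentioned in the citation statement above can be sketched with the analytic signal: the Hilbert envelope is its magnitude and the Hilbert carrier is the cosine of its phase. This is a minimal illustration on a synthetic tone, assuming SciPy's `hilbert`; the helper name `am_fm_decompose` and the test signal are hypothetical and not taken from the cited papers.

```python
import numpy as np
from scipy.signal import hilbert


def am_fm_decompose(x):
    """Split a real signal into Hilbert envelope (AM) and Hilbert carrier (FM)."""
    analytic = hilbert(x)                  # x + j * HilbertTransform(x)
    envelope = np.abs(analytic)            # AM component
    carrier = np.cos(np.angle(analytic))   # FM component, bounded by 1 in magnitude
    return envelope, carrier


# Synthetic example: a 440 Hz tone with a slow 4 Hz amplitude modulation.
fs = 8000
t = np.arange(fs) / fs
x = (1.0 + 0.5 * np.sin(2 * np.pi * 4 * t)) * np.cos(2 * np.pi * 440 * t)
envelope, carrier = am_fm_decompose(x)

# Multiplying the two components back together recovers the input.
reconstructed = envelope * carrier
```

The product of envelope and carrier equals the real part of the analytic signal, so the decomposition itself is lossless; any coding loss comes from how the two components are subsequently modeled and quantized.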