Unified speech and audio coding scheme for high quality at low bitrates

Neuendorf, Max; Gournay, Philippe; Multrus, Markus; Lecomte, Jérémie; Bessette, B.; Geiger, Ralf; Bayer, Stefan; Fuchs, Guillaume; Hilpert, Johannes; Rettelbach, Nikolaus; Salami, R.; Schuller, Gerald; Lefebvre, Rémi; Grill, Bernhard

doi:10.1109/icassp.2009.4959505

Cited by 34 publications (22 citation statements)

References 5 publications


“…In USAC [34], an up-to-date MPEG standardization, MDCT plays an important role [35]. In the USAC encoder, the MDCT coefficients are first companded with a power-law function before scalar quantization, achieving in effect a non-uniform scalar quantization.…”

Section: Results (mentioning)

A memory efficient finite-state source coding algorithm for audio MDCT coefficients

Jiang, Yin, Liu (2014)

J AUDIO SPEECH MUSIC PROC.


To achieve a better trade-off between the vector dimension and the memory requirements of a vector quantizer (VQ), this paper presents an entropy-constrained VQ (ECVQ) scheme with finite memory, called finite-state ECVQ (FS-ECVQ). The scheme consists of a finite-state VQ (FSVQ) and multiple component ECVQs. By using the FSVQ, the inter-frame dependencies within the source sequence can be exploited effectively without transmitting any side information. By employing the ECVQs, the total memory requirements of the FS-ECVQ are reduced while the coding performance is improved. An FS-ECVQ designed for coding modified discrete cosine transform (MDCT) coefficients was implemented and evaluated within the Unified Speech and Audio Coding (USAC) scheme. Results showed that the FS-ECVQ reduced the total memory requirements by about 11.3% compared with the encoder of the final USAC version (FINAL), while maintaining similar coding performance.
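The abstract combines two ideas: a finite-state VQ, in which the codebook used for the current vector is chosen by a state that encoder and decoder both derive from previously decoded data (so no side information is needed), and an entropy-constrained VQ, in which the index is chosen to minimize distortion plus λ times its code length. The following is only a toy sketch of that combination, not the paper's FS-ECVQ: the two-state codebooks, the code lengths, λ (`lam`), and the `next_state` rule are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]

# Hypothetical state-dependent codebooks: each entry is (codevector, code length in bits).
# In a real ECVQ the lengths would come from the entropy code designed for each index.
CODEBOOKS: Dict[int, List[Tuple[Vector, float]]] = {
    0: [((0.0, 0.0), 1.0), ((1.0, 0.0), 2.0), ((0.0, 1.0), 2.0)],  # "quiet" state
    1: [((2.0, 2.0), 1.0), ((3.0, 1.0), 2.0), ((0.0, 0.0), 2.0)],  # "loud" state
}


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def next_state(prev_recon: Vector) -> int:
    # Hypothetical next-state rule: it looks only at the previously *decoded*
    # vector, so the decoder can track the state without side information.
    return 1 if sum(v * v for v in prev_recon) > 0.5 else 0


def encode(vectors: Sequence[Vector], lam: float = 0.5) -> List[int]:
    state, indices = 0, []
    for x in vectors:
        book = CODEBOOKS[state]
        # Entropy-constrained selection: minimize distortion + lam * rate.
        i = min(range(len(book)),
                key=lambda k: sq_dist(x, book[k][0]) + lam * book[k][1])
        indices.append(i)
        state = next_state(book[i][0])
    return indices


def decode(indices: Sequence[int]) -> List[Vector]:
    state, out = 0, []
    for i in indices:
        y = CODEBOOKS[state][i][0]
        out.append(y)
        state = next_state(y)
    return out
```

For example, `encode([(0.9, 0.1), (2.1, 1.8), (0.2, 0.1)])` returns `[1, 0, 2]`, and `decode([1, 0, 2])` returns `[(1.0, 0.0), (2.0, 2.0), (0.0, 0.0)]`. Index 0 decodes to `(2.0, 2.0)` here because the state has switched, which is how the finite-state structure reuses small per-state codebooks.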

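The companding step in the first citation statement (a power law applied to the MDCT coefficients before scalar quantization) can be sketched as follows. It assumes the 3/4-power compander known from the AAC-family quantizer, with the decoder expanding by |q|^(4/3). The single `step` size and the plain round-to-nearest are simplifying assumptions, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

EXPONENT = 0.75  # assumed 3/4 power law (AAC-style compander)


def quantize(coeffs: Sequence[float], step: float) -> List[int]:
    """Compand |x|/step with a power law, then round to the nearest integer."""
    out = []
    for x in coeffs:
        q = int(math.floor((abs(x) / step) ** EXPONENT + 0.5))  # plain rounding (assumption)
        out.append(-q if x < 0 else q)
    return out


def dequantize(indices: Sequence[int], step: float) -> List[float]:
    """Expand with |q|**(4/3) to return to the linear amplitude scale."""
    return [math.copysign(abs(q) ** (1.0 / EXPONENT), q) * step for q in indices]
```

With `step=1.0`, `quantize([0.0, 1.0, -8.0, 100.0], 1.0)` gives `[0, 1, -5, 32]`. The reconstruction levels 1, 2.52, 4.33, 6.35, … move further apart as the index grows, so a uniform integer quantizer in the companded domain acts as a non-uniform quantizer on the original coefficients, coarser for large amplitudes.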


“…However, previous research [5], [6] … poor coding gain is that the conventional DFT-based TCX is not based on critical sampling, which causes low-frequency resolution and overhead data during the core-coding transitions. Another problem associated with AMR-WB+ TCX is the block artifact, which is caused by the short overlap between TCX frames.…”

Section: AMR-WB+ TCX (mentioning)
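The "not based on critical sampling" point can be made concrete by counting values. Assuming the coder quantizes the full spectrum of a windowed block of `hop + overlap` real samples, a real DFT of that block has exactly `hop + overlap` real degrees of freedom, but the block only advances by `hop` new samples. An MDCT maps `2 * hop` windowed samples to `hop` coefficients and also advances by `hop`. The frame and overlap lengths in the example are hypothetical, not AMR-WB+ values.

```python
from fractions import Fraction


def real_dft_degrees_of_freedom(m: int) -> int:
    """Real values in the spectrum of an m-sample real block (conjugate symmetry)."""
    bins = m // 2 + 1                      # non-redundant complex bins
    forced_real = 2 if m % 2 == 0 else 1   # DC (and Nyquist for even m) have zero imaginary part
    return bins + (bins - forced_real)     # real parts + free imaginary parts


def values_per_new_sample_dft(hop: int, overlap: int) -> Fraction:
    """DFT of a windowed block of hop + overlap samples, advancing by hop."""
    return Fraction(real_dft_degrees_of_freedom(hop + overlap), hop)


def values_per_new_sample_mdct(hop: int) -> Fraction:
    """MDCT: 2 * hop windowed samples -> hop coefficients, advancing by hop."""
    return Fraction(hop, hop)
```

With a hypothetical hop of 256 samples and 32 samples of overlap, `values_per_new_sample_dft(256, 32)` is `Fraction(9, 8)`, i.e. 12.5% more values to code than new input samples, while `values_per_new_sample_mdct(256)` is exactly 1.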

“…On the other hand, HE-AAC does not perform well for speech signals, since it cannot use a small bit budget as efficiently as linear predictive (LP) coders when encoding speech [5], [6]. At 16–20 kbps, the music quality of the AMR-WB+ is significantly worse than that of the HE-AAC v2 [6]. One of the major reasons is overhead information, particularly during the core-coding transitions, due to non-critical sampling with a low-frequency resolution.…”

Section: Introduction (mentioning)

Efficient Windowing Scheme for MDCT-Based TCX in AMR-WB+

Lee, Park, Youn et al. (2011)

IEICE Trans. Inf. & Syst.


Summary: Although the AMR-WB+ coder provides excellent quality for speech signals, its coding model for music signals is not as effective as that of HE-AAC v2. The main causes of the poor quality of AMR-WB+ TCX are non-critical sampling and block artifacts. The new TCX windowing scheme proposed in this paper uses an MDCT with 50% frame overlap, which significantly mitigates both problems. Because of the long overlaps, the proposed scheme adds codec delay; however, this delay is moderate for audio services. Objective and subjective test results indicate that the proposed scheme achieves noticeable quality improvements for music signals over previous TCX schemes.
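The two properties the abstract relies on, critical sampling and cancellation of block-edge aliasing through 50% overlap, can be checked with a direct MDCT. This is a minimal sketch assuming a sine window and a 2/N inverse scaling (so that w[n]² + w[n+N]² = 1 yields exact overlap-add reconstruction); it is not the paper's window design, and the function names are hypothetical. The O(N²) form is used for readability instead of an FFT-based MDCT.

```python
import math
from typing import List, Sequence


def sine_window(length: int) -> List[float]:
    """Sine window; for length 2N it satisfies w[n]**2 + w[n + N]**2 == 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block: Sequence[float], window: Sequence[float]) -> List[float]:
    """2N windowed samples -> N coefficients (direct O(N^2) form)."""
    n_half = len(block) // 2
    n0 = 0.5 + n_half / 2
    z = [b * w for b, w in zip(block, window)]
    return [sum(z[n] * math.cos(math.pi / n_half * (n + n0) * (k + 0.5))
                for n in range(len(block)))
            for k in range(n_half)]


def imdct(coeffs: Sequence[float], window: Sequence[float]) -> List[float]:
    """N coefficients -> 2N windowed samples; the 2/N scale pairs with the sine window."""
    n_half = len(coeffs)
    n0 = 0.5 + n_half / 2
    return [window[n] * (2.0 / n_half) *
            sum(coeffs[k] * math.cos(math.pi / n_half * (n + n0) * (k + 0.5))
                for k in range(n_half))
            for n in range(2 * n_half)]


def mdct_roundtrip(signal: Sequence[float], hop: int) -> List[float]:
    """Frame with 50% overlap, transform, inverse-transform and overlap-add."""
    window = sine_window(2 * hop)
    tail = (-len(signal)) % hop                       # pad to a multiple of hop
    padded = [0.0] * hop + list(signal) + [0.0] * (hop + tail)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - hop, hop):   # blocks of 2*hop, overlapping by hop
        coeffs = mdct(padded[start:start + 2 * hop], window)  # hop values per hop new samples
        for i, v in enumerate(imdct(coeffs, window)):
            out[start + i] += v                       # time-domain aliasing cancels here
    return out[hop:hop + len(signal)]
```

Each block of `2 * hop` samples yields only `hop` coefficients, yet `mdct_roundtrip(signal, hop)` returns the input up to floating-point error, because the aliasing left by one block is cancelled by its overlapping neighbor.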


“…The FD and LPD core modules process music- and speech-like input signals, respectively. The FD/LPD core modules are controlled by a signal classifier, and thus the performance of the USAC system depends heavily on the performance of the signal classifier tool [2], [7]. In this letter, we propose an LPD single-mode USAC system that does not require a signal classifier.…”

Section: Introduction (mentioning)
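The dependence on the classifier described in the last statement can be illustrated with a toy dispatcher. The zero-crossing-rate feature, the 0.3 threshold, and the function names are hypothetical stand-ins chosen only to make the control flow runnable; they are not the USAC signal classifier. The point is structural: in the switched system every core decision, and therefore every core transition, comes from the classifier, while the single-mode variant bypasses it.

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]


def zero_crossing_rate(frame: Frame) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings / max(len(frame) - 1, 1)


def toy_classifier(frame: Frame, threshold: float = 0.3) -> str:
    # Hypothetical stand-in for a speech/music classifier, NOT the USAC tool.
    return "FD" if zero_crossing_rate(frame) > threshold else "LPD"


def select_cores(frames: Sequence[Frame],
                 classifier: Callable[[Frame], str]) -> List[str]:
    """Switched system: the classifier decides the core for every frame."""
    return [classifier(f) for f in frames]


def lpd_single_mode(frames: Sequence[Frame]) -> List[str]:
    """Single-mode alternative: every frame goes to the LPD core, no classifier."""
    return ["LPD"] * len(frames)


def count_switches(modes: Sequence[str]) -> int:
    """Number of core transitions, each of which needs transition handling."""
    return sum(1 for a, b in zip(modes, modes[1:]) if a != b)
```

Swapping in a different `classifier` changes every per-frame decision downstream, which is why a switched system's quality hinges on it; `lpd_single_mode` never switches cores.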