Perceptual coding of narrowband audio signals at 8 kbit/s

Najafzadeh-Azghandi, H.; Kabal, P.

doi:10.1109/scft.1997.623920

Cited by 7 publications

(6 citation statements)

References 79 publications


“…When the radius reaches zero and the neighborhood contains only the winning node itself, then the training stops after reaching equilibrium. The learning rate is given by (15), in which the SMR value, expressed in log scale, is used to compute the learning rate for all components of a given subvector. The SMR-dependent term has the form of a sigmoidal function, which saturates to 1 when the SMR is large and to zero when it is small.…”

Section: Perceptually Weighted BTSOFM

mentioning

confidence: 99%

“…Subjective evaluation showed that the sound quality of the decoded audio of TWIN-VQ exceeds that of the MPEG1 Layer II coder at the same bit rate [14]. Several other reports also showed the advantages of vector quantization in audio coding [15]- [19]. However, none of these methods take psychoacoustic effects into account during codebook design and vector encoding.…”

mentioning

confidence: 99%


A new audio coding scheme using a forward masking model and perceptually weighted vector quantization

Huang

Chiueh

2002

IEEE Trans. Speech Audio Process.


This paper presents a new audio coder that includes two techniques to improve the sound quality of the audio coding system. First, a forward masking model is proposed. This model exploits the adaptation of the peripheral sensory and neural elements in the auditory system, which is often regarded as the cause of forward masking. In the proposed audio coder, forward masking is first modeled by a nonlinear analog circuit, and difference equations for solving this circuit are then formulated. The parameters of the circuit are derived from several factors, including the time difference between masker and maskee, masker level, masker frequency, and masker duration. Including this model in the coding process removes more of the redundancy that is inaudible to humans and thus improves coding efficiency. Second, we propose a new vector quantization technique whose codebooks are generated by a perceptually weighted binary-tree self-organizing feature map (PW-BTSOFM) algorithm. This technique adopts a perceptually weighted error criterion to train and select codewords so that the quantization error is kept below the just-noticeable distortion (JND) while using the smallest possible codebook, again reducing the required bit rate. Objective and subjective sound quality measurements show that the proposed audio coding scheme requires about 30% fewer bits than the MPEG Layer III audio coding standard.

Index Terms: Forward masking, perceptually weighted error criterion, vector quantization.


“…For any audible spectral component, the error energy due to a phase error must be below the masking threshold as follows. (7) where A and φ are the amplitude and phase of the component, φ is the phase error and m_th is the corresponding masking threshold. The worst case occurs when the cosine function has the highest rate of variation, that is, when φ = π/2.…”

Section: Upper Bound For Phase Errors

mentioning

confidence: 99%
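
The excerpt above omits equation (7), so the following is only an illustrative sketch of how such a bound could be evaluated, not the authors' formula. It assumes a single component modeled as A·cos(φ), so that a phase error d changes it by A·[cos(φ + d) − cos(φ)]; at the worst case φ = π/2 this difference has magnitude A·|sin(d)|, and requiring its energy to stay below the masking threshold gives |d| ≤ arcsin(√m_th / A). The function name `max_phase_error` and its parameters are hypothetical.

```python
import math


def max_phase_error(amplitude, mask_threshold):
    """Largest tolerable |phase error| in radians under an ASSUMED model.

    Assumption (not taken from the cited paper): the component is
    amplitude * cos(phase), and at the worst-case phase of pi/2 a phase
    error d produces an error energy of amplitude**2 * sin(d)**2, which
    must not exceed mask_threshold (an energy).
    """
    if amplitude <= 0.0:
        raise ValueError("amplitude must be positive")
    ratio = math.sqrt(mask_threshold) / amplitude
    # When ratio >= 1 the assumed error energy can never exceed the
    # threshold, so the bound is not binding; asin saturates at pi/2.
    return math.asin(min(ratio, 1.0))


# A component with amplitude 1.0 and a masking threshold of 0.25
# (energy units) tolerates a phase error of up to 30 degrees here.
bound = max_phase_error(1.0, 0.25)
print(round(math.degrees(bound), 3))
```

Under this simplified model, a higher masking threshold relative to the component energy permits a coarser phase quantizer, which is the intuition behind allocating fewer bits to the phase of well-masked components.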

Narrowband perceptual audio coding: enhancements for speech

Najaf-Zadeh

Kabal

2001

7th European Conference on Speech Communication and Technology (Eurospeech 2001)


This paper presents a bi-modal coding paradigm to compress narrowband audio signals at 8 kbit/s. In the general mode, the Enhanced Narrowband Audio Coder (ENPAC) exploits the characteristics of the human hearing system to adaptively code the perceptually important spectral components of the input audio. The other mode is employed to handle audio inputs with a strong harmonic structure. In that mode, the input block is represented by its audible harmonics. The spectral magnitude is modeled by the linear prediction analysis in the time domain. The phase of each harmonic is predicted and the phase residues are quantized using an adaptive bit allocation algorithm. This paper introduces a perceptually-based upper bound for phase errors of spectral components. The ENPAC encoder delivers good quality for narrowband speech and non-speech inputs.


“…To accomplish this goal, we have developed a new audio coding structure based on the characteristics of the human hearing system. The proposed coder, which is referred to as the Narrowband Perceptual Audio Coder (NPAC), provides moderate quality for narrowband (4 kHz bandwidth) audio inputs at bit rates down to 8 kbit/s [33,34,35]. The proposed coder employs a number of different coding techniques which are described in this thesis.…”

Section: Thesis Contributions

mentioning

confidence: 99%

“…We use a modified version of the LBG algorithm [102] with the following perceptually-based distortion measure based on the audible noise energy to design the codebooks [33]. The same error criterion is used to select the best codewords in encoding the input vectors.…”

Section: Perceptually Trained VQ

mentioning

confidence: 99%
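
The excerpt above does not reproduce the distortion measure itself, so the following is a minimal sketch of the general idea rather than the thesis's formula. It assumes that "audible noise energy" means the part of each component's squared quantization error that exceeds that component's masking threshold. The names `audible_noise_energy` and `select_codeword` and the toy numbers are hypothetical.

```python
def audible_noise_energy(x, codeword, mask_threshold):
    """Sum of the per-component error energy that exceeds the masking threshold.

    ASSUMED definition for illustration: error energy below the threshold
    is treated as inaudible and contributes nothing to the distortion.
    """
    return sum(
        max((xi - ci) ** 2 - mi, 0.0)
        for xi, ci, mi in zip(x, codeword, mask_threshold)
    )


def select_codeword(x, codebook, mask_threshold):
    """Return the index of the codeword with the least audible noise energy."""
    costs = [audible_noise_energy(x, c, mask_threshold) for c in codebook]
    return min(range(len(codebook)), key=costs.__getitem__)


# Toy example: the second component is heavily masked, the first is not.
x = [1.0, 1.0]
mask = [0.0, 1.0]
codebook = [
    [0.8, 1.0],  # small error, but in the unmasked component
    [1.0, 0.5],  # larger error, entirely hidden under the mask
]
print(select_codeword(x, codebook, mask))
```

In this toy case a plain squared-error search would pick the first codeword, while the masked criterion picks the second because its error falls entirely below the threshold. Using the same measure for codebook training and for encoding, as the excerpt describes, keeps the two stages consistent.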

Improving perceptual coding of narrowband audio signals at low rates

Najafzadeh-Azghandi

Kabal

1999

1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258)

Self Cite


New applications such as Internet broadcast and communications, consumer multimedia products, digital AM broadcast and satellite networks are emerging. Those applications require moderate audio quality without annoying artifacts at bit rates below 16 kbit/s. Although speech coders provide high speech quality at bit rates around 8 kbit/s, they perform poorly when encoding audio signals. In this thesis, we present a novel transform coding paradigm based on the characteristics of the human hearing system. The proposed encoder, the Narrowband Perceptual Audio Coder (NPAC), can accommodate a wide range of narrowband audio inputs without annoying artifacts at bit rates down to 8 kbit/s.

NPAC employs a variety of algorithms to remove the perceptually irrelevant parts and statistical redundancies of the input signal. The new algorithms used in NPAC include a perceptual error measure for training the codebooks and selecting the best codewords, perceptually-based bit allocation algorithms, and an adaptive predictive scheme to vector quantize the scale factors.

The proposed encoder has moderate complexity and delivers good quality for narrowband audio inputs at around 1 bit/sample. Informal subjective tests have been conducted to compare the performance of NPAC with a commercially available 8 kbit/s audio coder. The test results show that NPAC performs better for both music and speech inputs.

Résumé

New technologies such as Internet broadcasting, digital AM broadcasting, and satellite networks are becoming increasingly popular and form the basis of many new multimedia applications and products. The market success of these products depends on the quality of the audio and video signals as well as on the bandwidth used. For the audio signal, a bit rate below 16 kbit/s is desirable while still offering acceptable quality, that is, without noticeable distortion.

Note that some speech coders can transmit speech at 8 kbit/s with very good quality. However, because these coders exploit the particular structure of speech, they cannot offer the same audio quality for other signals such as music.

In this thesis, we present an approach to audio coding that takes the structure of the auditory system into account. The proposed coder is called Codeur Audio Perceptuel à bande Étroite (CAPE). CAPE can encode several types of narrowband audio signals at 8 kbit/s without noticeable distortion.

Several new algorithms are used in CAPE to eliminate statistical redundancy as well as the perceptually unimportant part of the input signal. Among CAPE's novelties is a perceptual error measure used in training the quantization codebooks and in selecting the best vector from these codebooks during encoding. In addition, the bit allocation for the spectral gains in different frequency bands...
