Combining formant frequency based on variable order LPC coding with acoustic features for TIMIT phone recognition

Messaoud, Zaineb Ben; Hamida, Ahmed Ben

doi:10.1007/s10772-011-9119-z

Cited by 14 publications

(10 citation statements)

References 22 publications

“…At present, the most popular and successful speech recognition systems use Hidden Markov Models (HMM) [23][24][25][26] in the acoustic modeling. HMM is used to train the acoustic models of sixty one phonemes along with a model of silence (sil).…”

Section: Experimental Setup and Discussion (mentioning)

confidence: 99%

Articulation based admissible wavelet packet feature based on human cochlear frequency response for TIMIT speech recognition

Biswas, Sahu, Bhowmick et al. 2014. Ain Shams Engineering Journal.

To deal with non-stationary and quasi-stationary signals, the wavelet transform has been used as an effective tool for time-frequency analysis. In recent years, the wavelet transform has been used extensively for feature extraction in noisy speech recognition. These filters have the benefit of having frequency band spacing similar to the auditory Equivalent Rectangular Bandwidth (ERB) scale. Central frequencies of ERB are equally distributed with the frequency response of the human cochlea. This paper deals with a speaker-independent Automatic Speech Recognition (ASR) system for continuous speech. This Hidden Markov Model (HMM) based ASR system was developed for English using recordings of four regions taken from the TIMIT database. A new set of features was derived using the wavelet packet transform's multi-resolution capabilities together with the advantage of an ERB filter based on the human cochlea. The new set of wavelet features has shown significant improvements in noisy environments, especially at low SNR values. © 2014 Production and hosting by Elsevier B.V. on behalf of Ain Shams University.

“…Next, we encode one of the most popular speech dataset TIMIT (Garofolo, 1993) into a spike-version, Spike-TIMIT. TIMIT dataset consists of richer acoustic-phonetic content than TIDIGITS (Messaoud and Hamida, 2011). It consists of continuous speech utterances, that are useful for the evaluation of speech coding schemes (Besacier et al, 2000), speech enhancement El-Solh et al (2007) or ASR systems (Mohamed et al, 2011;Graves et al, 2013).…”

Section: Spike-TIDIGITS and Spike-TIMIT Databases (mentioning)

confidence: 99%

An Efficient and Perceptually Motivated Auditory Neural Encoding and Decoding Algorithm for Spiking Neural Networks

Pan, Chua et al. 2020. Front. Neurosci.

The auditory front-end is an integral part of a spiking neural network (SNN) when performing auditory cognitive tasks. It encodes the temporal dynamic stimulus, such as speech and audio, into an efficient, effective and reconstructable spike pattern to facilitate the subsequent processing. However, most of the auditory front-ends in current studies have not made use of recent findings in psychoacoustics and physiology concerning human listening. In this paper, we propose a neural encoding and decoding scheme that is optimized for audio processing. The neural encoding scheme, which we call Biologically plausible Auditory Encoding (BAE), emulates the functions of the perceptual components of the human auditory system, which include the cochlear filter bank, the inner hair cells, auditory masking effects from psychoacoustic models, and the spike neural encoding by the auditory nerve. We evaluate the perceptual quality of the BAE scheme using PESQ, and the performance of the BAE in sound classification and speech recognition experiments. Finally, we also built and published two spike versions of speech datasets, the Spike-TIDIGITS and the Spike-TIMIT, for researchers to use in benchmarking future SNN research.

“…Furthermore, three noises such as car, jet and speech from the Noisex-92 database has been added to clean data at different signal-to-noise ratios (SNRs) (clean, 20, 15, 10, 5 and 0 dB). In this experiment, the hidden Markov model (HMM) [21][22][23] is used in the back end as phoneme recogniser.…”

Section: Experimental Framework (mentioning)

confidence: 99%

Admissible wavelet packet sub‐band‐based harmonic energy features for Hindi phoneme recognition

Biswas, Sahu, Bhowmick et al. 2015. IET Signal Process.

In recent years the wavelet packet (WP) transform has been used as an important speech representation tool. WP based acoustic features have been found to be more effective than short time Fourier transform (STFT) based features at capturing the information of unvoiced phonemes in continuous speech. But wavelet features are less useful for representing voiced phonemes such as vowels and nasals. This paper proposes new WP sub-band based features that take the harmonic information of the voiced speech signal into account. It has been noticed that most of the voiced energy of the speech signal lies between 250 Hz and 2000 Hz. Thus the proposed technique emphasizes the individual sub-band harmonic energy up to 2 kHz. The speech signal is decomposed into 16 wavelet sub-bands, and harmonic energy features (HEF) are combined with wavelet packet cepstral features (WPCC).
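
The citation statement quoted for this paper describes adding noise to clean data at several signal-to-noise ratios (20, 15, 10, 5 and 0 dB). As an illustration only, the sketch below shows one common way to scale a noise signal to a target SNR before adding it. It is not the cited authors' procedure: the function names `mix_at_snr` and `measured_snr_db` are hypothetical, and the sine tone and seeded random noise are stand-ins for TIMIT speech and Noisex-92 recordings.

```python
import math
import random


def mix_at_snr(clean, noise, snr_db):
    """Return clean + gain * noise, with gain chosen so that the
    clean-to-noise power ratio equals snr_db (in dB)."""
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    p_clean = sum(x * x for x in clean) / len(clean)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_noise == 0:
        raise ValueError("noise has zero power")
    # SNR_dB = 10 * log10(p_clean / (gain^2 * p_noise)), solved for gain.
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [x + gain * n for x, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB of a noisy signal, measured against its clean reference."""
    p_clean = sum(x * x for x in clean)
    p_resid = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10 * math.log10(p_clean / p_resid)


# Stand-in signals (assumptions): a 440 Hz tone sampled at 16 kHz
# and seeded uniform noise of the same length.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(1600)]

# One noisy copy per SNR level named in the quoted statement.
noisy_by_snr = {snr: mix_at_snr(clean, noise, snr) for snr in (20, 15, 10, 5, 0)}
```

Computing the gain from average powers over the whole utterance is a design choice of this sketch; measuring power over speech-active frames only is another option and gives different noise levels.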
