Abstract: A new method is introduced for parametric modeling of spectral envelopes when only a discrete set of spectral points is given. This method, which we call discrete all-pole (DAP) modeling, uses a discrete version of the Itakura-Saito distortion measure as its error criterion. One result is a new autocorrelation matching condition that overcomes the limitations of linear prediction and produces better-fitting spectral envelopes for spectra that are representable by a relatively small discrete set of values, such as in voiced speech. We present an iterative algorithm for DAP modeling that is shown to converge to a unique global minimum. We also present results of applying DAP modeling to real and synthetic speech. DAP modeling is extended to allow frequency-dependent weighting of the error measure, so that spectral accuracy can be enhanced in certain frequency regions relative to others.
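As a rough illustration of the error criterion named above, the sketch below evaluates an Itakura-Saito style distortion only at a discrete set of frequencies, for example the harmonics of a voiced sound. The abstract does not state the formula, so the form used here (the mean of P/P̂ − ln(P/P̂) − 1 over the given points) is the standard Itakura-Saito expression and is an assumption, as are the function names `all_pole_spectrum` and `discrete_itakura_saito`. The iterative DAP fitting procedure itself is not reproduced.

```python
import cmath
import math


def all_pole_spectrum(a, gain, omegas):
    """Evaluate gain / |A(e^{jw})|^2 at each frequency in omegas.

    a is the list of predictor polynomial coefficients with a[0] == 1,
    so that A(z) = sum_k a[k] * z^(-k). Names are illustrative only.
    """
    spectrum = []
    for w in omegas:
        A = sum(ak * cmath.exp(-1j * w * k) for k, ak in enumerate(a))
        spectrum.append(gain / abs(A) ** 2)
    return spectrum


def discrete_itakura_saito(P, P_hat):
    """Mean of r - ln(r) - 1 with r = P/P_hat, taken over the discrete points.

    Assumed form of the discrete measure; each term is zero only when the
    model matches the given spectral value exactly, and positive otherwise.
    """
    total = 0.0
    for p, q in zip(P, P_hat):
        r = p / q
        total += r - math.log(r) - 1.0
    return total / len(P)


# Hypothetical example: harmonics of a 100 Hz fundamental sampled at 8 kHz.
omegas = [2 * math.pi * 100 * m / 8000 for m in range(1, 40)]
target = all_pole_spectrum([1.0, -0.9], 1.0, omegas)
candidate = all_pole_spectrum([1.0, -0.5], 1.0, omegas)
print(discrete_itakura_saito(target, candidate))
```

Because the sum runs only over the given spectral points, the measure ignores the model's behavior between harmonics, which is the setting the abstract describes.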
It has been shown recently that neural nets, when trained using the least squares error criterion with a desired output of 1 for belonging to a class and 0 otherwise, produce as their output an estimate of the posterior probability of the class given the input. In this paper, we introduce a new error criterion for training which improves the performance of neural nets as posterior probability estimators, when compared to using least squares. The new criterion is similar to the Kullback-Leibler information measure and is simple to use. We describe a straightforward iterative algorithm for the minimization of the new error criterion, which has been shown to have good convergence properties. Experimental results comparing least squares with the new criterion clearly demonstrate the superiority of the latter for posterior probability estimation.
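The abstract does not give the new criterion explicitly. As a hedged sketch, the code below compares least squares with one common Kullback-Leibler style criterion for 0/1 targets, −[d ln y + (1 − d) ln(1 − y)], which is an assumption and may differ from the criterion in the paper. It checks numerically that, for an input whose true class posterior is p, the expected value of either criterion is smallest when the network output y equals p. The function names are illustrative.

```python
import math


def squared_error(target, y):
    """Least squares error for a single output and a 0/1 target."""
    return (target - y) ** 2


def kl_style_error(target, y):
    """Assumed Kullback-Leibler style error for a 0/1 target, 0 < y < 1."""
    return -(target * math.log(y) + (1 - target) * math.log(1 - y))


def expected_error(error_fn, y, posterior):
    """Average error when the target is 1 with probability `posterior`."""
    return posterior * error_fn(1, y) + (1 - posterior) * error_fn(0, y)


def best_output(error_fn, posterior, steps=100):
    """Grid search for the output value that minimizes the expected error."""
    grid = [i / steps for i in range(1, steps)]
    return min(grid, key=lambda y: expected_error(error_fn, y, posterior))


print(best_output(squared_error, 0.3), best_output(kl_style_error, 0.3))
# A confidently wrong output is penalized far more by the KL-style error.
print(squared_error(1, 0.01), kl_style_error(1, 0.01))
```

Both criteria share the same minimizer in expectation; they differ in how strongly they penalize outputs far from the target, which affects training behavior rather than the ideal solution.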
The role of transient speech components in speech intelligibility was investigated. Speech was decomposed into two components, quasi-steady-state (QSS) and transient, using a set of time-varying filters whose center frequencies and bandwidths were controlled to identify the strongest formant components in speech. The relative energy and intelligibility of the QSS and transient components were compared to those of the original speech. Most of the speech energy was in the QSS component, but this component had low intelligibility. The transient component had much lower energy but was almost as intelligible as the original speech, suggesting that the transient component included speech elements important to speech perception. A modified version of speech was produced by amplifying the transient component and recombining it with the original speech. The intelligibility of the modified speech in background noise was compared to that of the original speech, using a psychoacoustic procedure based on the modified rhyme protocol. Word recognition rates for the modified speech were significantly higher at low signal-to-noise ratios (SNRs), with minimal effect on intelligibility at higher SNRs. These results suggest that amplification of transient information may improve the intelligibility of speech in noise and that this improvement is greatest in severe noise conditions.
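The recombination and noise-mixing steps described above can be sketched as follows, assuming the transient component has already been extracted (the time-varying filter decomposition is not reproduced here). The gain value, the toy signals, and the function names `enhance_transients` and `mix_at_snr` are all hypothetical; the abstract does not state the gain used or whether the modified speech was level-normalized.

```python
import math


def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)


def enhance_transients(original, transient, gain):
    """Add an amplified copy of the transient component to the original."""
    return [o + gain * t for o, t in zip(original, transient)]


def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so the speech-to-noise energy ratio equals snr_db."""
    scale = math.sqrt(energy(speech) / (energy(noise) * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]


# Toy stand-ins for real recordings (illustrative only).
speech = [math.sin(0.1 * n) for n in range(200)]
transient = [1.0 if n % 50 == 0 else 0.0 for n in range(200)]
noise = [((n * 7919) % 13 - 6) / 6.0 for n in range(200)]

modified = enhance_transients(speech, transient, gain=2.0)
noisy_modified = mix_at_snr(modified, noise, snr_db=-5.0)
```

In this sketch the SNR is defined relative to whichever signal is passed to `mix_at_snr`; comparing original and modified speech fairly would require deciding whether the two are first equalized in level, a detail the abstract leaves open.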