ABSTRACT
This paper describes a mel-cepstral analysis method and its adaptive algorithm. In the proposed method, we apply the criterion used in the unbiased estimation of log spectrum to the spectral model represented by the mel-cepstral coefficients. To solve the non-linear minimization problem involved in the method, we give an iterative algorithm whose convergence is guaranteed. Furthermore, we derive an adaptive algorithm for the mel-cepstral analysis by introducing an instantaneous estimate of the gradient of the criterion. The adaptive mel-cepstral analysis system is implemented with an IIR adaptive filter that has an exponential transfer function and whose stability is guaranteed. We also present examples of speech analysis and the results of an isolated word recognition experiment.
INTRODUCTION
The spectrum represented by the mel-cepstral coefficients has a frequency resolution similar to that of the human ear, which has high resolution at low frequencies [1]. As a result, mel-cepstral coefficients are useful for speech synthesis and recognition. Several methods have been proposed for obtaining mel-cepstral coefficients. For example, the mel-cepstral coefficients can be obtained from the LPC coefficients by spectral resampling. However, no rigorous method has been proposed in which the spectral model is represented by mel-cepstral coefficients and a spectral criterion is minimized.

In this paper, we propose a mel-cepstral analysis method and its adaptive algorithm. In the mel-cepstral analysis method, the model spectrum is represented by the M-th order mel-cepstral coefficients, and the criterion used in the unbiased estimation of log spectrum [2] is minimized with respect to the mel-cepstral coefficients. The minimization problem is solved efficiently by an iterative technique using the FFT, recursion formulas, and a fast algorithm that requires O(M^2) arithmetic operations. We can show that the convergence is quadratic and that typically a few iterations are sufficient to obtain the solution.

Furthermore, we present an adaptive algorithm for the mel-cepstral analysis. To derive the adaptive algorithm, we introduce an instantaneous estimate of the gradient of the criterion in a manner similar to the LMS algorithm [3]. The adaptive analysis system is implemented with an IIR adaptive filter that has the structure of the MLSA filter.

We show examples of analysis for synthetic and speech signals. To evaluate the proposed methods, an isolated word recognition experiment was carried out.
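As a rough illustration of a model spectrum parameterized by mel-cepstral coefficients, the sketch below evaluates the log magnitude of an exponential model on a warped frequency axis. It rests on assumptions not stated in the text above: the warping is taken to be the phase response of a first-order all-pass function with a parameter `alpha`, and the function names, the coefficient values, and `alpha = 0.35` are all hypothetical choices for illustration only.

```python
import math


def warped_frequency(omega, alpha):
    """Warped frequency beta(omega) in radians.

    Assumption: beta is the phase response of the first-order all-pass
    function (z^-1 - alpha) / (1 - alpha * z^-1), evaluated at z = e^{j omega}.
    """
    return math.atan2((1.0 - alpha ** 2) * math.sin(omega),
                      (1.0 + alpha ** 2) * math.cos(omega) - 2.0 * alpha)


def log_magnitude(mel_cepstrum, omega, alpha):
    """Log magnitude of the exponential model at frequency omega.

    Assumption: log|H| = sum_m c(m) * cos(m * beta(omega)), i.e. a cosine
    series in the warped frequency with the mel-cepstral coefficients c(m).
    """
    beta = warped_frequency(omega, alpha)
    return sum(c * math.cos(m * beta) for m, c in enumerate(mel_cepstrum))


# Hypothetical first-order example: c(0) = 0.5, c(1) = 1.0.
coefficients = [0.5, 1.0]
alpha = 0.35  # illustrative value only

for k in range(5):
    omega = k * math.pi / 4.0
    print(round(omega, 3),
          round(warped_frequency(omega, alpha), 3),
          round(log_magnitude(coefficients, omega, alpha), 3))
```

With a positive `alpha`, the warped frequency grows faster than `omega` near zero, so a fixed number of coefficients spends more of its resolution on low frequencies. With `alpha = 0`, the warping is the identity and the coefficients reduce to an ordinary cepstrum.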
SPECTRAL ESTIMATION BASED ON MEL-CEPSTRAL REPRESENTATION