“…In front-end processing, the Mel frequency cepstral coefficient (MFCC) is widely used to represent the speech signal [1]. Perceptual linear predictive (PLP) features [2], spectro-temporal features [3], and cochlear filter cepstral coefficient (CFCC) features [4] have also been used successfully for speech recognition. In back-end classification, statistical acoustic models are commonly used, such as the hidden Markov model (HMM) [5], artificial neural network (ANN) [6], and dynamic Bayesian network (DBN) [7].…”

*Correspondence: wupingping@nau.edu.cn. 2 School of Engineering Auditing, Jiangsu Key Laboratory of Public Project Audit, Nanjing Audit University, Nanjing, China. Full list of author information is available at the end of the article.