Over the past decades, extensive research has been carried out on possible implementations of automatic speech recognition (ASR) systems. The most renowned techniques in the field of ASR are mel-frequency cepstral coefficients and hidden Markov models. However, other methods, such as wavelet-based transforms, artificial neural networks and support vector machines, are becoming more popular. This review article presents a comparative study of the different approaches that have been proposed for ASR and are widely used today.

† training time increases linearly with increase in vocabulary size [42]
† quantisation error in the discrete representation of speech signals [42]
† temporal information is ignored [42]

PCA
Advantages:
† reduction in the feature vector's size, while retaining much of the significant information [131]
† robust [59, 60]
Disadvantages:
† computationally expensive for high-dimensional data [8]

LDA
Advantages:
† maximises the distance between classes, while minimising the within-class distance [132]
† robust [133]
Disadvantages:
† sample distribution is assumed a priori to be Gaussian [63]
† class samples are assumed to have equal variance [63]

Classification techniques: advantages and disadvantages

HMM
Advantages:
† able to model time distribution of speech signals [103]
† simple to adapt [68]
† capable of modelling a sequence of discrete or continuous symbols [13]
† inputs can be of variable length [40]
Disadvantages:
† based on the assumption that the probability of being in a particular state depends only on the preceding state, ignoring any long-term dependencies [82]
† emission probabilities are arbitrarily chosen; hence, they might not properly represent the output probabilities of the corresponding state [82]

ANN (in general)
Advantages:
† good classifiers [16, 45]
† highly adequate for pattern recognition applications [16, 45]
† self-organising [16, 45]
† self-learning [16, 45]
† self-adaptive in new environments [16, 45]
† robust [7]
Disadvantages:
† based on empirical risk minimisation (ERM); hence, prone to overtraining and local minima problems [45, 103]

MLP
Advantages:
† good discriminating ability [2]
Disadvantages:
† unable to model time distribution of speech signals [2]
† inputs have to be of fixed length [2]
† able to deal with small vocabularies only [2]

SOM
Advantages:
† no a priori information is required for training a SOM [134]
† can easily adapt if a new sample is presented to it [134]
† capable of parallel computation [134]
Disadvantages:
† SOM algorithm is not well defined mathematically; hence, values for the network parameters need to be found by trial and error [134]
† ordered mapping obtained after the training phase may be lost when applied in real environments due to frequent adaptations [134]

RBF
Advantages:
† simple to implement [135]
† good discriminating ability [135]
† robust [135]
† online learning ability [135]
Disadvantages:
† shift invariant in time [91]

RNN
Advantages:
† able to model time distribution of speech signals thanks to the feedback connections [95, 103]
Disadvantages:
† complex training algorithm [94]
† training algorithm is highly sensitive to any changes [94]

FNN
Advantages:
† does not need large amount of samples during the learning process [99]
† ...
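
Two of the HMM properties listed in the comparison, variable-length inputs and the dependence of each state only on its predecessor, can be illustrated with a minimal sketch of the forward algorithm for a discrete HMM. This is not taken from the reviewed article: the function name `forward_likelihood` and the two-state model parameters below are hypothetical, chosen only for illustration.

```python
def forward_likelihood(observations, initial, transition, emission):
    """Return P(observations) under a discrete HMM (forward algorithm).

    initial[i]       : probability of starting in state i
    transition[i][j] : probability of moving from state i to state j
    emission[j][k]   : probability of emitting symbol k while in state j
    """
    if not observations:
        return 1.0  # the empty sequence has probability 1
    n_states = len(initial)
    # alpha[j] = P(o_1..o_t, state at time t = j)
    alpha = [initial[j] * emission[j][observations[0]] for j in range(n_states)]
    for symbol in observations[1:]:
        # First-order Markov assumption: the update uses only the
        # previous alpha vector, never any earlier history.
        alpha = [
            sum(alpha[i] * transition[i][j] for i in range(n_states))
            * emission[j][symbol]
            for j in range(n_states)
        ]
    return sum(alpha)


# Hypothetical two-state model over the symbol alphabet {0, 1}.
INITIAL = [0.6, 0.4]
TRANSITION = [[0.7, 0.3], [0.4, 0.6]]
EMISSION = [[0.9, 0.1], [0.2, 0.8]]

# The same model scores sequences of different lengths.
for sequence in ([0], [0, 1], [0, 1, 1, 0]):
    print(sequence, forward_likelihood(sequence, INITIAL, TRANSITION, EMISSION))
```

Because each step only combines the previous step's probabilities with the transition matrix, the sketch makes the listed limitation visible as well: any dependency spanning more than one step cannot be represented by this model.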