Intelligent Genetic Fuzzy Inference System for Speech Recognition: An Approach from Low Order Feature Based on Discrete Cosine Transform

Silva, Washington; Serra, Ginalber Luiz de Oliveira

doi:10.1007/s40313-014-0148-0

Cited by 8 publications

(5 citation statements)

References 17 publications


“…Speech signal processing aims to efficiently and accurately transform the acoustic speech signal for use in automatic systems. The extensive development of speech processing research demonstrates the effort to improve the performance of speech recognition systems for practical applications (Bellegarda and Monz 2016; Silva and Serra 2014). The use of such systems allows autonomy in areas such as telephony, in which service requests are directed by voice commands (Cardoso et al 2010); in automotive engineering, by driving devices inside cars (Qian et al 2009; Hua and Ng 2010; Li et al 2013); in computer systems, through computer utility programs, in addition to robotic applications (Koo et al 2014); and in residential and hospital automation for accessibility of people with locomotive and visual disabilities (Gnanasekar et al 2012; Singh and Yadav 2015).…”

Section: Motivation and Justification (mentioning)

confidence: 99%

“…Finally, the representative coefficients of each pattern were encoded in a two-dimensional time matrix by the application of the discrete cosine transform (DCT). The DCT two-dimensional time matrices $C^{jm}_{kn}$ encode the patterns of the speech commands, reproducing the local and global variations of the spectral envelope of the signals as well as presenting the local and global variations of the signal in the time domain (Silva and Serra 2014; Cao et al 2015).…”

Section: Multilevel Hierarchy Speech Pattern Recognition System Metho (mentioning)

confidence: 99%
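
The encoding described in the statement above can be illustrated with a minimal sketch. It assumes a plain DCT-II taken along the time axis of each mel-cepstral coefficient track and a 1/T normalization; the function name `dct_time_matrix`, the indexing, and the scaling are illustrative assumptions and may differ from the formulation in the cited papers.

```python
import math


def dct_time_matrix(mfcc, n_orders):
    """Build a K x n_orders time matrix from K coefficient tracks.

    mfcc: list of K rows; row k holds one mel-cepstral coefficient
          observed over T time frames (hypothetical input layout).
    n_orders: number of low-order DCT coefficients kept per row.
    """
    matrix = []
    for row in mfcc:
        T = len(row)
        coeffs = []
        for n in range(n_orders):
            # DCT-II along time; the 1/T scaling is an assumption.
            s = sum(
                x * math.cos(math.pi * n * (2 * t + 1) / (2 * T))
                for t, x in enumerate(row)
            )
            coeffs.append(s / T)
        matrix.append(coeffs)
    return matrix
```

With two coefficient tracks and `n_orders=2`, the result is a 2 x 2 matrix, so a whole utterance is summarized by four numbers regardless of how many frames it spans.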


Hierarchical Expert Neural Network System for Speech Recognition

Rocha, Silva, Barros. 2019. J Control Autom Electr Syst. Self cite.

This work proposes a hierarchical architecture composed of a set of expert neural networks, based on the ensemble method with dynamic selection of classifiers, for application in speech recognition systems. To this end, 30 commands in the Brazilian Portuguese language were coded by a two-dimensional time matrix resulting from the application of the discrete cosine transform to the mel-cepstral coefficients. These patterns were modified by means of a nonlinear transformation to a high-dimensional space through a set of Gaussian radial basis functions (GRBFs) parameterized with the centroid and covariance characteristics of the classes. The classification was made through the dynamic classifier selection approach, in which multilayer perceptron and learning vector quantization configurations were analyzed to constitute the multiple classifiers specialized in the subdivisions made in the total of classes to be recognized. Then, given a new test pattern, the GRBF that presents the highest value of the receptive field in relation to the input feature vector indicates the class to which the pattern is nearest, thus directing it to the expert neural network that provides the final classification result based on local accuracy.
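
The routing step in this abstract, where the GRBF with the highest activation picks the expert, can be sketched as follows. This is an illustration under stated assumptions: each class is given a diagonal covariance (a list of per-dimension variances), and `grbf_activation`, `select_expert`, and the `expert_of_class` mapping are hypothetical names rather than the authors' implementation.

```python
import math


def grbf_activation(x, centroid, variances):
    """Gaussian receptive-field value of x for one class.

    Assumes a diagonal covariance given as per-dimension variances.
    """
    d2 = sum((xi - ci) ** 2 / vi for xi, ci, vi in zip(x, centroid, variances))
    return math.exp(-0.5 * d2)


def select_expert(x, classes, expert_of_class):
    """Return (nearest class, expert id) for input vector x.

    classes: dict mapping class name -> (centroid, variances).
    expert_of_class: dict mapping class name -> expert id (hypothetical).
    """
    best = max(classes, key=lambda name: grbf_activation(x, *classes[name]))
    return best, expert_of_class[best]
```

The selected expert would then classify the pattern; that second stage (an MLP or LVQ network in the abstract) is outside this sketch.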


“…This difficulty increases when the number of estimates in a multiclass problem must be defined simultaneously with high accuracy, since the boundaries among different classes may not be well defined. Thus, new methodologies are proposed to obtain more robust results in multiclass tasks [7,18].…”

Section: Multiclass Learning (mentioning)

confidence: 99%

“…For the structure of the LVQ neural network, it was necessary to define the learning rate η and the number n of neurons of the competitive layer. The values defined in the η set are often used in the specialized literature [17,18,26], and the n set was specified considering that the number of neurons in the hidden layer should be greater than the number of inputs and greater than the number of neural network outputs. Because the vectors $C^{N}_{Jm}$, where N = {4, 9, 16}, are mapped into a 30-dimensional space, the input of the 15 LVQ experts is a set with 30 source nodes.…”

Section: LVQ Experts (mentioning)

confidence: 99%

Neural Network Configurations Analysis for Multilevel Speech Pattern Recognition System with Mixture of Experts

Silva, Rocha, Filho. 2018. Intelligent System.

This chapter analyzes two configurations of neural networks to compose the expert set in the development of a multilevel speech signal pattern recognition system of 30 commands in the Brazilian Portuguese language. Multilayer perceptron (MLP) and learning vector quantization (LVQ) networks have their performances verified during the training, validation and test stages in speech signal recognition, whose patterns are given by two-dimensional time matrices resulting from coding the mel-cepstral coefficients with the discrete cosine transform (DCT). In order to avoid the pattern separability problem, the patterns are modified by a nonlinear transformation to a high-dimensional space through a suitable set of Gaussian radial basis functions (GRBFs). The performance of the MLP and LVQ experts is improved, and the configurations are trained with few examples of each modified pattern. Several combinations of the previously established neural network topologies and algorithms were evaluated to determine the network structures with the best hit and generalization results.


“…The term frame is used to determine the length of time between successive calculations of parameters. For speech processing, normally, the time frame is between 10 ms and 30 ms [14], [15].…”

Section: Pre-processing of Speech Signal (mentioning)

confidence: 99%
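
As a rough illustration of the framing step in the statement above, the sketch below cuts a sampled signal into consecutive, non-overlapping frames of a chosen duration. The function name `split_frames`, the 20 ms default, and the absence of overlap and windowing are assumptions made for brevity, not details taken from the cited paper.

```python
def split_frames(samples, sample_rate_hz, frame_ms=20.0):
    """Split samples into consecutive non-overlapping frames.

    frame_ms would typically lie in the 10 ms to 30 ms range
    mentioned in the text; trailing samples that do not fill a
    whole frame are dropped.
    """
    frame_len = int(round(sample_rate_hz * frame_ms / 1000.0))
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]
```

For a signal sampled at 8000 Hz, a 20 ms frame holds 160 samples, so one second of audio yields 50 frames.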

Application of support vector machines and two dimensional discrete cosine transform in speech automatic recognition

Batista, Silva. 2015. 2015 SAI Intelligent Systems Conference (IntelliSys).

This paper proposes the implementation of a Support Vector Machine (SVM) for automatic recognition of numerical speech commands. After pre-processing of the speech signal with mel-cepstral coefficients, the Discrete Cosine Transform (DCT) is used to generate a two-dimensional matrix that serves as input to the SVM algorithm for generating the patterns of the words to be recognized. Support Vector Machines represent a new approach to pattern classification. The SVM is used to recognize speech patterns from the mean and variance of the speech signal input through the aforementioned two-dimensional matrix; the algorithm trains and tests on those data, showing the best response. Finally, experimental results are shown for speech recognition applied to the Brazilian Portuguese language.
