Figure A shows the basic architecture of an Automatic Speech Recognition (ASR) system together with its application area. The system must be adapted to the application area in which it is used. During the speech processing phase, which is the entry point of the ASR system, features are extracted from the audio signal. Different feature extraction techniques yield different properties of the signal. For example, Mel Frequency Cepstral Coefficients (MFCC) are a feature extraction technique commonly used in speech recognition systems. The decoder, another component of ASR, uses the Acoustic Model (AM) and the Language Model (LM) to convert the resulting feature vectors into phoneme sequences. In acoustic modeling, the posterior probability of each phoneme given the signal in a time window is calculated first. In an artificial neural network-based acoustic model, the posterior probability of the phonemes is computed independently for each window. This independence means that the phonemes in a word are treated as independent of one another.
Figure A. Basic architecture of a speech recognition system.
Purpose: This study presents a literature review on speech recognition and then discusses the progress reported in this area of research for different languages. The data sets used in speech recognition systems, feature extraction approaches, speech recognition methods and performance evaluation criteria are examined, with a focus on the development of speech recognition and the difficulties in this field.
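The frame-wise independence assumption described for the neural acoustic model can be illustrated with a minimal Python sketch. It is not the implementation of any system covered in this review. The phoneme inventory, the per-window scores and the function names `frame_posteriors` and `sequence_log_prob` are hypothetical and chosen only for illustration. Under the assumption, the log-probability of a phoneme sequence is simply the sum of the per-window log posteriors.

```python
import math

def frame_posteriors(logits):
    """Softmax over phoneme scores for a single window (frame)."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sequence_log_prob(frame_logits, phoneme_path):
    """Log-probability of a phoneme path under frame-wise independence:
    log P(q_1..q_T | x_1..x_T) = sum over t of log P(q_t | x_t)."""
    if len(frame_logits) != len(phoneme_path):
        raise ValueError("one phoneme index is required per frame")
    log_p = 0.0
    for logits, q in zip(frame_logits, phoneme_path):
        log_p += math.log(frame_posteriors(logits)[q])
    return log_p

# Hypothetical phoneme inventory and network outputs (assumptions, not real data).
PHONEMES = ["a", "b", "k"]
frame_logits = [
    [2.0, 0.5, 0.1],  # window 1
    [0.2, 1.5, 0.3],  # window 2
    [1.8, 0.4, 0.2],  # window 3
]

# Because windows are independent, the best path is the per-window argmax.
best_path = [max(range(len(l)), key=l.__getitem__) for l in frame_logits]
decoded = [PHONEMES[i] for i in best_path]
best_log_prob = sequence_log_prob(frame_logits, best_path)
```

In this sketch the acoustic scores alone decide the path. In a full ASR system the decoder would also weigh these posteriors against the language model, which is what reintroduces dependence between neighbouring phonemes.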
Theory and Methods: In this study, a systematic literature review, which is an important component of a scientific article, was carried out. The review was conducted by combining different methods, and the combination of review approaches used is given.
Results: Based on the information obtained from the review, the following were evaluated: robustness to the acoustic environment, self-learning in ASR, detection of unknown words, the success of Turkish ASR at large and limited vocabulary levels, the insufficient-resource situation, and computational architectures that can be applied to ASR. In addition, the future of Turkish ASR was discussed and recommendations were made to overcome the current difficulties for Turkish ASR.
Conclusion: The aim of this study is to examine current speech recognition methods and approaches and to present the developments in this field in detail. For this reason, the approaches, the datasets and the difficulties faced by researchers in this field are discussed within the scope of the study. The effect of deep learning and classical approaches on ASR was investigated. A road map is provided so that researchers can incorporate the detailed information they need into their own work and overcome the present challenges.