There has been considerable research in automatic speech recognition (ASR) over the past few decades, so it may seem strange that the field is still an active subject of research. There are many reasons, but one is that the discipline was founded on the promise of human-level performance under real-world conditions, and this remains an unsolved problem. In addition, the rapid advancement of technology in many fields has made the need for ASR even more pressing. In particular, deploying such a system in the security sector of insecure developing countries such as Afghanistan is an urgent need. This paper begins by reviewing the necessary background on speech recognition and then proposes a novel method for building an ASR system for the Dari language using two of the most capable open-source engines: CMUSphinx, from Carnegie Mellon University, and DeepSpeech v0.9.3. Both are far more capable than early speech recognition systems. Using a dataset collected by the author, a speech-to-text model has been trained for the Dari language. First, the dataset is filtered for the task; the paper then demonstrates how the approach carries over from hidden Markov models (HMMs) and the phoneme concept to recurrent neural network (RNN) training. The system outperformed the expected results: CMUSphinx states that "for a typical 10-hour operation, the WER should be around 10%," where WER is the word error rate. Finally, CMUSphinx achieved a WER of 3.3% on 10.3 hours of recorded audio, and DeepSpeech achieved a WER of 1%.
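The results above are reported as word error rate (WER), which is conventionally defined as the word-level edit distance between a recognizer's hypothesis and the reference transcript, divided by the number of reference words: WER = (S + D + I) / N, where S, D and I are substitutions, deletions and insertions. The following is a minimal sketch of that computation, assuming plain whitespace tokenization; the function name `word_error_rate` is hypothetical and is not part of the CMUSphinx or DeepSpeech APIs.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + D + I) / N via word-level Levenshtein distance.

    Hypothetical helper for illustration; assumes whitespace tokenization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dp[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match if cost == 0)
            )

    return dp[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    hypothesis = "the cat sit on mat"
    # One substitution (sat -> sit) and one deletion (the): 2 / 6 words
    print(f"WER = {word_error_rate(reference, hypothesis):.3f}")  # WER = 0.333
```

Because insertions are counted against the fixed reference length N, WER is not capped at 100%; a hypothesis with many spurious words can score above 1.0.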