Speech is essential for human-human communication and human-machine interaction. Current Automatic Speech Recognition (ASR) systems may not be suitable for quiet settings such as libraries and meetings, or for speech-impaired and elderly users. In this study, we present an end-to-end deep learning system for subvocal speech recognition. The proposed system uses a single-channel surface electromyogram (sEMG) placed diagonally across the throat alongside a close-talk microphone. The system was tested on a corpus of 20 words. It learned the mapping from sound and sEMG sequences to letters and then extracted the most probable word formed by those letters. We investigated different input signals and different depth levels for the deep learning model. The proposed system achieved a Word Error Rate (WER) of 9.44, 8.44, and 9.22 for speech alone, speech combined with single-channel sEMG, and speech combined with two channels of sEMG, respectively.
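For readers unfamiliar with the metric, WER is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between the recognized and reference transcripts, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the evaluation code used in this study. The function name `word_error_rate` is hypothetical, and returning the result as a percentage is an assumption about how the reported figures are scaled.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the WER as a percentage (scaling is an assumption, see above)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr

    return 100.0 * prev[-1] / len(ref)
```

For a corpus of isolated words, each utterance has a single reference word, so this reduces to the percentage of misrecognized words when the counts are pooled over all utterances.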