In recent years, deep neural networks have raised automatic speech recognition to a new level of accuracy, achieving the highest recognition rates at both the word and the phoneme level. Phoneme recognition is the first stage of an automatic speech recognition system. In this research, we address phoneme recognition with deep neural networks, specifically a Convolutional Neural Network (CNN). We discuss two recognition approaches. The direct approach uses a single classification stage that maps the input directly to the correct phoneme. The second, proposed approach uses several classification stages that take into account the phoneme classes (vowels, semi-vowels, plosives, etc.). Both approaches rely on the mel spectrogram transform, which converts the acoustic signal into a two-dimensional time-frequency matrix that is then fed to the deep neural network as input. Tested on the TIMIT database, the classifier achieved 57% accuracy with the direct approach and a higher accuracy of 61% with our proposed approach.
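The mel spectrogram front end mentioned above can be sketched in plain Python to show how a 1-D signal becomes the 2-D matrix the CNN consumes. This is a minimal illustration under stated assumptions, not the authors' implementation: the frame length (400 samples, i.e. 25 ms at TIMIT's 16 kHz sampling rate), hop (160 samples, 10 ms), 40 mel bands, Hann window, HTK-style mel formula and log compression are all assumed, since the abstract does not specify them. The function names are hypothetical.

```python
import cmath
import math


def hz_to_mel(f_hz):
    """HTK-style mel scale (an assumed convention; other mel formulas exist)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters mapping the n_fft // 2 + 1 DFT bins onto n_mels mel bands."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 points equally spaced on the mel scale: band edges and centres.
    mel_points = [mel_max * i / (n_mels + 1) for i in range(n_mels + 2)]
    bins = [min(int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)), n_bins - 1)
            for m in mel_points]
    fbank = [[0.0] * n_bins for _ in range(n_mels)]
    for j in range(n_mels):
        left, centre, right = bins[j], bins[j + 1], bins[j + 2]
        for k in range(left, centre):  # rising slope of the triangle
            fbank[j][k] = (k - left) / (centre - left)
        for k in range(centre, right):  # falling slope, weight 1.0 at the centre bin
            fbank[j][k] = (right - k) / (right - centre)
    return fbank


def power_spectrum(frame):
    """Naive O(n^2) DFT power spectrum for illustration; real code would use an FFT."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(abs(s) ** 2 / n)
    return spectrum


def log_mel_spectrogram(signal, sample_rate=16000, n_fft=400, hop=160, n_mels=40):
    """Return an n_mels x n_frames list of log mel energies (frequency x time).

    All default parameters are assumptions, not values taken from the paper.
    """
    # Periodic Hann window to reduce spectral leakage at frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / n_fft) for t in range(n_fft)]
    fbank = mel_filterbank(n_mels, n_fft, sample_rate)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = [signal[start + t] * window[t] for t in range(n_fft)]
        power = power_spectrum(frame)
        mel = [sum(w * p for w, p in zip(filt, power)) for filt in fbank]
        frames.append([math.log(m + 1e-10) for m in mel])  # small floor avoids log(0)
    # Transpose so rows are mel bands and columns are time frames: the 2-D CNN input.
    return [list(row) for row in zip(*frames)]
```

For a 1200-sample, 1 kHz sine at 16 kHz, `log_mel_spectrogram` returns a 40 x 6 matrix whose strongest band lies near 1 kHz. In practice, libraries such as librosa (`librosa.feature.melspectrogram`) or torchaudio (`torchaudio.transforms.MelSpectrogram`) compute the same kind of representation with an FFT; note that their default mel formulas and normalisations may differ from the HTK-style formula assumed here.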