Extracting discriminative acoustic features from the speech signal is a critical challenge in implementing an accurate speaker identification system. In this paper, a two-dimensional discrete multiwavelet transform (2D-DMWT) combined with a deep learning neural network is proposed for speaker identification. The DMWT is based on critical-sampling-scheme preprocessing and uses the multifilter introduced by Geronimo, Hardin, and Massopust, known as GHM. The proposed system consists of three stages. First, in preprocessing, the speech signal is resampled to 16 kHz and then divided into segments of five different durations: 0.5 s, 1 s, 2 s, 3 s, and 5 s; each duration is tested separately. Second, in the feature selection phase, the 2D-DMWT is employed to obtain discriminant features from the speech signal and to reduce its dimensionality. Finally, a convolutional neural network (CNN) is used for classification. The proposed system is tested on four databases: SALU-AC, ELSDSR, TIMIT, and RAVDESS. These databases cover various sources of speech variability, such as speaker age and gender. For the 0.5 s duration, the proposed system achieves 95.86%, 96.59%, 89.90%, and 89.83% on the SALU-AC, ELSDSR, RAVDESS, and TIMIT databases, respectively. For 1 s, the results on the SALU-AC, ELSDSR, RAVDESS, and TIMIT databases are 96.30%, 97.31%, 96.05%, and 93.59%, respectively. For 2 s, the SALU-AC, ELSDSR, RAVDESS, and TIMIT databases yield 96.63%, 97.76%, 96.12%, and 95.90%, respectively. For 3 s, the SALU-AC, ELSDSR, and RAVDESS databases yield 97.04%, 98%, and 97.96%, respectively. For 5 s, the SALU-AC and ELSDSR databases yield 97.56% and 98.30%, respectively. The proposed system outperforms the results reported in previous works on the same databases.
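The segmentation step of the preprocessing stage can be illustrated with a minimal sketch. It assumes the signal has already been resampled to 16 kHz, that segments are consecutive and non-overlapping, and that trailing samples that do not fill a whole segment are dropped; the abstract specifies none of these details, and the function name `split_into_segments` is hypothetical.

```python
from typing import List, Sequence

TARGET_RATE_HZ = 16_000                     # sampling rate stated in the abstract
DURATIONS_SEC = (0.5, 1.0, 2.0, 3.0, 5.0)   # segment durations stated in the abstract


def split_into_segments(
    samples: Sequence[float], rate_hz: int, duration_sec: float
) -> List[Sequence[float]]:
    """Cut a signal into consecutive, equal-length segments.

    Assumptions (not taken from the paper): segments do not overlap,
    and an incomplete trailing segment is discarded.
    """
    segment_len = int(round(rate_hz * duration_sec))
    if segment_len <= 0:
        raise ValueError("duration too short for the given sampling rate")
    n_segments = len(samples) // segment_len
    return [
        samples[i * segment_len:(i + 1) * segment_len]
        for i in range(n_segments)
    ]


if __name__ == "__main__":
    # Ten seconds of silence stands in for a real resampled recording.
    signal = [0.0] * (TARGET_RATE_HZ * 10)
    for duration in DURATIONS_SEC:
        segments = split_into_segments(signal, TARGET_RATE_HZ, duration)
        print(f"{duration} s: {len(segments)} segments, "
              f"{len(segments[0])} samples each")
```

Each duration yields its own set of segments, which matches the statement that each duration is tested separately. How the segments are then arranged for the 2D-DMWT is not covered by this sketch.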