Of the many methods for diagnosing heart disease, analyzing electrocardiogram (ECG) signals is the most effective. Automatic classification techniques based on ECG analysis generally consist of three steps: data preprocessing, feature extraction, and classification. This study designed eight hybrid model architectures using several types of deep neural networks, including the Convolutional Neural Network (CNN), Gated Recurrent Unit (GRU), and Bidirectional GRU (Bi-GRU); four of the models do not use the Fast Fourier Transform (FFT) and four do. First, the MIT-BIH arrhythmia database was denoised using the wavelet transform (WT) thresholding method, which separates overlapping noise and signal frequencies and is therefore well suited to processing nonstationary ECG signals. The class imbalance in this database was addressed using the synthetic minority over-sampling technique (SMOTE), which is more suitable for medical data than random synthesis methods. Second, the hybrid models FFT-CNN, FFT-GRU, FFT-CNN-GRU, and FFT-CNN-Bi-GRU were constructed using the newly proposed architecture, which concatenates the features produced by two paths: the first path takes the ECG in the time domain as input, and the second takes the ECG spectrum computed by FFT. A comparative study of all models was conducted in terms of accuracy, training time, number of trainable parameters, and robustness against noise. The results show that the proposed CNN, GRU, CNN-GRU, and CNN-Bi-GRU models without WT and FFT achieved accuracies of 90%, 93%, 95%, and 96%, respectively, while the proposed FFT-CNN, FFT-GRU, FFT-CNN-GRU, and FFT-CNN-Bi-GRU models with WT achieved accuracies of 97%, 95%, 96%, and 97%, respectively. The proposed FFT-CNN model was therefore the best, requiring less training time and fewer parameters than the other models, which is significant for designing a high-efficiency, low-complexity model for a practical medical diagnosis system.
In addition, using FFT improved the accuracy and the robustness against noise of all models.
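The two-path idea described above (one path fed the time-domain ECG, the other fed its spectrum, with the resulting features concatenated) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the study's implementation: the function names are hypothetical, a naive discrete Fourier transform stands in for the FFT (for real input, `numpy.fft.rfft` computes the same one-sided spectrum far faster), a fixed smoothing filter with pooling stands in for the learned CNN or GRU layers, and the input is a synthetic sinusoid rather than an MIT-BIH beat.

```python
import cmath
import math


def magnitude_spectrum(signal):
    """One-sided DFT magnitude of a real signal (naive O(n^2) stand-in for an FFT)."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(signal))
        spectrum.append(abs(acc) / n)
    return spectrum


def toy_path_features(sequence, kernel=(0.25, 0.5, 0.25)):
    """Hypothetical stand-in for a learned path: one fixed 1-D convolution,
    followed by global max pooling and global mean pooling."""
    width = len(kernel)
    conv = [sum(w * sequence[i + j] for j, w in enumerate(kernel))
            for i in range(len(sequence) - width + 1)]
    return [max(conv), sum(conv) / len(conv)]


def two_path_features(beat):
    """Concatenate the features from the time-domain path and the spectrum path."""
    time_features = toy_path_features(beat)                      # path 1: raw samples
    freq_features = toy_path_features(magnitude_spectrum(beat))  # path 2: spectrum
    return time_features + freq_features  # would feed a classifier head


# Synthetic test signal (assumed): 5 cycles of a sinusoid over 64 samples.
beat = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
spectrum = magnitude_spectrum(beat)
features = two_path_features(beat)
```

In a trained model each path would be a stack of learned layers, and the concatenated vector would pass through dense layers to produce the arrhythmia class; the sketch only shows how the two inputs are derived from the same beat and where the concatenation happens.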