Rapid advances in deep learning models have made it easier for public and crackers to generate hyper-realistic deepfake videos in which faces are swapped. Such deepfake videos may constitute a significant threat to the world if they are misused to blackmail public figures and to deceive systems of face recognition. As a result, distinguishing these fake videos from real ones has become fundamental. This paper introduces a new deepfake video detection method. You Only Look Once (YOLO) face detector is used to detect faces from video frames. A proposed hybrid method based on proposing two different feature extraction methods is applied to these faces. The first feature extraction method, a proposed Convolution Neural Network (CNN), is based on the Histogram of Oriented Gradient (HOG) method. The second one is an ameliorated XceptionNet CNN. The two extracted sets of features are merged together and fed as input to a sequence of Gated Recurrent Units (GRUs) to extract the spatial and temporal features and then individuate the authenticity of videos. The proposed method is trained on the CelebDF-FaceForencics++ (c23) dataset and evaluated on the CelebDF test set. The experimental results and analysis confirm the superiority of the suggested method over the state-of-the-art methods.