Currently, face-swapping deepfake techniques are widely spread, generating a significant number of highly realistic fake videos that threaten the privacy of people and countries. Due to their devastating impacts on the world, distinguishing between real and deepfake videos has become a fundamental issue. This paper presents a new deepfake detection method: you only look once–convolutional neural network–extreme gradient boosting (YOLO-CNN-XGBoost). The YOLO face detector is employed to extract the face area from video frames, while the InceptionResNetV2 CNN is utilized to extract features from these faces. These features are fed into the XGBoost that works as a recognizer on the top level of the CNN network. The proposed method achieves 90.62% of an area under the receiver operating characteristic curve (AUC), 90.73% accuracy, 93.53% specificity, 85.39% sensitivity, 85.39% recall, 87.36% precision, and 86.36% F1-measure on the CelebDF-FaceForencics++ (c23) merged dataset. The experimental study confirms the superiority of the presented method as compared to the state-of-the-art methods.