“…In addition, a trained detector is likely based on the specific features of one dataset, and cannot extrapolate to other datasets, i.e., it lacks the ability to generalize. To more effectively detect DeepFakes, some recent work has meticulously designed DNNs to combine modules or features with positive detection capabilities, such as an attention mechanism [16,17,18], texture features [19,20], audio and visual modalities [21], and frequency spectrum [22]. Concomitantly, to drive large and complex DNNs with large-scale datasets requires significant computing resources (e.g., GPUs) and training (e.g., parameter adjustment), which decrease efficiency.…”