The earlier a fault detection method diagnoses a defect, the lower the subsequent costs. Feature extraction from the vibration signal is the first step in incipient fault detection of gearboxes. However, the statistical features currently used in the time and frequency domains cannot diagnose early or low-intensity faults. In this research, for the first time, additional features are extracted alongside these by combining variational mode decomposition and time synchronous average (VMD-TSA) to overcome this problem. The two techniques are combined in two ways. First, the Intrinsic Mode Functions (IMFs) of the TSA signal are calculated by VMD, and the Amplitude Energy (AE) and Permutation Entropy (PE) of the first four IMFs are computed. Second, the IMFs of the vibration signals are calculated, and the TSA features are extracted from the most informative IMF. In addition, 16 time-domain features, 13 frequency-domain features, and 9 TSA features are extracted from the vibration signals. These features are extracted for the healthy condition and four faulty conditions: crack, spalling, chipping, and wear, at three different severities. After feature extraction, the Relief-F algorithm selects the informative features, and the selected features are used for fault detection by a Feed-Forward Neural Network (FNN) classifier. The VMD-TSA method is compared with Empirical Mode Decomposition-TSA (EMD-TSA) and Ensemble Empirical Mode Decomposition-TSA (EEMD-TSA), and the comparison shows that the proposed method is more powerful in early fault detection. The classification accuracy of these methods is also compared under other feature selection methods, such as Laplacian Score (LS), Principal Component Analysis (PCA), and Minimum Redundancy-Maximum Relevance (MRMR). Finally, the performance of the FNN classifier is compared with that of a Support Vector Machine (SVM).
As shown in this study, the VMD-TSA features improve early fault detection. For instance, the classification accuracy in crack fault detection using all features without VMD-TSA is 93.98%; adding the VMD-TSA features raises the accuracy to 99.48% under the same conditions.
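To make the feature definitions above more concrete, the following is a minimal Python sketch of three of the quantities named in the abstract: the time synchronous average, the amplitude energy, and the permutation entropy. It is an illustration under stated assumptions, not the implementation used in this study. The function names, the `samples_per_rev`, `order`, and `delay` parameters, and the sum-of-squares definition of amplitude energy are all assumptions; the VMD step is omitted, so the functions would be applied to each IMF produced by a separate VMD routine.

```python
import math
from collections import Counter


def time_synchronous_average(signal, samples_per_rev):
    """Average a signal over complete shaft revolutions.

    Assumption: the signal has already been resampled so that every
    revolution spans exactly `samples_per_rev` samples.
    """
    n_revs = len(signal) // samples_per_rev
    if n_revs == 0:
        raise ValueError("signal is shorter than one revolution")
    return [
        sum(signal[r * samples_per_rev + i] for r in range(n_revs)) / n_revs
        for i in range(samples_per_rev)
    ]


def amplitude_energy(signal):
    """Sum of squared amplitudes (an assumed definition of AE)."""
    return sum(x * x for x in signal)


def permutation_entropy(signal, order=3, delay=1, normalize=True):
    """Permutation entropy of a 1-D sequence from its ordinal patterns."""
    n_patterns = len(signal) - (order - 1) * delay
    if n_patterns <= 0:
        raise ValueError("signal is too short for this order and delay")
    counts = Counter()
    for start in range(n_patterns):
        window = [signal[start + k * delay] for k in range(order)]
        # Ordinal pattern: window positions sorted by their values.
        pattern = tuple(sorted(range(order), key=window.__getitem__))
        counts[pattern] += 1
    entropy = 0.0
    for count in counts.values():
        p = count / n_patterns
        entropy -= p * math.log(p)
    if normalize:
        # Divide by the maximum possible entropy, log(order!).
        entropy /= math.log(math.factorial(order))
    return entropy


if __name__ == "__main__":
    # Synthetic example: 20 revolutions of a sine wave, 64 samples each.
    samples_per_rev = 64
    raw = [math.sin(2 * math.pi * n / samples_per_rev) for n in range(20 * samples_per_rev)]
    tsa = time_synchronous_average(raw, samples_per_rev)
    print("AE of TSA signal:", amplitude_energy(tsa))
    print("PE of TSA signal:", permutation_entropy(tsa, order=3, delay=1))
```

In the first combination described above, such AE and PE values would be computed for each of the first four IMFs of the TSA signal and appended to the feature vector before feature selection.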