Chatter directly affects the precision and service life of machine tools, and its detection is a crucial issue in all metal machining processes. Traditional methods focus on extracting discriminative features to help identify chatter. In this study, an effective procedure for chatter data preprocessing is proposed that improves neural network learning results when only an extremely small amount of data is available. Different computer numerical control (CNC) machines, cutting tools, operating conditions, workpiece materials, and workpiece shapes all generate different dynamic behavior. Therefore, when the same cutting conditions are applied on different CNC machines, some will produce chatter and some will not. Collecting chatter signals from different cutting environments is costly in both material and time. Chatter also accelerates tool wear, which further increases the cost of data collection. This makes large-scale chatter testing experiments on CNC machines impractical, so a way of producing accurate chatter test results from rather sparse data is needed. The solution proposed for this practical problem combines an innovative data preprocessing and training strategy with a modified convolutional neural network and a deep convolutional generative adversarial network. The characteristics of a chaotic attractor are used to minimize the variability of the chatter data. Moreover, because a chaotic system is very sensitive to its input, it can separate data with chatter from data without chatter, which improves chatter detection and classification. Convolutional neural networks can be effective chatter classifiers, and adversarial networks can act as generators that produce more data. The original training data are collected and preprocessed by Chen-Lee chaotic error mapping.
Experimental results indicated that the generative adversarial network (GAN) model could generate better training data than the traditional data augmentation method. The convolutional neural networks were trained using augmented data produced by the generator network. Adversarial training produced the generator, which could then generate enough data to compensate for the lack of training data. The results are compared with those obtained without a data generator and with those obtained using traditional data augmentation. Using only 60 original samples, the proposed method achieved an accuracy of 95.3% in leave-one-out cross-validation over 10 runs, surpassing the other methods and models. The forged data are also compared with the original training data and with the data produced by augmentation. Their distributions show that the forged data have quality and characteristics similar to those of the original data. The proposed training strategy provides a high-quality deep learning chatter detection model.
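
As an illustration of the evaluation protocol named above, the following minimal sketch shows how leave-one-out cross-validation repeated over several runs can be organized for a data set of 60 samples. It is not the model of this study: the nearest-centroid classifier is a stand-in for the convolutional neural network, the one-dimensional features are synthetic, and all function names (`nearest_centroid_predict`, `leave_one_out_accuracy`, `make_synthetic_samples`, `repeated_loo`) are hypothetical. Redrawing the synthetic data in each run is also an assumption, used here only to mimic run-to-run variability.

```python
import random
from statistics import mean


def nearest_centroid_predict(train, query):
    """Stand-in classifier (NOT the paper's CNN): label of the closest class mean."""
    centroids = {}
    for label in {y for _, y in train}:
        centroids[label] = mean(x for x, y in train if y == label)
    return min(centroids, key=lambda label: abs(centroids[label] - query))


def leave_one_out_accuracy(samples):
    """Hold out each sample once, fit on the rest, and score the held-out sample."""
    hits = 0
    for i, (x, y) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        hits += nearest_centroid_predict(train, x) == y
    return hits / len(samples)


def make_synthetic_samples(n, rng):
    """Hypothetical 1-D features: label 0 = stable cut, label 1 = chatter."""
    half = n // 2
    stable = [(rng.gauss(0.0, 1.0), 0) for _ in range(half)]
    chatter = [(rng.gauss(3.0, 1.0), 1) for _ in range(n - half)]
    return stable + chatter


def repeated_loo(n_samples=60, n_runs=10, seed=0):
    """Average leave-one-out accuracy over several runs (data redrawn per run)."""
    rng = random.Random(seed)
    scores = [
        leave_one_out_accuracy(make_synthetic_samples(n_samples, rng))
        for _ in range(n_runs)
    ]
    return mean(scores), scores


if __name__ == "__main__":
    avg, per_run = repeated_loo()
    print(f"mean leave-one-out accuracy over {len(per_run)} runs: {avg:.3f}")
```

With 60 samples, each run fits the classifier 60 times, each time on 59 samples, so every sample is used for testing exactly once. This is why the protocol suits very small data sets, at the price of many training passes.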