This study explores the application of near-infrared spectroscopy (NIRS) and machine learning to determine the geographical origin of Panax notoginseng (P. notoginseng), a critical component in traditional Chinese medicine. Because identifying the geographical origin of P. notoginseng is complex, especially when datasets are imbalanced, the study systematically evaluates a range of data preprocessing methods, including autocorrelation, data standardization, Multiplicative Scatter Correction (MSC), Standard Normal Variate (SNV), Savitzky-Golay (S-G) smoothing, first-order derivative (1D), second-order derivative (2D), and Principal Component Analysis (PCA). It also assesses various machine learning models in this context, such as Gaussian Naive Bayes (GNB), K-Nearest Neighbors (KNN), Classification and Regression Trees (CART), Support Vector Machine (SVM), Linear Regression (LR), and neural networks. The study first assembles and prepares a substantial dataset of NIR spectra of P. notoginseng from different geographical locations. The dataset's imbalance, which reflects real-world conditions, necessitates specialized data handling strategies. Each preprocessing technique is applied to this dataset and then paired with each machine learning model, which allows an in-depth comparison of how each combination influences the accuracy of geographical origin prediction. The findings reveal that specific combinations of data preprocessing methods and machine learning models yield substantial improvements in predicting the geographical origin of P. notoginseng. These combinations help address the imbalance inherent in the dataset, thereby enhancing the reliability of the predictions.
The research contributes to the field not only by providing a solution to the problem of geographical origin prediction in imbalanced datasets but also by laying out a methodological framework that can be adapted to similar challenges in the broader area of herbal medicine research. The study sits at the intersection of traditional Chinese medicine and modern scientific methods, offering a robust, data-driven approach to ensuring the authenticity and quality of vital medicinal herbs like P. notoginseng. Its implications extend beyond this specific application, providing insights and methodologies that could improve quality control and authentication processes in herbal medicine globally.
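Two of the scatter-correction steps named above, SNV and MSC, can be illustrated with a minimal NumPy sketch. This is not the study's implementation: the function names `snv` and `msc`, the choice of the mean spectrum as the MSC reference, and the synthetic spectra are all assumptions made for illustration.

```python
import numpy as np


def snv(spectra):
    """Standard Normal Variate: center and scale each spectrum (row) on its own."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def msc(spectra, reference=None):
    """Multiplicative Scatter Correction against a reference spectrum.

    Each spectrum is regressed on the reference (row = intercept + slope * ref)
    and then corrected as (row - intercept) / slope. The mean spectrum is used
    as the reference when none is given (an assumption of this sketch).
    """
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, row in enumerate(spectra):
        slope, intercept = np.polyfit(ref, row, deg=1)
        corrected[i] = (row - intercept) / slope
    return corrected


# Synthetic example: one absorption band observed with different
# multiplicative gains and additive offsets (a simple scatter model).
wavelengths = np.linspace(0.0, 1.0, 50)
band = np.exp(-((wavelengths - 0.5) ** 2) / 0.02)
gains = np.array([0.8, 1.0, 1.3])
offsets = np.array([0.1, 0.0, -0.2])
raw = gains[:, None] * band + offsets[:, None]

snv_out = snv(raw)
msc_out = msc(raw)
```

Under this idealized gain-plus-offset model, both corrections collapse the three distorted spectra onto a single curve. Real NIR spectra also contain noise and chemical variation, so the corrected spectra would only be brought closer together, not made identical.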