Lipid content is an important indicator of the edible and breeding value of Pinus koraiensis seeds. Differences in geographic origin affect the lipid content of the kernel, yet neither origin nor lipid content can be judged from seed appearance or morphology. Traditional chemical methods are low-throughput, time-consuming, labor-intensive, costly, and dependent on laboratory facilities. In this study, near-infrared (NIR) spectroscopy combined with chemometrics was used to identify the origin and predict the lipid content of P. koraiensis seeds. Principal component analysis (PCA), wavelet transformation (WT), Monte Carlo (MC) sampling, and uninformative variable elimination (UVE) were used to process the spectral data, and prediction models were established with partial least squares (PLS). Models were evaluated using the coefficient of determination (R²) of the calibration and prediction sets, the root mean square error of cross-validation (RMSECV), and the root mean square error of prediction (RMSEP). For origin identification, two-dimensional input data produced a faster and more accurate PLS model, with accuracies of 98.75% and 97.50% for the calibration and prediction sets, respectively. For lipid content, the WT–MC–UVE–PLS regression model gave the best predictions when the Donoho thresholding wavelet filter ‘bior4.4’ was selected: R² for the calibration and prediction sets was 0.9485 and 0.9369, and RMSECV and RMSEP were 0.0098 and 0.0390, respectively. NIR spectroscopy combined with chemometric algorithms can therefore be used to characterize P. koraiensis seeds by origin and lipid content.
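The abstract names MC-UVE-PLS variable selection without giving its settings. The Python code below is a minimal sketch under stated assumptions. It uses NumPy and a NIPALS PLS1 model. Its settings are illustrative: 200 Monte Carlo runs, random 80% calibration subsets, and keeping a fixed number of top-ranked variables. It ranks spectral variables by the stability of their PLS regression coefficients, defined as the mean divided by the standard deviation across runs. It also computes an RMSE of the kind reported as RMSECV or RMSEP. The function names `pls1_coef`, `mc_uve`, and `rmse` and all parameter values are hypothetical. The wavelet-transform step and the authors' actual cut-off rule are not reproduced, and the demo data are synthetic.

```python
# Hypothetical sketch: names and settings are illustrative, not the authors' code.
import numpy as np


def pls1_coef(X, y, n_components):
    """Fit a single-response PLS model with NIPALS; return (coef, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean            # mean-centre spectra and response
    n_vars = X.shape[1]
    W = np.zeros((n_vars, n_components))       # X weights
    P = np.zeros((n_vars, n_components))       # X loadings
    q = np.zeros(n_components)                 # y loadings
    for a in range(n_components):
        w = Xc.T @ yc
        w /= np.linalg.norm(w)
        t = Xc @ w                             # score vector
        tt = t @ t
        P[:, a] = Xc.T @ t / tt
        q[a] = yc @ t / tt
        W[:, a] = w
        Xc = Xc - np.outer(t, P[:, a])         # deflate X
        yc = yc - q[a] * t                     # deflate y
    coef = W @ np.linalg.solve(P.T @ W, q)     # B = W (P'W)^-1 q
    return coef, y_mean - x_mean @ coef


def mc_uve(X, y, n_components=3, n_runs=200, frac=0.8, keep=10, seed=0):
    """Rank variables by MC-UVE stability and keep the `keep` most stable."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    m = int(round(frac * n))
    coefs = np.empty((n_runs, X.shape[1]))
    for r in range(n_runs):
        idx = rng.choice(n, size=m, replace=False)   # random calibration subset
        coefs[r], _ = pls1_coef(X[idx], y[idx], n_components)
    # Stability: |mean / std| of each variable's coefficient across MC runs.
    stability = np.abs(coefs.mean(axis=0) / coefs.std(axis=0))
    selected = np.sort(np.argsort(stability)[::-1][:keep])
    return selected, stability


def rmse(y_true, y_pred):
    """Root mean square error, as used for RMSECV (CV predictions) or RMSEP."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


if __name__ == "__main__":
    # Synthetic "spectra": 200 samples x 20 variables, 3 of them informative.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 20))
    y = 3 * X[:, 3] - 2 * X[:, 10] + 2 * X[:, 12] + 0.1 * rng.normal(size=200)
    selected, stability = mc_uve(X, y, n_runs=100, keep=3)
    coef, b0 = pls1_coef(X[:, selected], y, n_components=3)
    print("selected variables:", selected.tolist())
    print("calibration RMSE:", rmse(y, X[:, selected] @ coef + b0))
```

Keeping a fixed number of variables is a simplification. The original UVE method, for example, sets the cut-off from appended random-noise variables. In practice the retained set and the number of PLS components would be tuned against RMSECV.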