Nuclear magnetic resonance (NMR) can provide a large amount of information about an analyzed sample; however, its spectra contain more than 6000 variables, which makes random forest (RF) modeling difficult. Reducing the size of the original dataset can minimize this problem. In this paper, we compared RF classification models built from the full NMR spectral range with models built after reducing the number of NMR variables by principal component analysis (PCA) and the Fisher discriminant (FD). The variables used in the construction of the RF trees were then analyzed and identified. We used ¹H and ¹³C NMR spectra obtained from 126 petroleum samples, together with their total acid number (TAN) values, as measured by ASTM D664 and ranging from 0.03 to 4.96 mg KOH·g⁻¹, to classify the oil samples according to their TAN values. Of the two resulting classes, the first contained 78 samples with TAN values less than or equal to 0.3 mg KOH·g⁻¹, while the second contained 48 samples with TAN values higher than 0.3 mg KOH·g⁻¹. For the ¹H NMR data, the combination of FD and RF provided the best accuracy (88%). For the ¹³C NMR data, the most accurate model was obtained by combining PCA and RF (84%). Identifying the variables used in RF allowed a better understanding of the important chemical information contained in the spectra and of its relationship to TAN in petroleum.

KEYWORDS
Fisher discriminant, NMR, PCA, random forest, reduction of variables

1 | INTRODUCTION

Random forest (RF) is a machine learning method recognized for its efficiency in the supervised classification of linear and nonlinear data, especially in the areas of genetic analysis, metabolomics, medical image analysis, food quality control, and crude oil analysis.¹⁻⁴ The potential of the RF technique in classification is evident when it is combined with analytical methods that yield a large amount of information, such as chromatography, mass spectrometry, infrared spectroscopy, and hydrogen (¹H NMR) and carbon (¹³C NMR) nuclear magnetic resonance.¹⁻⁴ The NMR technique provides detailed information on functional groups, characterizing the sample at the molecular level; however, the number of variables generated may exceed 65 000, depending on the spectral range analyzed. In