“…In addition, it has been stated that a dataset should be split in such a way as to guarantee similar structure-property relationships (Marrero-Ponce et al, 2018; Martin et al, 2012; Rojas et al, 2015a), so that the space defined by the VOCs of the training set is representative of the validation and test set molecules for cross-validation and prediction purposes, respectively. Thus, we used the Balanced Subsets Method (BSM) (Rojas et al, 2015a), based on k-means cluster analysis (k-MCA), to partition the dataset. This procedure has been applied elsewhere to study the retention index of several VOCs in stationary phases of different polarities (Rojas et al, 2015a, 2015b, 2017, 2018).…”
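
The excerpt does not spell out the BSM procedure, so the following is only a minimal sketch of the general idea it describes: cluster the molecules with k-means, then draw training, validation and test members from every cluster so that each subset covers the same regions of descriptor space. The function names (`kmeans`, `balanced_split`), the choice of k = 3, the 60/20/20 fractions and the synthetic two-dimensional descriptors are all assumptions made for illustration, not details taken from Rojas et al.

```python
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    # Initialise the centers with k distinct points drawn at random.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centers[c])),
            )
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def balanced_split(points, k=3, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split point indices into train/validation/test, cluster by cluster.

    The fractions are hypothetical; the source does not state the ratios used.
    """
    labels = kmeans(points, k, seed=seed)
    rng = random.Random(seed)
    subsets = {"train": [], "validation": [], "test": []}
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        rng.shuffle(idx)
        n_train = round(fractions[0] * len(idx))
        n_val = round(fractions[1] * len(idx))
        subsets["train"] += idx[:n_train]
        subsets["validation"] += idx[n_train:n_train + n_val]
        subsets["test"] += idx[n_train + n_val:]
    return subsets, labels


# Synthetic stand-in for molecular descriptors: three loose groups in 2D.
data_rng = random.Random(1)
points = [
    (cx + data_rng.gauss(0, 0.3), cy + data_rng.gauss(0, 0.3))
    for cx, cy in [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
    for _ in range(10)
]

subsets, labels = balanced_split(points, k=3)
for name, idx in subsets.items():
    print(name, len(idx))
```

Because every cluster contributes to each subset in roughly the same proportion, the training set spans the same descriptor regions as the validation and test sets, which is the property the excerpt asks of the partition. In practice a library implementation of k-means would replace the hand-written one here.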