The chemical composition of propolis is affected by environmental factors and harvest season, making it difficult to standardize its extracts for medicinal use. By detecting a typical chemical profile associated with propolis from a specific production region or season, certain types of propolis may be selected to obtain a specific pharmacological activity. In this study, propolis from three agroecological regions (plain, plateau, and highlands) of southern Brazil, collected over the four seasons of 2010, was investigated through a novel NMR-based metabolomics data analysis workflow. Chemometrics and machine learning algorithms, namely partial least squares-discriminant analysis (PLS-DA) and random forest (RF), were applied together with methods that estimate variable importance in classification. The machine learning and feature selection methods permitted the construction of models that classified propolis samples with high accuracy (>75%, reaching ∼90% in the best case) and that discriminated samples better by collection season than by harvest region. PLS-DA and RF allowed the identification of biomarkers for sample discrimination, expanding the set of discriminating features and adding relevant information for the identification of the class-determining metabolites. The NMR-based metabolomics analytical platform, coupled to bioinformatic tools, allowed the characterization and classification of Brazilian propolis samples according to the metabolite signature of important compounds (i.e., the chemical fingerprint), harvest season, and production region.
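The idea of estimating variable importance in a classification model can be illustrated with a small, self-contained sketch. It is not the workflow used in the study: instead of PLS-DA or RF it swaps in a nearest-centroid classifier with permutation importance, and it runs on synthetic "binned spectra" rather than NMR data. All function names, the number of bins, and the choice of informative bins (2 and 5) are assumptions made purely for illustration.

```python
import random

def make_synthetic_spectra(n_per_class=30, n_bins=8, informative=(2, 5), seed=0):
    """Synthetic 'binned spectra' (hypothetical data): two classes that
    differ only in the informative bins; every other bin is pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_bins)]
            for j in informative:
                row[j] += 3.0 * label  # class shift in informative bins only
            X.append(row)
            y.append(label)
    return X, y

def fit_centroids(X, y):
    """Return the mean feature vector of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist2(centroids[label]))

def accuracy(centroids, X, y):
    return sum(predict(centroids, x) == t for x, t in zip(X, y)) / len(y)

def permutation_importance(centroids, X, y, n_repeats=20, seed=1):
    """Mean drop in accuracy when one bin is shuffled across samples."""
    rng = random.Random(seed)
    baseline = accuracy(centroids, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [x[j] for x in X]
            rng.shuffle(col)
            X_perm = [x[:j] + [v] + x[j + 1:] for x, v in zip(X, col)]
            drops.append(baseline - accuracy(centroids, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return baseline, importances

# Fit on one synthetic sample set, score importance on an independent one.
X_train, y_train = make_synthetic_spectra(seed=0)
X_test, y_test = make_synthetic_spectra(seed=42)
centroids = fit_centroids(X_train, y_train)
baseline, importances = permutation_importance(centroids, X_test, y_test)

# Indices of the two bins whose shuffling hurts accuracy the most.
ranked = sorted(range(len(importances)), key=lambda j: importances[j], reverse=True)
top_two = sorted(ranked[:2])
print("held-out accuracy:", round(baseline, 3))
print("most important bins:", top_two)
```

Bins whose permutation causes the largest accuracy drop are the ones the classifier relies on. This is the same question that variable-importance measures for PLS-DA and RF answer, although those methods compute it differently.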
In this work, a metabolomics dataset from ¹H nuclear magnetic resonance spectroscopy of Brazilian propolis was analyzed using machine learning algorithms, including feature selection and classification methods. Partial least squares-discriminant analysis (PLS-DA), random forest (RF), and wrapper methods combining decision trees and rules with evolutionary algorithms (EA) proved to be complementary approaches, yielding relevant information on the importance of a given set of features, mostly related to the structural fingerprint of aliphatic and aromatic compounds typically found in propolis, e.g., fatty acids and phenolic compounds. The feature selection and decision tree-based algorithms used appear to be suitable tools for building classification models of Brazilian propolis metabolomics data with respect to geographic origin, offering consistency and high accuracy while avoiding redundant information on the metabolic signature of relevant compounds.
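A wrapper method of the kind mentioned above searches over feature subsets and scores each candidate by training a classifier on it. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it uses a nearest-centroid classifier in place of decision trees and rules, a very small evolutionary algorithm (tournament selection, uniform crossover, bit-flip mutation, elitism), and synthetic data in which only bins 2 and 5 carry class information. All names and parameter values are hypothetical.

```python
import random

def make_data(n_per_class, n_bins, informative, rng):
    """Synthetic two-class data (hypothetical); only informative bins differ."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_bins)]
            for j in informative:
                row[j] += 3.0 * label
            X.append(row)
            y.append(label)
    return X, y

def subset_accuracy(mask, train, valid):
    """Validation accuracy of a nearest-centroid classifier restricted to
    the bins switched on in mask; an empty subset scores 0."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    (X_tr, y_tr), (X_va, y_va) = train, valid
    centroids = {}
    for label in set(y_tr):
        rows = [[x[j] for j in cols] for x, t in zip(X_tr, y_tr) if t == label]
        centroids[label] = [sum(c) / len(rows) for c in zip(*rows)]
    correct = 0
    for x, t in zip(X_va, y_va):
        v = [x[j] for j in cols]
        pred = min(centroids,
                   key=lambda lab: sum((a - b) ** 2 for a, b in zip(v, centroids[lab])))
        correct += pred == t
    return correct / len(y_va)

def fitness(mask, train, valid, penalty=0.01):
    """Wrapper objective: accuracy minus a small cost per selected bin."""
    return subset_accuracy(mask, train, valid) - penalty * sum(mask)

def evolve(train, valid, n_bins, pop_size=20, generations=30, mut_rate=0.1, seed=0):
    """Tiny evolutionary search over boolean feature masks."""
    rng = random.Random(seed)
    score = lambda m: fitness(m, train, valid)
    pop = [[True] * n_bins]  # seed the population with the all-features mask
    pop += [[rng.random() < 0.5 for _ in range(n_bins)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        nxt = [pop[0][:]]  # elitism: the best mask always survives
        while len(nxt) < pop_size:
            p1 = max(rng.sample(pop, 3), key=score)  # tournament selection
            p2 = max(rng.sample(pop, 3), key=score)
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            child = [(not g) if rng.random() < mut_rate else g for g in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=score)

rng = random.Random(42)
N_BINS, INFORMATIVE = 8, (2, 5)
train = make_data(30, N_BINS, INFORMATIVE, rng)
valid = make_data(30, N_BINS, INFORMATIVE, rng)

best_mask = evolve(train, valid, N_BINS)
best_fitness = fitness(best_mask, train, valid)
selected = [j for j, keep in enumerate(best_mask) if keep]
print("selected bins:", selected, "fitness:", round(best_fitness, 3))
```

The per-feature penalty is what pushes the search toward compact, non-redundant subsets. Scoring on a single validation split, as done here for brevity, tends to overfit the selection; cross-validation with a separate test set would be the safer design for real data.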