In metabonomics, highly complex metabolic profiles pose tremendous challenges to existing chemometric methods. Variable selection (i.e., biomarker discovery) and pattern recognition (i.e., classification) are two important chemometric tasks in metabonomics. Biomarker discovery matters most, because biomarkers can potentially be used for disease diagnosis and pathology discovery. Typically, the informative variables are elicited from a single classifier, but such a selection is often unreliable in practice. To address this, in the current study, bagging and classification tree (CT) were combined into a general framework (i.e., BAGCT) for robustly selecting informative variables. The framework draws on two strengths: CT automatically carries out variable selection and measures variable importance, and bagging improves the reliability and robustness of a single model.

In BAGCT, a set of parallel CT models was established following the idea of bagging, with each CT providing information such as its splitting variables and their corresponding importance values. The informative variables can then be identified by inspecting the variable importance values over all CTs in BAGCT. Taking the promising properties of support vector machine (SVM) into account, we used the informative variables identified by BAGCT as the inputs of SVM, forming a new classification tool abbreviated as BAGCT-SVM.

A metabonomic dataset acquired by hydrogen-1 nuclear magnetic resonance from patients with lung cancer and healthy controls was used to validate BAGCT-SVM, with CT and SVM as comparisons. Results showed that BAGCT-SVM, using fewer variables, gave better predictive ability than CT and SVM.
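The variable-selection step of BAGCT can be illustrated with a minimal pure-Python sketch. It assumes Gini-based CART trees grown on bootstrap samples. Each variable's importance is taken as the total impurity decrease at the splits that use it. Importances are normalized within each tree and then averaged over all trees. The paper's exact tree settings, importance measure and selection cut-off are not stated here, so `n_trees`, `max_depth`, `k` and every function name below are hypothetical. The toy data are synthetic, not the NMR dataset.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, idx):
    """Best (gain, feature, threshold) Gini split over rows idx, or None if no split exists."""
    parent = gini([y[i] for i in idx])
    n = len(idx)
    best = None
    for j in range(len(X[0])):
        values = sorted(set(X[i][j] for i in idx))
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2  # midpoint keeps both children non-empty
            left = [y[i] for i in idx if X[i][j] <= t]
            right = [y[i] for i in idx if X[i][j] > t]
            child = (len(left) * gini(left) + len(right) * gini(right)) / n
            gain = (parent - child) * n  # weighted impurity decrease
            if best is None or gain > best[0]:
                best = (gain, j, t)
    return best


def grow_tree(X, y, idx, depth, max_depth, importance):
    """Grow a CART-style tree, adding each split's impurity decrease to importance.

    Only the importances are kept; the tree structure itself is discarded.
    """
    if depth >= max_depth or len(set(y[i] for i in idx)) < 2:
        return
    split = best_split(X, y, idx)
    if split is None or split[0] <= 0:
        return
    gain, j, t = split
    importance[j] += gain
    left = [i for i in idx if X[i][j] <= t]
    right = [i for i in idx if X[i][j] > t]
    grow_tree(X, y, left, depth + 1, max_depth, importance)
    grow_tree(X, y, right, depth + 1, max_depth, importance)


def bagct_importance(X, y, n_trees=25, max_depth=3, seed=0):
    """Mean per-tree-normalized importance over trees grown on bootstrap samples."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    total = [0.0] * p
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        imp = [0.0] * p
        grow_tree(X, y, boot, 0, max_depth, imp)
        s = sum(imp)
        if s > 0:  # skip trees that made no split
            for j in range(p):
                total[j] += imp[j] / s
    return [v / n_trees for v in total]


def select_informative(importance, k):
    """Indices of the k variables with the highest mean importance."""
    return sorted(range(len(importance)), key=lambda j: importance[j], reverse=True)[:k]


def make_toy_data(n, p, seed=0):
    """Hypothetical toy data (p >= 5): class 1 iff x1 + x4 > 1; other variables are noise."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(p)] for _ in range(n)]
    y = [1 if row[1] + row[4] > 1.0 else 0 for row in X]
    return X, y


X, y = make_toy_data(50, 6, seed=1)
imp = bagct_importance(X, y, n_trees=25, max_depth=3, seed=0)
print("mean importance:", [round(v, 3) for v in imp])
print("selected variables:", select_informative(imp, 2))
```

On this toy data, the two variables that define the class (indices 1 and 4) should rank at the top. In BAGCT-SVM, the selected column indices would then be used as the input variables of an SVM classifier (for example scikit-learn's `sklearn.svm.SVC`), a step this sketch omits.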