We develop a new combination strategy for predicting the retention times of oligonucleotides in ion-pair reversed-phase high-performance liquid chromatography. The key step of the strategy is to combine the score of generalized base properties (SGBP) with auto cross covariance (ACC) to construct a feature representation of nucleic acids. This representation characterizes physicochemical, quantum chemical, topological and spatial structural features, among others, as well as the neighboring effect between bases separated by a given distance in a sequence. The next step is to use the variables selected by a genetic algorithm (GA) to build support vector machine (SVM) models that predict the retention times of oligonucleotides. The GA-SVM models give different prediction performance depending on the input descriptors, which result from different step lengths in the ACC transformation. This indicates that the neighboring effect between bases should not be neglected in the features related to the chromatographic retention of oligonucleotides. Overall, GA-SVM predictors built from more than 20 training samples produce satisfactory performance in predicting the chromatographic retention of oligonucleotides at each of several temperatures (30°C, 40°C, 50°C, 60°C and 80°C). The SGBP-ACC-GA-SVM approach presented here shows strong prospects for application in separation and analysis science, bioinformatics and proteomics.
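To make the ACC step more concrete, here is a minimal sketch of a commonly used form of the auto cross covariance transformation, in which the step length is the lag between bases. It assumes each base is described by a short vector of property scores. The score table `HYPOTHETICAL_SCORES` and the function name `acc_features` are hypothetical placeholders, not the SGBP values or code used in this work, and the exact normalization used in the paper may differ.

```python
from typing import Dict, List

# HYPOTHETICAL per-base property scores (placeholders only). The real SGBP
# scores are derived from many base properties and are not reproduced here.
HYPOTHETICAL_SCORES: Dict[str, List[float]] = {
    "A": [1.0, 0.5],
    "C": [-0.5, 1.0],
    "G": [0.8, -1.0],
    "T": [-1.2, 0.2],
}


def acc_features(seq: str, scores: Dict[str, List[float]], max_lag: int) -> List[float]:
    """Auto cross covariance features for lags 1..max_lag (assumed common form).

    ACC(i, j, lg) = 1/(L - lg) * sum_k (P[i][k] - mean_i) * (P[j][k + lg] - mean_j)
    i == j gives an auto covariance term; i != j gives a cross covariance term.
    """
    L = len(seq)
    if not 1 <= max_lag < L:
        raise ValueError("max_lag must be at least 1 and smaller than the sequence length")
    m = len(next(iter(scores.values())))  # number of properties per base
    # P[i][k]: value of property i for the base at position k
    P = [[scores[base][i] for base in seq] for i in range(m)]
    means = [sum(row) / L for row in P]
    features: List[float] = []
    for lg in range(1, max_lag + 1):  # lag = distance between the paired bases
        for i in range(m):
            for j in range(m):
                total = sum(
                    (P[i][k] - means[i]) * (P[j][k + lg] - means[j])
                    for k in range(L - lg)
                )
                features.append(total / (L - lg))
    return features


if __name__ == "__main__":
    vec = acc_features("ACGTTGCA", HYPOTHETICAL_SCORES, max_lag=2)
    print(len(vec))  # 2 lags x (2 x 2) property pairs = 8 features
```

The output length is `max_lag * m * m`, independent of sequence length, which is what lets oligonucleotides of different lengths share one fixed-size descriptor vector. Varying `max_lag` changes the descriptor set, which mirrors how different step lengths yield different inputs for the GA-SVM models.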