Background: Propensity is a common problem when building a clinical outcome prediction model whose covariates include the treatment option: treatment is assumed to be randomly assigned but in fact depends on other covariates in the training data. The genuine effect of the treatment option cannot be isolated while propensity is present. Existing approaches, such as matched-pairs study designs, do not solve the problem for imbalanced or small datasets.
Methods: This work proposed an anti-propensity estimate of the treatment option, generated by a support vector classifier from two synergistic markers that represent the lower and upper limits of the intercovariate association level. The algorithm for generating the synergistic markers was illustrated, and its performance was evaluated on a public dataset of gene expression levels obtained from surgically excised tumor samples of non-small cell lung cancer (NSCLC) patients for whom the treatment option, i.e., adjuvant therapy or not, was known.
Results: Six covariates, the expression levels of ZNF217, ERCC3, PMS1, PIK3CB, BARD1, and MAPK1, were selected to generate two synergistic markers and a classifier for estimating the adjuvant therapy option with substantially attenuated propensity. The estimate attained an area under the receiver operating characteristic curve of 0.78 in the test set.
Conclusions: The proposed synergistic markers provided a parsimonious, anti-propensity estimate of the treatment option that is ready for further evaluation and application in clinical outcome prediction models.
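
For readers less familiar with the evaluation metric, the area under the receiver operating characteristic curve can be read as the probability that a randomly chosen treated patient receives a higher classifier score than a randomly chosen untreated patient. The sketch below is illustrative only and is not the study's code: the function name `pairwise_auc` and the toy labels and scores are hypothetical, made up for the example, and unrelated to the NSCLC dataset.

```python
from typing import Sequence


def pairwise_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    labels: 1 = treated (e.g., adjuvant therapy), 0 = untreated.
    scores: classifier outputs, higher meaning "more likely treated".
    Each (treated, untreated) pair counts 1 if the treated score is
    higher, 0.5 if the two scores tie, and 0 otherwise.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): three treated, three untreated.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.4, 0.35, 0.6, 0.7, 0.1]
toy_auc = pairwise_auc(toy_labels, toy_scores)  # 8 of 9 pairs are ordered correctly
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives a scale for interpreting the reported test-set value.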