A treatment regime at a single decision point is a rule that assigns a treatment, among the available options, to a patient based on the patient’s baseline characteristics. The value of a treatment regime is the average outcome of a population of patients if they were all treated in accordance with the regime, where larger values are desirable. The optimal treatment regime is a regime that results in the greatest value. Typically, the optimal treatment regime is estimated by positing a regression relationship for the outcome of interest as a function of treatment and baseline characteristics. However, this can lead to suboptimal treatment regimes when the regression model is misspecified. We instead consider value search estimators for the optimal treatment regime, in which we directly estimate the value of any treatment regime and then maximize this estimator over a class of regimes. For many studies, the primary outcome of interest is survival time, which is often censored. We derive a locally efficient, doubly robust, augmented inverse probability weighted complete case estimator for the value function with censored survival data and study the large-sample properties of this estimator. The optimization is carried out from a weighted classification perspective, which allows us to use available off-the-shelf software. In some studies, one treatment may have greater toxicity or side effects; we therefore also consider estimating a quality-adjusted optimal treatment regime that allows a patient to trade some additional risk of death in order to avoid the more invasive treatment.
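
To make the value search idea concrete, the following is a minimal sketch and not the estimator derived in this work. It uses a simplified, non-augmented inverse probability weighted complete case estimate of the value (in normalized form), and it assumes that both the treatment propensity and the censoring survival probabilities are known. The function names (`ipw_value`, `simulate_trial`, `value_search`), the simulated data-generating mechanism, and the class of threshold regimes "treat if x > eta" are all hypothetical choices made for illustration.

```python
import numpy as np

def ipw_value(regime, X, A, U, delta, propensity, cens_surv):
    """Simplified (non-augmented) IPW complete-case value estimate.

    regime     : function mapping covariates X to treatments in {0, 1}
    A          : observed treatments
    U          : observed time, min(event time, censoring time)
    delta      : 1 if the event time was observed, 0 if censored
    propensity : P(A = 1 | X), assumed known here
    cens_surv  : probability of remaining uncensored through U, assumed known
    """
    g = regime(X)
    # Probability of the treatment that was actually received.
    p_obs = np.where(A == 1, propensity, 1.0 - propensity)
    # Only uncensored patients whose treatment agrees with the regime contribute.
    w = delta * (A == g) / (p_obs * cens_surv)
    # Normalized weighted average of the observed survival times.
    return np.sum(w * U) / np.sum(w)

def simulate_trial(n, seed=0):
    """Hypothetical randomized trial: treatment 1 helps only when X > 0."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, size=n)
    A = rng.integers(0, 2, size=n)               # randomized with probability 1/2
    T = 1.0 + A * X + rng.uniform(0.0, 1.0, size=n)
    C = rng.exponential(scale=5.0, size=n)       # independent censoring
    U = np.minimum(T, C)
    delta = (T <= C).astype(float)
    propensity = np.full(n, 0.5)
    cens_surv = np.exp(-U / 5.0)                 # P(C > U) for this censoring law
    return X, A, U, delta, propensity, cens_surv

def value_search(etas, data):
    """Maximize the estimated value over regimes 'treat if x > eta'."""
    values = [ipw_value(lambda x, e=eta: (x > e).astype(int), *data)
              for eta in etas]
    best = int(np.argmax(values))
    return etas[best], values[best]

data = simulate_trial(20000, seed=1)
best_eta, best_value = value_search(np.linspace(-1.0, 1.0, 41), data)
print("estimated threshold:", best_eta, "estimated value:", best_value)
```

In this simulation the mean survival time under "treat if x > eta" is largest at eta = 0, so the selected threshold should fall near zero. A grid search is used only because the regime class has a single parameter; it stands in for, and is not, the weighted classification approach described above.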