We study an off-policy contextual pricing problem in which the seller has access to samples of the prices that customers were previously offered, whether they purchased at that price, and auxiliary features describing the customer and/or the item being sold. This is in contrast to the well-studied setting in which samples of the customer's valuation (willingness to pay) are observed. In our setting, the observed data is influenced by the historical pricing policy, and we do not know how customers would have responded to alternative prices. We introduce suitable loss functions for this pricing setting that can be directly optimized to find an effective pricing policy with expected revenue guarantees, without the need to estimate an intermediate demand function. We focus on convex loss functions. This is particularly relevant when linear pricing policies are desired for interpretability reasons, resulting in a tractable convex revenue optimization problem. We further propose generalized hinge and quantile pricing loss functions which, when optimized, price at a multiplicative factor of the conditional expected value or a particular quantile of the valuation distribution, despite the valuation data not being observed. We prove expected revenue bounds for each of these pricing policies when the valuation distribution is log-concave, and provide generalization bounds for the finite-sample case. Finally, we conduct simulations on both synthetic and real-world data to demonstrate that this approach is competitive with, and in some settings outperforms, state-of-the-art methods in contextual pricing.