Dynamic pricing for preemptible cloud services (DPPCS) is in high demand as a means of effectively utilizing the excess capacity in cloud computing. However, excess capacity is highly non-stationary, characterized by multi-temporal stochastic patterns with time-varying statistical properties. This non-stationarity makes the DPPCS problem a Non-Stationary Markov Decision Process (NSMDP) with unknown transition probabilities. Moreover, DPPCS must respect a maximum preemption rate, which further complicates the problem into a Constrained NSMDP (CNSMDP). We transform the CNSMDP into a piecewise Lagrangian dual model, converting it into an unconstrained optimization problem. To solve this problem, we propose a novel Q-Learning approach for DPPCS. We first present estimation methods for the unknown environment parameters, including a detection method for identifying temporal pattern changes and a diffusion approximation method for estimating the actual preemption rate. We then introduce a Lagrange multiplier updating method that balances revenue against the preemption rate in the reward function. Building on these methods, we develop a Constrained Non-Stationary Q-Learning (CNSQL) algorithm for DPPCS that dynamically adjusts its learning process to adapt to the multi-temporal patterns. Simulation experiments demonstrate the effectiveness of our approach compared with state-of-the-art algorithms: it improves the revenue generated from excess capacity while keeping the actual preemption rate within the specified constraint.
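To make the Lagrangian relaxation and the multiplier update concrete, the following is a minimal sketch, not the paper's CNSQL algorithm: a tabular Q-Learning step on the penalized reward (revenue minus a multiplier-weighted preemption signal) followed by projected dual ascent on the multiplier. All names here (penalized_q_update, preempted, rho_max, eta) are illustrative assumptions, and our reading of "piecewise" is that such a multiplier would be maintained per detected temporal pattern rather than globally.

```python
import numpy as np

def penalized_q_update(Q, lam, s, a, revenue, preempted, s_next,
                       rho_max=0.05, alpha=0.1, gamma=0.95, eta=0.01):
    """One Q-Learning step on the Lagrangian-relaxed (unconstrained) reward.

    Q         : tabular Q-function, shape (num_states, num_actions)
    lam       : current Lagrange multiplier (lambda >= 0)
    preempted : observed preemption signal for this step (e.g., a rate or
                indicator); compared against the cap rho_max
    """
    # Unconstrained surrogate reward: revenue minus lambda times the
    # constraint violation (preemption above the allowed cap)
    r = revenue - lam * (preempted - rho_max)

    # Standard temporal-difference update on the tabular Q-function
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])

    # Dual ascent on the multiplier: raise lambda when preemption exceeds
    # the cap, lower it otherwise, projecting back onto lambda >= 0
    lam = max(0.0, lam + eta * (preempted - rho_max))
    return Q, lam
```

Under this sketch, a large multiplier steers the learned prices toward lower preemption at the cost of revenue, while a small multiplier does the opposite; the dual ascent step drives the multiplier toward the value at which the actual preemption rate sits at the constraint boundary.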