The mixture of experts (ME) model is effective for multimodal data in statistics and machine learning. To treat non-stationary probabilistic regression, the mixture of Gaussian processes (MGP) model has been proposed, but it may not perform well in some cases due to the limited ability of each Gaussian process (GP) expert. Although the mixture of Gaussian processes (MGP) and warped Gaussian process (WGP) models are dominant and effective for non-stationary probabilistic regression, they may not be able to handle general non-stationary probabilistic regression in practice. In this paper, we first propose the mixture of warped Gaussian processes (MWGP) model as well as its classification expectation–maximization (CEM) algorithm to address this problem. To overcome the local optimum of the CEM algorithm, we then propose the split and merge CEM (SMC EM) algorithm for MWGP. Experiments were done on synthetic and real-world datasets, which show that our proposed MWGP is more effective than the models used for comparison, and the SMCEM algorithm can solve the local optimum for MWGP.