Heteroscedastic regression, which considers varying noise across the input domain, has many applications in fields such as machine learning and statistics. Here we focus on heteroscedastic Gaussian process (HGP) regression, which integrates the latent function and the noise function into a unified nonparametric Bayesian framework. Though showing remarkable performance, HGP suffers from cubic time complexity, which severely limits its application to big data. To improve the scalability of HGP, we first develop a variational sparse inference algorithm, named VSHGP, to handle large-scale datasets. Furthermore, two variants are developed to further improve the scalability and capability of VSHGP. The first is stochastic VSHGP (SVSHGP), which derives a relaxed evidence lower bound that factorizes over data points, thereby enabling efficient stochastic variational inference. The second is distributed VSHGP (DVSHGP), which (i) follows the Bayesian committee machine formalism to distribute computations over multiple local VSHGP experts equipped with many inducing points, and (ii) adopts hybrid parameters for the experts to guard against over-fitting and capture local variety. The superiority of DVSHGP and SVSHGP over existing scalable heteroscedastic/homoscedastic GPs is then verified on a synthetic dataset and four real-world datasets.
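For concreteness, a minimal sketch of the heteroscedastic GP model underlying this line of work is given below; it follows the standard formulation in which a second latent GP governs the input-dependent noise variance through an exponential link, and the exact parameterization adopted in this paper may differ in details:
\begin{align}
  y(\mathbf{x}) &= f(\mathbf{x}) + \epsilon(\mathbf{x}), &
  \epsilon(\mathbf{x}) &\sim \mathcal{N}\!\big(0,\, \exp(g(\mathbf{x}))\big), \\
  f &\sim \mathcal{GP}\big(0,\, k_f(\mathbf{x}, \mathbf{x}')\big), &
  g &\sim \mathcal{GP}\big(\mu_0,\, k_g(\mathbf{x}, \mathbf{x}')\big),
\end{align}
where $f$ models the underlying function, $g$ models the log noise variance so that the noise level varies across the input domain, and $k_f$, $k_g$ denote the respective covariance functions.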