Surrogate models are used to alleviate the computational burden in engineering tasks, which require the repeated evaluation of computationally demanding models of physical systems, such as the efficient propagation of uncertainties. For models that show a strongly non-linear dependence on their input parameters, standard surrogate techniques, such as polynomial chaos expansion, are not sufficient to obtain an accurate representation of the original model response. Through applying a rational approximation instead, the approximation error can be efficiently reduced for models whose non-linearity is accurately described through a rational function. Specifically, our aim is to approximate complex-valued models. A common approach to obtain the coefficients in the surrogate is to minimize the samplebased error between model and surrogate in the least-square sense. In order to obtain an accurate representation of the original model and to avoid overfitting, the sample set has be two to three times the number of polynomial terms in the expansion. For models that require a high polynomial degree or are high-dimensional in terms of their input parameters, this number often exceeds the affordable computational cost. To overcome this issue, we apply a sparse Bayesian learning approach to the rational approximation. Through a specific prior distribution structure, sparsity is induced in the coefficients of the surrogate model. This results in a significant decrease of the number of model evaluations required for an accurate representation in comparison to a least-squares approach. In the derivation, we make use of the fact that the posterior distribution of the numerator polynomial coefficients, conditional on the denominator polynomials, can be expressed analytically. The denominator polynomial coefficients as well as the hyperparameters of the problem are then determined through a type-II-maximum likelihood approach. The problem is thereby transferred to a regularized minimization problem. We apply a quasi-Newton gradient-descent algorithm in order to find the optimal denominator coefficients and derive the required gradients through application of CR-calculus. The method is applied to the frequency response functions of an algebraic frame structure model as well as that of an orthotropic plate finite element model.