We demonstrate a Bayesian method that can be used to calibrate computationally expensive 3D RANS (Reynolds Averaged Navier Stokes) models with complex response surfaces. Such calibrations, conditioned on experimental data, can yield turbulence model parameters as probability density functions (PDF), concisely capturing the uncertainty in the parameter estimates. Methods such as Markov chain Monte Carlo (MCMC) estimate the PDF by sampling, with each sample requiring a run of the RANS model. Consequently a quick-running surrogate is used instead to the RANS simulator. The surrogate can be very difficult to design if the model's response i.e., the dependence of the calibration variable (the observable) on the parameter being estimated is complex. We show how the training data used to construct the surrogate can be employed to isolate a promising and physically realistic part of the parameter space, within which the response is well-behaved and easily modeled. We design a classifier, based on treed linear models, to model the "well-behaved region". This classifier serves as a prior in a Bayesian calibration study aimed at estimating 3 k − ε parameters (C µ ,C ε2 ,C ε1) from experimental data of a transonic jet-in-crossflow interaction. The robustness of the calibration is investigated by checking its predictions of variables not included in the calibration data. We also check the limit of applicability of the calibration by testing at off-calibration flow regimes. We find that calibration yield turbulence model parameters which predict the flowfield far better than when the nominal values of the parameters are used. Substantial improvements are still obtained when we use the calibrated RANS model to predict jet-in-crossflow at Mach numbers and jet strengths quite different from those used to generate the experimental (calibration) data. Thus the primary reason for poor predictive skill of RANS, when using nominal values of the turbulence model parameters, was parametric uncertainty, which was rectified by calibration. Post-calibration, the dominant contribution to model inaccuraries are due to the structural errors in RANS.