This paper proposes numerical strategies to robustly and efficiently propagate probability boxes through expensive black box models. An interval is obtained for the system failure probability, with a confidence level. The three proposed algorithms are sampling based, and so can be easily parallelised, and make no assumptions about the functional form of the model. In the first two algorithms, the performance function is modelled as a function with unknown noise structure in the aleatory space and supplemented by a modified performance function. In the third algorithm, an Interval Predictor Model is constructed and a re-weighting strategy used to find bounds on the probability of failure. Numerical examples are presented to show the applicability of the approach. The proposed method is flexible and can account for epistemic uncertainty contained inside the limit state function. This is a feature which, to the best of the authors' knowledge, no existing methods of this type can deal with.