Research Question
Various macroprudential stress tests that account for potential contagion effects in financial networks have been proposed in the literature. While these models have been useful for building intuition about how shocks may propagate through the system, their ability to accurately predict whether a given bank will default has not been the focus of the literature. It is well known that different models may yield very different stress testing results. We therefore propose a backtesting framework that assesses the predictive performance of different fire-sale stress test models and allows the most accurate model to be chosen from a set of alternatives.

Contribution
We introduce a generalized fire-sale stress test model that captures a wide range of behavioral assumptions regarding banks' liquidation dynamics under stress. The alternative behavioral assumptions proposed in the literature are all covered by our generalized model. We build a network of common asset holdings using public balance-sheet data for U.S. commercial banks in 2007. We then compare the model predictions with the list of actual defaults that occurred in the U.S. during the years 2008-2010. To assess the relative performance of these network models, we also compare them against several alternative benchmarks.

Results
We identify two asset classes for which the model has predictive power, independently of the assumed liquidation dynamics. We then show how the behavioral assumption yielding the most accurate model depends on the size of the initial shock and on secondary market liquidity. We also identify the optimal number of liquidation rounds for each of the liquidation dynamics considered. Overall, our analysis shows that properly calibrated macroprudential stress tests can have greater predictive power than alternative benchmarks that do not account for the network of common asset holdings.