Bank bailouts are controversial government decisions that put taxpayers’ money at risk in order to avoid a domino effect through the network of claims between financial institutions. Yet very few studies quantitatively address whether government investments in failing banks are worthwhile from the taxpayers’ standpoint. We propose a dynamic financial network framework that incorporates bailout decisions as a Markov Decision Process, together with an artificial intelligence technique that learns the optimal bailout actions to minimise the expected taxpayers’ losses. Considering the European global systemically important institutions, we find that bailout decisions become optimal only if the taxpayers’ stakes exceed some critical level, determined endogenously by all of the financial network’s characteristics. Intervention becomes more worthwhile as the network’s distress, the taxpayers’ stakes, banks’ bilateral credit exposures and the crisis duration increase. Moreover, the government should optimally keep bailing out banks that received previous investments, which creates moral hazard: rescued banks could increase their risk-taking in anticipation of further government intervention.
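To make the Markov Decision Process framing concrete, the following is a minimal toy sketch, not the model or the learning technique used in this work. It assumes a single hypothetical bank with three states (healthy, distressed, defaulted), made-up transition probabilities, a bailout treated as a sunk cost, and plain value iteration in place of a learned policy. All names and parameters (`solve`, `stake`, `bailout_cost`, `discount`) are illustrative assumptions.

```python
# Toy illustration only: one bank, three states, made-up parameters.
HEALTHY, DISTRESSED, DEFAULTED = 0, 1, 2
NO_BAILOUT, BAILOUT = 0, 1


def transition_probs(state, action):
    """Return {next_state: probability} for the hypothetical toy dynamics."""
    if state != DISTRESSED:
        return {state: 1.0}  # healthy and defaulted are absorbing in this toy
    if action == BAILOUT:
        return {HEALTHY: 0.5, DISTRESSED: 0.4, DEFAULTED: 0.1}
    return {HEALTHY: 0.2, DISTRESSED: 0.4, DEFAULTED: 0.4}


def expected_loss(state, action, stake, bailout_cost):
    """One-period expected taxpayer loss (assumed loss structure)."""
    if state != DISTRESSED:
        return 0.0
    loss = bailout_cost if action == BAILOUT else 0.0
    # Taxpayers lose their stake if the bank defaults this period.
    loss += transition_probs(state, action)[DEFAULTED] * stake
    return loss


def solve(stake, bailout_cost=1.0, discount=0.95, tol=1e-10):
    """Value iteration minimising expected discounted taxpayer losses."""
    values = [0.0, 0.0, 0.0]
    while True:
        new_values, policy = [], []
        for s in (HEALTHY, DISTRESSED, DEFAULTED):
            q = []
            for a in (NO_BAILOUT, BAILOUT):
                future = sum(p * values[n]
                             for n, p in transition_probs(s, a).items())
                q.append(expected_loss(s, a, stake, bailout_cost)
                         + discount * future)
            best = min((NO_BAILOUT, BAILOUT), key=lambda a: q[a])
            policy.append(best)
            new_values.append(q[best])
        if max(abs(x - y) for x, y in zip(new_values, values)) < tol:
            return new_values, policy
        values = new_values


if __name__ == "__main__":
    for stake in (2.0, 10.0):
        _, policy = solve(stake)
        action = "bailout" if policy[DISTRESSED] == BAILOUT else "no bailout"
        print(f"stake={stake}: {action} in the distressed state")
```

With these assumed numbers, the bailout lowers the default probability from 0.4 to 0.1 at a cost of 1, so it only pays off once the stake exceeds 1 / 0.3 ≈ 3.33. This mirrors, in a deliberately simplified setting, the idea that intervention becomes optimal only above a critical level of taxpayers’ stakes.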