Summary
Output gap revisions can be large even many years after the initial estimate. Real-time reliability tests may therefore be sensitive to the choice of the final output gap vintage against which the real-time estimates are compared. This is the case for the Federal Reserve's output gap. Once the final output gap vintage incorporates the revisions made in response to the global financial crisis, the improvement in real-time reliability since the mid-1990s is much smaller than that found by Edge and Rudd (Review of Economics and Statistics, 2016, 98(4), 785–791). The negative bias of real-time estimates from the 1980s has disappeared, but the size of revisions remains as large as the output gap itself. We systematically analyse how the real-time reliability assessment changes as the final output gap vintage is varied. We find that the largest changes are caused by output gap revisions after recessions. Economists revise their models in response to such events, producing economically important revisions not only for the most recent years but also for estimates reaching back up to two decades. This may improve the understanding of past business cycle dynamics, but it reduces the ex post reliability of real-time output gaps.
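To make the dependence on the final vintage concrete, the sketch below computes a few summary statistics commonly used in real-time reliability comparisons: the mean difference between real-time and final estimates (bias), the root-mean-square revision, a noise-to-signal ratio and the correlation between the two series. This is an illustrative sketch under stated assumptions, not the paper's exact test procedure. The function name `reliability_stats`, the vintage labels and all numbers in the usage section are hypothetical toy values, not Federal Reserve data. Series are assumed to be aligned quarter by quarter and measured in percent of potential output.

```python
import numpy as np


def reliability_stats(real_time, final):
    """Compare real-time output gap estimates with one chosen final vintage.

    Assumes both inputs cover the same quarters in the same order.
    """
    rt = np.asarray(real_time, dtype=float)
    fin = np.asarray(final, dtype=float)
    error = rt - fin  # real-time minus final; negative means real-time was too low
    return {
        # Bias: a negative value means real-time estimates were systematically below the final vintage.
        "bias": error.mean(),
        # Typical size of the total revision, comparable in units to the gap itself.
        "rms_revision": np.sqrt(np.mean(error ** 2)),
        # Variability of revisions relative to variability of the final gap; a value near 1
        # means revisions are about as large as the output gap.
        "noise_to_signal": error.std(ddof=1) / fin.std(ddof=1),
        "correlation": np.corrcoef(rt, fin)[0, 1],
    }


if __name__ == "__main__":
    # Hypothetical toy data: one real-time series judged against two alternative final vintages.
    real_time = [-2.0, -1.5, -1.0, 0.0, 0.5, 1.0]
    vintages = {
        "final_vintage_A": [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5],
        "final_vintage_B": [-1.5, -0.5, -0.5, 1.0, 2.0, 0.5],
    }
    for name, final in vintages.items():
        stats = reliability_stats(real_time, final)
        print(name, {k: round(v, 3) for k, v in stats.items()})
```

Running the same real-time series against different final vintages, as in the loop above, is the basic idea behind checking how sensitive a reliability assessment is to the choice of benchmark vintage.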