When making statistical inferences, bootstrap resampling methods are often appealing because they impose less stringent assumptions on the distribution of the statistic(s) of interest. However, the procedures are not free of assumptions. This paper addresses a specific situation that occurs frequently in the atmospheric sciences where the standard bootstrap is not appropriate: comparative forecast verification of continuous variables. In this setting, the question to be answered is which of two weather or climate models is better in the sense of some type of average deviation from observations. The series to be compared are generally strongly dependent, which invalidates the most basic bootstrap technique because it resamples observations as if they were independent. This paper also introduces new bootstrap code from the R package distillery that simplifies the implementation of appropriate paired-difference-of-means bootstrap procedures for dependent data.