Ant colony optimization (ACO) algorithms can generate high-quality solutions to combinatorial optimization problems. However, as with many stochastic algorithms, solution quality degrades as problem size grows. To improve performance, we added PDWoLF (Policy Dynamics Win or Learn Fast), a variable-step-size, off-policy hill-climbing algorithm, to several ant colony algorithms: Ant System, Ant Colony System, Elitist Ant System, Rank-based Ant System, and Max-Min Ant System. The PDWoLF component is easily integrated into each ACO algorithm and maintains a set of policies separate from the colony's pheromone. Like pheromone, but with different update rules, the PDWoLF policies provide a second estimate of solution quality and guide solution construction. Experiments on large traveling salesman problems (TSPs) show that, for the ACO algorithms above run without local optimization, incorporating PDWoLF produces shorter tours than the ACO algorithms alone.
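The abstract does not give PDWoLF's exact update rules or how its policies are combined with pheromone, so the Python sketch below is illustrative only. It assumes a per-city policy `pi[i][j]` over next cities, a policy hill-climbing step that shifts probability toward the city with the highest value estimate `q`, a win/lose test on the summed product of the policy's last change (`d1`) and its change-in-change (`d2`), and a transition rule that multiplies pheromone `tau`, visibility `eta`, and policy `pi` raised to exponents `alpha`, `beta`, and `gamma`. All function names, parameters, and the combination rule are hypothetical, not the paper's method.

```python
import random


def pdwolf_step(pi, d1, d2, q, delta_win=0.01, delta_lose=0.04):
    """Apply one PDWoLF-style policy hill-climbing step to one city's policy.

    Hypothetical representation (not from the paper):
      pi : dict next_city -> probability, summing to 1
      d1 : dict next_city -> last policy change (first difference), updated in place
      d2 : dict next_city -> last change in d1 (second difference), updated in place
      q  : dict next_city -> value estimate, e.g. a running mean of 1/tour_length
    """
    actions = list(pi)
    # Assumed win/lose test: "winning" when the policy's motion is decelerating,
    # i.e. first and second differences have opposite signs on aggregate.
    winning = sum(d1[a] * d2[a] for a in actions) < 0
    # Learn slowly when winning, fast when losing.
    delta = delta_win if winning else delta_lose

    best = max(actions, key=lambda a: q[a])
    others = [a for a in actions if a != best]
    change = {a: 0.0 for a in actions}
    for a in others:
        # Take at most delta/|others| from each non-greedy action, never below zero,
        # and give it to the greedy action so pi stays a probability distribution.
        dec = min(pi[a], delta / len(others))
        change[a] -= dec
        change[best] += dec

    for a in actions:
        pi[a] += change[a]
        d2[a] = change[a] - d1[a]
        d1[a] = change[a]


def choose_next(current, unvisited, tau, eta, pi,
                alpha=1.0, beta=2.0, gamma=1.0, rng=random):
    """Pick the next city with probability proportional to
    tau^alpha * eta^beta * pi^gamma (a hypothetical combination rule)."""
    weights = [
        (tau[current][j] ** alpha) * (eta[current][j] ** beta) * (pi[current][j] ** gamma)
        for j in unvisited
    ]
    if sum(weights) <= 0:
        # All candidates have zero weight (e.g. the policy has collapsed): fall back to uniform.
        weights = [1.0] * len(unvisited)
    return rng.choices(unvisited, weights=weights, k=1)[0]
```

For example, starting from `pi = {1: 0.5, 2: 0.5}` with zero differences and `q = {1: 0.2, 2: 0.1}`, one `pdwolf_step` call uses the "losing" step and moves the policy to roughly `{1: 0.54, 2: 0.46}`. Keeping the policy separate from `tau` lets each ant colony variant keep its own pheromone update unchanged.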