Both simulated quantum annealing and physical quantum annealing have shown the emergence of "heavy tails" in their performance as optimizers: the total time needed to solve a set of random input instances is dominated by a small number of very hard instances. Classical simulated annealing, in contrast, does not show such heavy tails. Here we explore the origin of these heavy tails, which appear for inputs with high local degeneracy, that is, large isoenergetic clusters of states in Hamming space. This category includes the low-precision Chimera-structured problems studied in recent benchmarking work comparing the D-Wave Two quantum annealing processor with simulated annealing. On similar inputs designed to suppress local degeneracy, the performance of a quantum annealing processor on hard instances improves by orders of magnitude at the 512-qubit scale, while classical performance remains relatively unchanged. Simulations indicate that perturbative crossings are the primary factor contributing to these heavy tails, while sensitivity to Hamiltonian misspecification error plays a less significant role in this particular setting.