One of the best-known demonstrations of long-term learning through repetition is the Hebb effect: Immediate recall of a memory list that is repeated amidst non-repeated lists improves steadily across repetitions. However, previous studies have often failed to observe this effect for visuo-spatial arrays. Souza and Oberauer (2022) showed that the strongest determinant of learning was the difficulty of the test: Learning was consistently observed when participants recalled all items of a visuo-spatial array (difficult test), but not when only one item was recalled or when recognition procedures were used (less difficult tests). This suggests that more demanding short-term memory tests promote long-term learning. Alternatively, less demanding tests may still lead to learning but prevent participants from applying what they have learned. In four preregistered experiments (N = 981), we ruled out this alternative explanation: Switching the memory test midway through the experiment from a less demanding test (i.e., single-item recall or recognition) to a more demanding one (i.e., recall of all items) did not reveal hidden learning, and switching from the more demanding to a less demanding test did not conceal learning. Mixing high- and low-demand tests for the non-repeated arrays, however, eventually produced Hebb learning even under the less demanding testing conditions. We propose that testing affects long-term learning in two ways: Expectations about test difficulty influence how information is encoded into memory, and retrieval consolidates this information in memory.