Research on best practices in theory assessment highlights that testing theories is challenging because they inherit a new set of assumptions as soon as they are linked to a specific methodology. In this article, we integrate and build on this work by demonstrating the breadth of these challenges. We show that tracking auxiliary assumptions is difficult because they are made at different stages of theory testing and at multiple levels of a theory. We focus on these issues in a reanalysis of a seminal study and its replications, all of which use a simple working-memory paradigm and a mainstream computational modeling approach. These studies provide the main evidence for “all-or-none” recognition models of visual working memory and still serve as the basis for measuring performance in popular visual working-memory tasks. In our reanalysis, we find that core practical auxiliary assumptions went unchecked and were violated: the original model-comparison metrics and data were not diagnostic in several experiments. Furthermore, we find that the models were not matched on “theory-general” auxiliary assumptions, meaning that the set of tested models was restricted and mismatched in theoretical scope. After testing these auxiliary assumptions and identifying diagnostic testing conditions, we find evidence for the opposite conclusion: continuous-resource models outperform all-or-none models. Together, these results demonstrate why tracking and testing auxiliary assumptions remains a fundamental challenge, even in prominent studies led by careful, computationally minded researchers. Our work also serves as a conceptual guide to identifying and testing the gamut of auxiliary assumptions in theory assessment, and we discuss these ideas in the context of contemporary approaches to scientific discovery.