In response to the reported replication crisis in psychology, much recent work has focused on increasing the rigor of theory assessment in the social sciences. This research highlights that testing theories is challenging because they inherit a new set of auxiliary assumptions as soon as they are linked to a specific methodology. In this article, we integrate and build on this line of work by demonstrating the breadth of these challenges. In particular, we show that tracking auxiliary assumptions is extremely difficult because they are made at different stages of theory testing and at multiple levels of a theory. We focus on these issues in a reanalysis of a seminal study and its replications, all of which use a simple working-memory paradigm and a mainstream computational modeling approach. These studies provide the main evidence for ‘discrete-slot’ recognition models of visual working memory, and they still serve as the basis for measuring performance in popular visual working-memory tasks. In our reanalysis, we find that core practical auxiliary assumptions were unchecked and violated; the original model comparison metrics and data were not diagnostic in several experiments. Furthermore, we find that the models were not matched on their ‘theory-general’ auxiliary assumptions, indicating that the tested models were restricted and not matched in theoretical scope. After testing these auxiliary assumptions and identifying diagnostic testing conditions, we find evidence for the opposite conclusion. That is, continuous resource models outperform discrete-slot models. Together, our work demonstrates why tracking and testing auxiliary assumptions is a fundamental challenge, even in prominent studies led by careful, computationally minded researchers. Our work also serves as a framework and conceptual guide for scientists and research consumers on how to identify and test the gamut of auxiliary assumptions in psychological theory assessment.