There is a crisis of measurement in memory research, with major implications for theory and practice. This crisis arises because of a critical complication present when measuring memory using the recognition memory task that dominates the study of working memory and long-term memory (“did you see this item? yes/no” or “did this item change? yes/no”). Such tasks give two measures of performance, the “hit rate” (how often you say you previously saw an item you actually did previously see) and the “false alarm rate” (how often you say you saw something you never saw). Yet what researchers want is one single, integrated measure of memory performance. Integrating the hit and false alarm rates into a single measure, however, requires solving a complex problem of counterfactual reasoning that depends on the (unknowable) distribution of underlying memory signals: when faced with two people differing in both hit rate and false alarm rate, the question of who had the better memory is really “who would have had more hits if they each had the same number of false alarms?” As a result of this difficulty, different literatures in memory research (e.g., visual working memory, eyewitness identification, picture memory) have settled on a variety of distinct metrics to combine hit rates and false alarm rates (e.g., A’, corrected hit rate, percent correct, d’, diagnosticity ratios, K values). These metrics make different, contradictory assumptions about the distribution of latent memory signals, and the assumptions behind each of them are frequently incorrect. Despite a large literature, spanning decades, on how to properly measure memory performance, real-life decisions are often made using these metrics, even though the conclusions they support can subsequently turn out to be wrong when memory is studied with better measures. We suggest that in order for the psychology and neuroscience of memory to become a cumulative, theory-driven science, more attention must be given to measurement issues.
We make a concrete suggestion: the default memory task should change from old/new (“did you see this item?”) to forced-choice (“which of these two items did you see?”). In situations where old/new variants are preferred (e.g., eyewitness identification; theoretical investigations of the nature of memory decisions), receiver operating characteristic (ROC) analysis should always be performed.
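To make the disagreement between metrics concrete, the following minimal sketch computes four of the metrics named above from a hit rate and a false alarm rate, using their common textbook computing formulas. The two observers and their rates are invented purely for illustration and are not data from any study; the function names are likewise our own. Percent correct is computed under the assumption of equal numbers of old and new trials, and the A’ formula shown applies only when the hit rate is at least as large as the false alarm rate.

```python
from statistics import NormalDist


def z(p):
    """Inverse of the standard normal CDF (the z-transform of a rate)."""
    return NormalDist().inv_cdf(p)


def d_prime(h, f):
    """d': assumes equal-variance Gaussian memory signals."""
    return z(h) - z(f)


def a_prime(h, f):
    """A': common computing formula, valid only when h >= f."""
    return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))


def corrected_hit_rate(h, f):
    """Hits minus false alarms: assumes a high-threshold (guessing) model."""
    return h - f


def percent_correct(h, f):
    """Proportion correct, assuming equal numbers of old and new trials."""
    return (h + (1 - f)) / 2


# Hypothetical observers as (hit rate, false alarm rate); illustrative only.
observers = {"A": (0.60, 0.10), "B": (0.80, 0.28)}

metrics = {
    "d_prime": d_prime,
    "a_prime": a_prime,
    "corrected_hit_rate": corrected_hit_rate,
    "percent_correct": percent_correct,
}

# For each metric, record which observer it ranks as having better memory.
rankings = {}
for name, metric in metrics.items():
    scores = {obs: metric(h, f) for obs, (h, f) in observers.items()}
    rankings[name] = max(scores, key=scores.get)
    print(name, {obs: round(s, 3) for obs, s in scores.items()},
          "-> better:", rankings[name])
```

With these invented rates, d’ and A’ rank observer A as having the better memory, whereas corrected hit rate and percent correct rank observer B higher. The same pair of hit and false alarm rates therefore supports opposite conclusions depending on which metric, and hence which assumed distribution of memory signals, is chosen.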