Humans have the metacognitive ability to assess the accuracy of their decisions via confidence judgments. Several computational models of confidence have been developed, but few systematic comparisons between them have been conducted, making it difficult to adjudicate between these models. Here, we compare 14 popular models of confidence that make various assumptions, such as confidence being derived from postdecisional evidence, from positive (decision-congruent) evidence, from posterior probability computations, or from a separate decision-making system for metacognitive judgments. We fit all models to three large experiments in which subjects completed a basic perceptual task with confidence ratings. In Experiments 1 and 2, the best-fitting model was the lognormal meta noise (LogN) model, which postulates that confidence is selectively corrupted by signal-dependent noise. However, in Experiment 3, the positive evidence (PE) model provided the best fits. We then evaluated a new model that combines the two consistently best-performing models, LogN and the weighted evidence and visibility (WEV) model. The resulting model, which we call logWEV, outperformed its individual counterparts and the PE model across all data sets, offering a better, more generalizable explanation for these data. Parameter and model recovery analyses showed mostly good recoverability, but with important exceptions that carry implications for our ability to discriminate between models. Finally, we evaluated each model's ability to explain different patterns in the data, which yielded additional insight into the models' performance. These results comprehensively characterize the relative adequacy of current confidence models to fit data from basic perceptual tasks and highlight the most plausible mechanisms underlying confidence generation.
Public Significance Statement

Several process models have attempted to describe the computations that underlie metacognition in humans. However, due to a lack of systematic, widespread comparisons between these models, there is no consensus on which mechanisms best characterize the process of confidence generation. In this study, we tested 14 popular models of metacognition on three large data sets from basic perceptual tasks, using multiple quantitative as well as qualitative metrics. Our results highlight two mechanisms as the most plausible, generalizable features of confidence: the selective corruption of confidence by signal-dependent metacognitive noise and a heuristic strategy that uses stimulus visibility to estimate confidence. Analyzing the qualitative patterns of confidence generated by the models provides additional insights into each model's success or failure. Our results also help to establish a comprehensive framework for model comparisons that can guide future efforts.
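To make the two highlighted mechanisms concrete, the following is a minimal, hypothetical Python sketch of a logWEV-style confidence computation: metacognitive evidence is formed as a weighted combination of decision evidence and stimulus visibility (the WEV idea) and then multiplicatively corrupted by lognormal noise (the LogN idea). The function name, the parameters `w` and `sigma_meta`, and the exact form of the combination are illustrative assumptions for exposition, not the fitted models reported in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def confidence_logwev(sensory_evidence, visibility, w=0.3, sigma_meta=0.4):
    """Toy sketch of a logWEV-style confidence computation (illustrative only).

    Decision evidence and stimulus visibility are combined as in WEV-style
    models, and the result is then multiplicatively corrupted by lognormal
    metacognitive noise as in LogN-style models. All parameter names and
    values here are hypothetical.
    """
    # WEV-style heuristic: weight decision-congruent evidence by (1 - w)
    # and stimulus visibility by w
    meta_evidence = (1 - w) * np.abs(sensory_evidence) + w * visibility
    # LogN-style corruption: signal-dependent (multiplicative, lognormal)
    # metacognitive noise
    return meta_evidence * rng.lognormal(mean=0.0, sigma=sigma_meta)

# Example: one trial with moderate evidence and high visibility
print(confidence_logwev(sensory_evidence=1.2, visibility=0.8))
```

In a full model, this noisy metacognitive evidence would be mapped onto discrete confidence ratings via a set of criteria fit to each subject; the sketch omits that step for brevity.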