The rate at which different options are selected in the decisions-from-feedback paradigm is commonly used to measure preferences resulting from experiential learning. While convergence on a single option increases with experience, some variance in choice remains even when options are static and offer fixed rewards. Employing a decisions-from-feedback paradigm followed by a policy-setting task, we examined whether this variance in choice is driven by factors related to the paradigm itself: continued exploration (e.g., believing that options are non-stationary) or exploitation of perceived outcome patterns (i.e., believing that sequential choices are not independent). Across two studies, participants showed variance in their choices, which was related (i.e., proportional) to the policies they set. In addition, in Study 2, participants' reported under-confidence was associated with the amount of variance in their later choices and policies. These results suggest that variance in choice is better explained by participants lacking confidence about which option is better than by methodological artifacts (i.e., exploration or failure to recognize outcome independence). As such, the current studies provide evidence for the validity of the decisions-from-feedback paradigm as a behavioral research method for assessing learned preferences.