In classical game theory, a rational player chooses the strategy that leads to a Nash equilibrium. The Prisoner's Dilemma, a canonical game, is often used to illustrate this notion of rationality. However, in cognitive psychology experiments performed by Shafir and Tversky (1992a, 1992b), the statistical data show that people behave irrationally in practice. This phenomenon is called the disjunction effect. To explain why it may occur, we review the Asano-Khrennikov-Ohya model (Asano et al., 2011c; Khrennikov, 2011b), a mathematical model of the decision-making process in the Prisoner's Dilemma. The model applies only the mathematical apparatus of quantum mechanics to decision making; it does not assume a quantum physical model. In this paper, we present several numerical simulations of the Asano-Khrennikov-Ohya model, together with graphs of the von Neumann entropy of the solutions. By analyzing the simulation results, we show explicitly and numerically that the Asano-Khrennikov-Ohya model generates irrational behavior in the player.
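The classical benchmark mentioned in this abstract can be made concrete with a minimal sketch. The payoff values below are hypothetical textbook numbers, not taken from the paper, and the function name is chosen only for illustration. The sketch enumerates the pure-strategy profiles of a two-player game and keeps those in which neither player gains by deviating unilaterally.

```python
from itertools import product

ACTIONS = ("C", "D")  # C = cooperate, D = defect

# Hypothetical payoffs (assumption, not from the paper).
# PAYOFF[(a, b)] = (payoff to player A, payoff to player B)
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def pure_nash_equilibria(payoff, actions):
    """Return all pure-strategy profiles where no player benefits
    from changing only their own action."""
    equilibria = []
    for a, b in product(actions, repeat=2):
        # A's action is a best response to b.
        a_best = all(payoff[(a, b)][0] >= payoff[(alt, b)][0] for alt in actions)
        # B's action is a best response to a.
        b_best = all(payoff[(a, b)][1] >= payoff[(a, alt)][1] for alt in actions)
        if a_best and b_best:
            equilibria.append((a, b))
    return equilibria


print(pure_nash_equilibria(PAYOFF, ACTIONS))  # [('D', 'D')]
```

With these payoffs, mutual defection is the only equilibrium. The disjunction effect refers to observed choices that depart from this prediction.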
Offline reinforcement learning has developed rapidly in recent years, but estimating the actual performance of offline policies remains a challenge. We propose a scoring metric for offline policies that correlates strongly with actual policy performance and can be used directly for offline policy optimization in a supervised manner. To achieve this, we leverage the contrastive learning framework to design a scoring metric that gives high scores to policies that imitate actions yielding relatively high returns while avoiding those yielding relatively low returns. Our experiments show that 1) our scoring metric ranks offline policies more accurately and 2) the policies optimized using our metric achieve high performance on various offline reinforcement learning benchmarks. Notably, our algorithm requires much less policy-network capacity than other supervised learning-based methods and does not need any additional networks such as a Q-network.
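The idea of rewarding a policy for preferring high-return actions over low-return ones can be illustrated with a generic InfoNCE-style score. This is only a sketch under stated assumptions: the abstract does not give the paper's actual metric, and the function names and the probabilities used below are hypothetical.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def contrastive_score(logp_high, logp_low):
    """Hypothetical per-state score (assumption, not the paper's metric).

    logp_high: policy log-probability of an action with relatively high return.
    logp_low:  list of policy log-probabilities of low-return actions.
    The score is at most 0 and grows as the policy shifts probability mass
    from the low-return actions to the high-return one.
    """
    return logp_high - logsumexp([logp_high] + list(logp_low))


# A policy favoring the high-return action scores higher than one that does not.
good = contrastive_score(math.log(0.8), [math.log(0.1), math.log(0.1)])
bad = contrastive_score(math.log(0.1), [math.log(0.8), math.log(0.1)])
print(good > bad)  # True
```

Averaging such a score over states in a dataset would give one number per policy, which could then be used to rank candidate policies or as a supervised training objective.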