An agent choosing between various actions tends to take the one with the lowest cost. But this choice is arguably too rigid (not adaptive) to be useful in complex situations, e.g., where the exploration-exploitation trade-off is relevant in creative task solving, or when stated preferences differ from revealed ones. Here we study an agent who is willing to sacrifice a fixed amount of expected utility for adaptation. How can (or ought) such an agent choose an optimal (in a technical sense) mixed action? We explore the consequences of making this choice via entropy minimization, which we argue is a specific example of risk aversion. This recovers the ε-greedy probabilities known in reinforcement learning. We show that entropy minimization leads to rudimentary forms of intelligent behavior: (i) the agent assigns a non-negligible probability to costly events; (ii) when confronted with two actions of comparable costs, the agent chooses the less costly one (the lesser of two evils) with a sizable probability; (iii) the agent is subject to effects similar to cognitive dissonance and frustration. None of these features is shown by entropy maximization. See section II for more details.

We stress that we do not mean the delayed-reward situation, where the utility is constant but discounted by some known factor, because there the action is performed now while its reward comes in the future.

How should one assign prior probabilities to avoid the strictly deterministic (1)? Such probabilities should satisfy a natural constraint: actions with higher cost receive smaller probabilities. Two ad hoc solutions are especially simple: one can take into account only the second-best action, or take all non-best actions with the same (small) probability. In reinforcement learning the latter prior probability is known as ε-greedy [3].
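As a minimal numerical sketch of the latter ad hoc solution (the cost values and the value of ε here are purely illustrative), the ε-greedy prior puts probability 1 − ε on the lowest-cost action and spreads ε uniformly over the remaining actions:

```python
import numpy as np

def epsilon_greedy_probs(costs, eps):
    """Probability 1 - eps on the lowest-cost action,
    eps/(n-1) on each of the other n-1 actions."""
    costs = np.asarray(costs, dtype=float)
    n = len(costs)
    p = np.full(n, eps / (n - 1))
    p[np.argmin(costs)] = 1.0 - eps
    return p

costs = [1.0, 2.0, 5.0]       # illustrative costs eps_k
p = epsilon_greedy_probs(costs, eps=0.1)
print(p)                      # [0.9, 0.05, 0.05]
print(p @ costs)              # expected cost: 1.25
```

Note that the expected cost (1.25) exceeds the minimal cost (1.0) by the amount the agent pays for exploration.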
It is preferable to have a regular method of choosing non-deterministic probabilities, one that reflects people's attitudes towards decision making in uncertain situations and that includes the above ad hoc solutions as particular cases.

Here we explore the possibility of defining the prior probabilities via risk minimization (or maximization); see [9, 10] for reviews on the notion of risk and its various interpretations. We assume that the agent first decides how much average utility E − min_k[ε_k] it invests into exploration by going into non-optimal (in the sense of not satisfying (1)) behavior. We employ the notion of risk in a specific context, namely when comparing the behavior of agents having the same utilities for various actions and the same value of E. We argue below that maximizing (minimizing) risk in this specific situation can be done via maximizing (minimizing) the entropy −∑_{k=1}^{n} p_k ln p_k. People demonstrate both risk minimization (aversion) and maximization (seeking) [12, 25], though the risk in those situations is a less specific (and more difficult to describe) notion: first because it involves agents having different utilities for the same actions, and second because it involves a d...
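For comparison with entropy minimization, the entropy-maximizing mixed action at fixed expected cost E is the Gibbs distribution p_k ∝ exp(−β ε_k), with β fixed by the constraint ∑_k p_k ε_k = E. A minimal numerical sketch (the cost vector and the value of E are illustrative; β is found by root-finding with scipy's brentq):

```python
import numpy as np
from scipy.optimize import brentq

costs = np.array([1.0, 2.0, 5.0])  # illustrative costs eps_k
E = 1.25                            # expected cost the agent accepts

def gibbs(beta):
    """Maximum-entropy distribution at inverse temperature beta."""
    w = np.exp(-beta * (costs - costs.min()))  # shift for numerical stability
    return w / w.sum()

# Find beta so that the Gibbs distribution meets the expected-cost constraint.
# At beta = 0 the distribution is uniform (expected cost = mean(costs) > E);
# at large beta all mass sits on the cheapest action (expected cost -> min < E).
beta = brentq(lambda b: gibbs(b) @ costs - E, 0.0, 50.0)
p = gibbs(beta)
print(p, p @ costs)
```

Unlike the ε-greedy prior, which treats all non-best actions alike, the maximum-entropy solution orders probabilities strictly by cost: here p_1 > p_2 > p_3.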