“…This model, introduced fifty years ago by Bellman [1] and Howard [2], is used in many applications of Operations Research (investment planning, management of inventory systems, manufacturing, resource allocation) [3,4] and Artificial Intelligence (path planning, game search, trading agents, robotics, and reinforcement learning) [5,6]. Several variants of MDPs have already been considered and investigated. They differ in the nature of the state set (finite or not), the nature of the reward function (quantitative or qualitative, scalar-valued or vector-valued), the nature of the uncertainty (probabilistic, possibilistic, ordinal), and the observability of the system (partial or total), among other aspects.…”