“…A model that fits the problem of manipulation under the representation Fig. 1 is that of a Markov Decision Process (MDP) [18], [22]. In general, an MDP is defined by the tuple {S, A, T, R}, where S and A are sets of states and actions, respectively (s ∈ S and a ∈ A), R is the set of rewards (r ∈ R), and T is a set of transition probabilities ({P (s |s, a)} ∈ T , where P (s |s, a) represents the probability of transitioning to state s from s after action a).…”