“…For regular concepts, the value Q_{i,t} of action a_i at time t is updated according to the following TD-learning rule:

    Q_{i,t+1} = Q_{i,t} + α ( r_{i,t+1} + γ V − Q_{i,t} )        (5)

where r_{i,t+1} is the immediate reward at time t+1, α is the learning rate, γ is the discount factor, and V is the weighted average of the values of all actions of the concept that action a_i points to, computed from the equation

    V = Σ_j p_j Q_{j,t}        (6)

where the sum runs over the actions of that concept and the weights p_j are defined by (1). The probability of selecting action a_i at time t is defined by equation (7). The above TD-learning rule is not applicable to the actuators for several reasons: (a) actuator-concepts do not contain a codelet that computes the immediate reward; (b) actuator-concepts do not really partition the state space, but merely serve as proxies for the physical actuators; (c) the weighted average value V cannot be computed, because actuator-concepts are terminal leaves and do not connect to any next actions/descendants.…”
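To make the update in Eqs. (5)–(6) concrete, the following is a minimal Python sketch, not the paper's implementation: the function names (weighted_value, td_update), the representation of a concept's actions as (p_j, Q_j) pairs, and all numeric values are illustrative assumptions, and the weights p_j, defined by Eq. (1) in the source, are simply taken as given inputs. The empty-list case also illustrates point (c): an actuator-concept has no descendant actions, so the weighted average V has nothing to average over.

```python
# Illustrative sketch of Eqs. (5)-(6); names and values are assumptions,
# not the paper's actual API.

ALPHA = 0.1   # learning rate (alpha)
GAMMA = 0.9   # discount factor (gamma)

def weighted_value(next_concept_actions):
    """Eq. (6): V = sum_j p_j * Q_j over the actions of the concept a_i points to.

    `next_concept_actions` is a list of (p_j, Q_j) pairs. For an
    actuator-like terminal leaf the list is empty and V is undefined;
    returning 0.0 here is only a placeholder for the sketch.
    """
    if not next_concept_actions:
        return 0.0
    return sum(p * q for p, q in next_concept_actions)

def td_update(q_i, reward, next_concept_actions):
    """Eq. (5): Q_{i,t+1} = Q_{i,t} + alpha * (r_{i,t+1} + gamma * V - Q_{i,t})."""
    v = weighted_value(next_concept_actions)
    return q_i + ALPHA * (reward + GAMMA * v - q_i)

# Usage example with hypothetical numbers: action a_i currently valued 0.5
# points to a concept with two actions whose weights p_j and values Q_j are known.
q_i = 0.5
next_actions = [(0.7, 1.2), (0.3, 0.4)]   # (p_j, Q_j) pairs
print(td_update(q_i, reward=1.0, next_concept_actions=next_actions))
```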