“…Q-learning, along with its variants, is the most commonly used RL method in social robotics. The studies using Q-learning are [3, 13, 34, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61]. These comprise studies using standard Q-learning [3, 54, 55, 58, 60, 62]; studies modifying Q-learning to deal with delayed reward [52], tuning its parameters [13, 34, 52], or dealing with human feedback that decreases over time [52]; studies comparing their proposed algorithm with Q-learning [33, 49, 61, 63, 64]; a variant of Q-learning called Object Q-learning [64, 65, 66]; Q-learning combined with fuzzy inference [67]; and studies using SARSA [68, 69], TD(λ) [70], MAXQ [33, 71, 72], R-learning [32], and Deep Q-learning [35, 36, 73, 74].…”
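The distinction between standard Q-learning and a close relative such as SARSA comes down to a single term in the update rule. The following is a minimal tabular sketch, not taken from any of the cited studies: the five-state "chain" task, the function names, and the hyperparameters (alpha=0.1, gamma=0.9, epsilon=0.1) are hypothetical choices made only for illustration. Q-learning bootstraps from the best next action (off-policy), while SARSA bootstraps from the next action the agent actually takes (on-policy).

```python
import random

ACTIONS = (-1, +1)  # hypothetical toy actions: step left / step right


def q_learning_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.9):
    """Off-policy TD update: bootstrap from the best action value in s_next."""
    best_next = 0.0 if done else max(Q.get((s_next, b), 0.0) for b in ACTIONS)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)


def sarsa_update(Q, s, a, r, s_next, a_next, done, alpha=0.1, gamma=0.9):
    """On-policy TD update: bootstrap from the action actually taken in s_next."""
    next_value = 0.0 if done else Q.get((s_next, a_next), 0.0)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * next_value - old)


def epsilon_greedy(Q, s, epsilon, rng):
    """Explore with probability epsilon, otherwise act greedily (random tie-break)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    values = [Q.get((s, a), 0.0) for a in ACTIONS]
    best = max(values)
    return rng.choice([a for a, v in zip(ACTIONS, values) if v == best])


def chain_step(s, a, goal=4):
    """Hypothetical 5-state chain: reward 1 and episode end on reaching the goal."""
    s_next = min(max(s + a, 0), goal)
    done = s_next == goal
    return s_next, (1.0 if done else 0.0), done


def train_q_learning(episodes=2000, epsilon=0.1, seed=0, max_steps=100):
    """Run tabular Q-learning on the toy chain, always starting from state 0."""
    rng = random.Random(seed)
    Q = {}
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            a = epsilon_greedy(Q, s, epsilon, rng)
            s_next, r, done = chain_step(s, a)
            q_learning_update(Q, s, a, r, s_next, done)
            s = s_next
            if done:
                break
    return Q


if __name__ == "__main__":
    Q = train_q_learning()
    # Greedy action per non-terminal state after training.
    greedy = {s: max(ACTIONS, key=lambda a: Q.get((s, a), 0.0)) for s in range(4)}
    print(greedy)
```

The two update functions differ only in `best_next` versus `next_value`; everything else is shared. Deep Q-learning keeps the same max-based target but replaces the lookup table `Q` with a neural-network function approximator.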