Abstract: The balance between exploration and exploitation is one of the key problems of action selection in Q-learning. Pure exploitation causes the agent to settle quickly on locally optimal policies, whereas excessive exploration degrades the performance of Q-learning, even though it may accelerate learning and help the agent avoid locally optimal policies. In this paper, finding the optimal policy in Q-learning is cast as a search for the optimal solution in combinatorial optimization. The Metropolis criterion of the simulated annealing algorithm is introduced to balance exploration and exploitation in Q-learning, and the modified Q-learning algorithm based on this criterion, SA-Q-learning, is presented. Experiments show that SA-Q-learning converges more quickly than Q-learning or Boltzmann exploration, and that the search does not suffer from performance degradation due to excessive exploration.
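To make the idea of a Metropolis-style action choice concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a common form of the rule: a randomly drawn candidate action replaces the greedy action with probability exp((Q(s, candidate) - Q(s, greedy)) / T). The function name `metropolis_select`, the list-based Q-value representation, and the geometric cooling factor mentioned afterwards are all illustrative assumptions that do not come from the abstract.

```python
import math
import random


def metropolis_select(q_values, temperature, rng=random):
    """Choose an action index with a Metropolis-style acceptance test.

    q_values: Q(s, a) for every action a in the current state (assumed layout).
    temperature: positive annealing temperature T; hypothetical parameter name.
    rng: any object exposing randrange() and random(), e.g. random.Random(seed).
    """
    n_actions = len(q_values)
    # Greedy (exploiting) action under the current estimates.
    greedy = max(range(n_actions), key=lambda a: q_values[a])
    # Uniformly random (exploring) candidate action.
    candidate = rng.randrange(n_actions)
    if temperature <= 0:
        return greedy
    # delta is never positive, so the acceptance probability lies in (0, 1].
    delta = q_values[candidate] - q_values[greedy]
    accept_prob = math.exp(delta / temperature)
    # Accept the exploring candidate with probability accept_prob.
    return candidate if rng.random() < accept_prob else greedy
```

At a high temperature the candidate is accepted almost every time, so selection is close to uniform; as the temperature falls, the rule approaches greedy selection. A schedule such as multiplying the temperature by a constant factor below 1 after each episode is one assumed way to move between these two regimes.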
The sequential Monte Carlo probability hypothesis density (SMC-PHD) filter assisted by particle flows (PF) has been shown to be promising for audiovisual multi-speaker tracking. A clustering step is often employed to calculate the particle flow, which substantially increases the computational cost. To address this issue, we propose an alternative method based on the labelled non-zero particle flow (LNPF) to adjust the particle states. Results on the AV16.3 dataset show that the proposed method improves both computational efficiency and tracking accuracy compared with baseline AV-NPF-SMC-PHD methods.