Abstract. We present the Q-Cut algorithm, a graph-theoretic approach for automatic detection of sub-goals in a dynamic environment, which is used to accelerate the Q-Learning algorithm. The learning agent creates an on-line map of the process history, and uses an efficient Max-Flow/Min-Cut algorithm for identifying bottlenecks. The policies for reaching bottlenecks are separately learned and added to the model in the form of options (macro-actions). We then extend the basic Q-Cut algorithm to the Segmented Q-Cut algorithm, which uses previously identified bottlenecks for state space partitioning, necessary for finding additional bottlenecks in complex environments. Experiments show significant performance improvements, particularly in the initial learning phase.
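The graph-cut step at the heart of this idea can be illustrated with a short sketch. The following Python example is not the authors' implementation: the transition counts, state names, and source/target choices are made-up placeholders, and networkx's max-flow/min-cut routine is used in place of whatever solver the paper employs. It shows only how a cut over a graph of observed transitions singles out bottleneck candidates.

```python
# Minimal sketch (not the paper's implementation): find a bottleneck cut
# in a graph built from observed state transitions, using networkx.
import networkx as nx

# Hypothetical transition counts gathered during exploration:
# (state_from, state_to) -> number of observed transitions.
transition_counts = {
    ("room_A_1", "room_A_2"): 40,
    ("room_A_2", "door"): 12,
    ("room_A_1", "door"): 9,
    ("door", "room_B_1"): 30,
    ("room_B_1", "room_B_2"): 35,
}

# Edge capacities are taken directly from the transition counts
# (one of several possible choices of cut-quality measure).
G = nx.DiGraph()
for (u, v), count in transition_counts.items():
    G.add_edge(u, v, capacity=count)

source, target = "room_A_1", "room_B_2"   # assumed source/target states
cut_value, (reachable, non_reachable) = nx.minimum_cut(G, source, target)

# States on the far side of a cut edge are bottleneck candidates,
# i.e. natural sub-goals for an option that learns to reach them.
bottlenecks = {v for u, v in G.edges
               if u in reachable and v in non_reachable}
print("cut value:", cut_value)
print("bottleneck candidates:", bottlenecks)
```

The actual algorithm's cut-quality criterion and the construction of options for reaching the discovered bottlenecks are more involved; the sketch only conveys the max-flow/min-cut step on a transition-history graph.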
This paper studies the performance impact of making delay announcements to arriving customers who must wait before starting service in a many-server queue with customer abandonment. The queue is assumed to be invisible to waiting customers, as in most customer contact centers, where contact is made by telephone, e-mail, or instant messaging. Customers who must wait are told upon arrival either the delay of the last customer to enter service or an appropriate average delay. Models for the customer response are proposed. For a rough-cut performance analysis, prior to detailed simulation, two approximations are proposed: (1) the equilibrium delay in a deterministic fluid model, and (2) the equilibrium steady-state delay in a stochastic model with fixed delay announcements. These approximations are shown to be effective in overloaded regimes, where delay announcements are important, by making comparisons with simulations. Within the fluid model framework, conditions are established for the existence and uniqueness of an equilibrium delay, where the actual delay coincides with the announced delay. Multiple equilibria can occur if a key monotonicity condition is violated.
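As a rough illustration of the fluid-model equilibrium idea, the sketch below searches for an announced delay at which the resulting delay coincides with the announcement. The customer-response model (a balking probability that grows with the announced delay), the use of the standard overloaded M/M/s+M fluid delay formula, and all numeric parameters are assumptions made here for illustration, not the paper's specification.

```python
# Minimal sketch (illustrative assumptions, not the paper's model):
# fixed-point search for an equilibrium announced delay in a fluid model.
import math

lam, s, mu, theta = 120.0, 100.0, 1.0, 0.5   # arrival rate, servers, service rate, abandonment rate

def balk_prob(d):
    # Assumed customer response: balking probability increasing in the announced delay d.
    return 1.0 - math.exp(-0.8 * d)

def actual_delay(d_announced):
    # Overloaded M/M/s+M fluid approximation: customers who join arrive at rate
    # lam * (1 - balk_prob(d)); the fluid delay w solves lam_eff * exp(-theta * w) = s * mu.
    lam_eff = lam * (1.0 - balk_prob(d_announced))
    if lam_eff <= s * mu:
        return 0.0                 # not overloaded: the fluid delay is zero
    return math.log(lam_eff / (s * mu)) / theta

# Damped fixed-point iteration: look for d with actual_delay(d) == d.
d = 0.0
for _ in range(200):
    d = 0.5 * d + 0.5 * actual_delay(d)
print("equilibrium announced delay ~", round(d, 4))
```

With a response map that is monotone in the announced delay, as assumed above, the iteration settles on a single fixed point; the monotonicity condition mentioned in the abstract plays the analogous role in the paper's models.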
Reinforcement Learning (RL) is an approach for solving complex multi-stage decision problems that fall under the general framework of Markov Decision Problems (MDPs), with possibly unknown parameters. Function approximation is essential for problems with a large state space, as it facilitates compact representation and enables generalization. Linear approximation architectures (where the adjustable parameters are the weights of pre-fixed basis functions) have recently gained prominence due to efficient algorithms and convergence guarantees. Nonetheless, an appropriate choice of basis functions is important for the success of the algorithm. In the present paper we examine methods for adapting the basis functions during the learning process in the context of evaluating the value function under a fixed control policy. Using the Bellman approximation error as an optimization criterion, we optimize the weights of the basis functions while simultaneously adapting their (non-linear) parameters. We present two algorithms for this problem. The first uses a gradient-based approach and the second applies the Cross Entropy method. The performance of the proposed algorithms is evaluated and compared in simulations.
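To make the basis-adaptation idea concrete, here is a small hypothetical sketch (not the paper's algorithms) in the spirit of the Cross Entropy variant: for a fixed set of Gaussian basis centers, the linear weights are fit by least squares on the empirical Bellman residual, and the centers themselves are then adapted by the Cross Entropy method so as to reduce that Bellman approximation error. The chain, the basis form, and all constants are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's algorithm): adapt Gaussian
# basis centers by the Cross Entropy method, fitting weights by least squares
# on the empirical Bellman residual for a fixed policy.
import numpy as np

rng = np.random.default_rng(0)
gamma, sigma = 0.95, 0.15

# Hypothetical sampled transitions (s, r, s') on [0, 1] under a fixed policy.
S = rng.uniform(0.0, 1.0, size=500)
S_next = np.clip(S + rng.normal(0.05, 0.02, size=500), 0.0, 1.0)
R = np.cos(3.0 * S)                              # made-up reward signal

def features(states, centers):
    # Gaussian radial basis functions with adjustable centers.
    return np.exp(-(states[:, None] - centers[None, :]) ** 2 / (2 * sigma ** 2))

def bellman_error(centers):
    # Fit weights w minimizing ||(Phi - gamma * Phi') w - R||^2 and return that error.
    A = features(S, centers) - gamma * features(S_next, centers)
    w, *_ = np.linalg.lstsq(A, R, rcond=None)
    return np.sum((A @ w - R) ** 2)

# Cross Entropy loop over the basis center parameters.
k = 5                                            # number of basis functions
mean, std = np.full(k, 0.5), np.full(k, 0.3)
for it in range(30):
    candidates = rng.normal(mean, std, size=(60, k))
    scores = np.array([bellman_error(c) for c in candidates])
    elite = candidates[np.argsort(scores)[:6]]   # keep the best 10%
    mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3
print("Bellman error with adapted centers:", bellman_error(mean))
```

The gradient-based variant described in the abstract would instead take gradient steps on the same Bellman-error objective with respect to the basis function parameters, rather than sampling candidate parameter vectors.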