Abstract. We present the Q-Cut algorithm, a graph-theoretic approach for automatic detection of sub-goals in a dynamic environment, which is used to accelerate the Q-Learning algorithm. The learning agent creates an on-line map of the process history, and uses an efficient Max-Flow/Min-Cut algorithm for identifying bottlenecks. The policies for reaching bottlenecks are separately learned and added to the model in the form of options (macro-actions). We then extend the basic Q-Cut algorithm to the Segmented Q-Cut algorithm, which uses previously identified bottlenecks for state space partitioning, necessary for finding additional bottlenecks in complex environments. Experiments show significant performance improvements, particularly in the initial learning phase.
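The graph-cut step at the heart of this idea can be illustrated with a short sketch. The following Python example is not the authors' implementation: the transition counts, state names, and source/target choices are made-up placeholders, and networkx's max-flow/min-cut routine is used in place of whatever solver the paper employs. It shows only how a cut over a graph of observed transitions singles out bottleneck candidates.

```python
# Minimal sketch (not the paper's implementation): find a bottleneck cut
# in a graph built from observed state transitions, using networkx.
import networkx as nx

# Hypothetical transition counts gathered during exploration:
# (state_from, state_to) -> number of observed transitions.
transition_counts = {
    ("room_A_1", "room_A_2"): 40,
    ("room_A_2", "door"): 12,
    ("room_A_1", "door"): 9,
    ("door", "room_B_1"): 30,
    ("room_B_1", "room_B_2"): 35,
}

# Edge capacities are taken directly from the transition counts
# (one of several possible choices of cut-quality measure).
G = nx.DiGraph()
for (u, v), count in transition_counts.items():
    G.add_edge(u, v, capacity=count)

source, target = "room_A_1", "room_B_2"   # assumed source/target states
cut_value, (reachable, non_reachable) = nx.minimum_cut(G, source, target)

# States on the far side of a cut edge are bottleneck candidates,
# i.e. natural sub-goals for an option that learns to reach them.
bottlenecks = {v for u, v in G.edges
               if u in reachable and v in non_reachable}
print("cut value:", cut_value)
print("bottleneck candidates:", bottlenecks)
```

The actual algorithm's cut-quality criterion and the construction of options for reaching the discovered bottlenecks are more involved; the sketch only conveys the max-flow/min-cut step on a transition-history graph.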
This paper studies the performance impact of making delay announcements to arriving customers who must wait before starting service in a many-server queue with customer abandonment. The queue is assumed to be invisible to waiting customers, as in most customer contact centers, where contact is made by telephone, e-mail, or instant messaging. Customers who must wait are told upon arrival either the delay of the last customer to enter service or an appropriate average delay. Models for the customer response are proposed. For a rough-cut performance analysis, prior to detailed simulation, two approximations are proposed: (1) the equilibrium delay in a deterministic fluid model, and (2) the equilibrium steady-state delay in a stochastic model with fixed delay announcements. These approximations are shown to be effective in overloaded regimes, where delay announcements are important, by making comparisons with simulations. Within the fluid model framework, conditions are established for the existence and uniqueness of an equilibrium delay, where the actual delay coincides with the announced delay. Multiple equilibria can occur if a key monotonicity condition is violated.
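As a rough illustration of the fluid-model equilibrium idea, the sketch below searches for an announced delay at which the resulting delay coincides with the announcement. The customer-response model (a balking probability that grows with the announced delay), the use of the standard overloaded M/M/s+M fluid delay formula, and all numeric parameters are assumptions made here for illustration, not the paper's specification.

```python
# Minimal sketch (illustrative assumptions, not the paper's model):
# fixed-point search for an equilibrium announced delay in a fluid model.
import math

lam, s, mu, theta = 120.0, 100.0, 1.0, 0.5   # arrival rate, servers, service rate, abandonment rate

def balk_prob(d):
    # Assumed customer response: balking probability increasing in the announced delay d.
    return 1.0 - math.exp(-0.8 * d)

def actual_delay(d_announced):
    # Overloaded M/M/s+M fluid approximation: customers who join arrive at rate
    # lam * (1 - balk_prob(d)); the fluid delay w solves lam_eff * exp(-theta * w) = s * mu.
    lam_eff = lam * (1.0 - balk_prob(d_announced))
    if lam_eff <= s * mu:
        return 0.0                 # not overloaded: the fluid delay is zero
    return math.log(lam_eff / (s * mu)) / theta

# Damped fixed-point iteration: look for d with actual_delay(d) == d.
d = 0.0
for _ in range(200):
    d = 0.5 * d + 0.5 * actual_delay(d)
print("equilibrium announced delay ~", round(d, 4))
```

With a response map that is monotone in the announced delay, as assumed above, the iteration settles on a single fixed point; the monotonicity condition mentioned in the abstract plays the analogous role in the paper's models.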
Reinforcement Learning (RL) is an approach for solving complex multi-stage decision problems that fall under the general framework of Markov Decision Problems (MDPs), with possibly unknown parameters. Function approximation is essential for problems with a large state space, as it facilitates compact representation and enables generalization. Linear approximation architectures (where the adjustable parameters are the weights of pre-fixed basis functions) have recently gained prominence due to efficient algorithms and convergence guarantees. Nonetheless, an appropriate choice of basis functions is important for the success of the algorithm. In the present paper we examine methods for adapting the basis functions during the learning process in the context of evaluating the value function under a fixed control policy. Using the Bellman approximation error as an optimization criterion, we optimize the weights of the basis functions while simultaneously adapting their (non-linear) parameters. We present two algorithms for this problem. The first uses a gradient-based approach and the second applies the Cross Entropy method. The performance of the proposed algorithms is evaluated and compared in simulations.
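To make the basis-adaptation idea concrete, here is a small hypothetical sketch (not the paper's algorithms) in the spirit of the Cross Entropy variant: for a fixed set of Gaussian basis centers, the linear weights are fit by least squares on the empirical Bellman residual, and the centers themselves are then adapted by the Cross Entropy method so as to reduce that Bellman approximation error. The chain, the basis form, and all constants are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's algorithm): adapt Gaussian
# basis centers by the Cross Entropy method, fitting weights by least squares
# on the empirical Bellman residual for a fixed policy.
import numpy as np

rng = np.random.default_rng(0)
gamma, sigma = 0.95, 0.15

# Hypothetical sampled transitions (s, r, s') on [0, 1] under a fixed policy.
S = rng.uniform(0.0, 1.0, size=500)
S_next = np.clip(S + rng.normal(0.05, 0.02, size=500), 0.0, 1.0)
R = np.cos(3.0 * S)                              # made-up reward signal

def features(states, centers):
    # Gaussian radial basis functions with adjustable centers.
    return np.exp(-(states[:, None] - centers[None, :]) ** 2 / (2 * sigma ** 2))

def bellman_error(centers):
    # Fit weights w minimizing ||(Phi - gamma * Phi') w - R||^2 and return that error.
    A = features(S, centers) - gamma * features(S_next, centers)
    w, *_ = np.linalg.lstsq(A, R, rcond=None)
    return np.sum((A @ w - R) ** 2)

# Cross Entropy loop over the basis center parameters.
k = 5                                            # number of basis functions
mean, std = np.full(k, 0.5), np.full(k, 0.3)
for it in range(30):
    candidates = rng.normal(mean, std, size=(60, k))
    scores = np.array([bellman_error(c) for c in candidates])
    elite = candidates[np.argsort(scores)[:6]]   # keep the best 10%
    mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3
print("Bellman error with adapted centers:", bellman_error(mean))
```

The gradient-based variant described in the abstract would instead take gradient steps on the same Bellman-error objective with respect to the basis function parameters, rather than sampling candidate parameter vectors.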