On-policy concurrent reinforcement learning

Banerjee, Bikramjit; Sen, Sandip; Peng, Jing

doi:10.1080/09528130412331297956

Cited by 11 publications

(17 citation statements)

References 12 publications

“…According to Junling Hu et al the algorithmic complexity of finding an equilibrium in matrix games is unknown. Bikramjit et al prove that the minmax-Q and Nash-Q are equivalent in the purely competitive domains [13]. However, the advantage of the minmax-Q algorithm that it is more resource-efficient.…”

Section: Learning Honeypots and Attackers (mentioning)

confidence: 98%

“…However, a problem of such an approach is that attackers are considered as part of the environment despite their competitive nature. According to Banerjee et al [13] considering competitors in a learning scenario as aspects of the environment may mean the environment is no longer stationary and convergence results may be impacted.…”

Section: Reinforcement Learning (mentioning)

confidence: 99%

“…In such a situation, agents need to explore the environment and their objective is to find an optimal policy that maximizes their expected rewards. According to Banerjee et al [13] the distributed reward may depend on the behaviors of the opponents, which makes reward computation challenging, because each opponent has his or her own self-interests. 1) Attacker reward: Assume that an attacker has a dedicated goal, denoted s*, while penetrating the system.…”

Section: B Rewards (mentioning)

confidence: 99%

“…The purpose of an adaptive high-interaction honeypot is to incrementally learn the optimal policy for choosing actions in given states. In the context of adaptive honeypots we use the minmax-learning proposed by Banerjee et al [13] and define a stochastic game between an attacker and a honeypot. As we have two agents, lets denote a player k and k the competitor.…”

Section: Learning Honeypots and Attackers (mentioning)

confidence: 99%

Adaptive and self-configurable honeypots

Wagener, State, Engel, et al. 2011

12th IFIP/IEEE International Symposium on Integrated Network Management (IM 2011) and Workshops

Honeypot evangelists propagate the message that honeypots are particularly useful for learning from attackers. However, by looking at current honeypots, most of them are statically configured and managed, which requires a priori knowledge about attackers. In this paper we propose a high interaction honeypot capable of learning from attackers and capable of dynamically changing its behavior using a variant of reinforcement learning. It can strategically block the execution of programs, lure the attacker by substituting programs and insult attackers with the intent of revealing the attacker's nature and ethnic background. We also investigated the fact that attackers could learn to defeat the honeypot and discovered that attacker and honeypot interests sometimes diverge.

“…This makes the focus on multiagent learning research extremely timely and justified. Several algorithms for multiagent learning have been proposed [2,5,7,8], mostly guaranteed to converge to an equilibrium in the limit. It is noted in [3] that none of these methods simultaneously satisfies rationality and convergence, two of the desirable criteria for any multiagent learning algorithm.…”

Section: Introduction (mentioning)

confidence: 99%

Adaptive policy gradient in multiagent learning

Banerjee, Peng. 2003 (self-citation)

Proceedings of the Second International Joint Conference on Autonomous Agents and Multiagent Systems

Inspired by the recent results in policy gradient learning in a general-sum game scenario, in the form of two algorithms, IGA and WoLF-IGA, we explore an alternative version of WoLF. We show that our new WoLF criterion (PDWoLF) is also accurate in 2 × 2 games, while being accurately computable even in more than 2-action games, unlike WoLF that relies on estimation. In particular, we show that this difference in accuracy in more than 2-action games translates to faster convergence (to Nash equilibrium policies in self-play) for PDWoLF in conjunction with the general Policy Hill Climbing algorithm. Interestingly, this expedience gets more pronounced with increasing learning rate ratio, for which we also delve into an explanation. We also show experimentally that learning faster with PDWoLF could also entail learning better policies earlier in self-play. Finally we present the scalable version of PDWoLF and show that even in such domains requiring generalizations and approximations, PDWoLF could dominate WoLF in performance.

The Success and Failure of Tag-Mediated Evolution of Cooperation

McDonald, Sen. 2006 (self-citation)

Learning and Adaption in Multi-Agent Systems

Use of tags to limit partner selection for playing has been shown to produce stable cooperation in agent populations playing the Prisoner's Dilemma game. There is, however, a lack of understanding of how and why tags facilitate such cooperation. We start with an empirical investigation that identifies the key dynamics that result in sustainable cooperation in PD. Sufficiently long tags are needed to achieve this effect. A theoretical analysis shows that multiple simulation parameters including tag length, mutation rate and population size will have significant effect on sustaining cooperation. Experiments partially validate these observations. Additionally, we claim that tags only promote mimicking and not coordinated behavior in general, i.e., tags can promote cooperation only if cooperation requires identical actions from all group members. We illustrate the failure of the tag model to sustain cooperation by experimenting with domains where agents need to take complementary actions to maximize payoff.
