2015
DOI: 10.1007/s10462-015-9447-5
Exponential moving average based multiagent reinforcement learning algorithms

Cited by 15 publications (13 citation statements).
References 29 publications.
“…In this work, the exponential moving average (EMA) is adopted as a low-pass filter. The EMA method is model-free and has been widely used in time series analyses [ 30 ]. In EMA, recent data points have higher weights than older ones [ 31 ].…”
Section: Methods (mentioning)
confidence: 99%
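The quoted statements use EMA as a model-free low-pass filter in which recent data points receive higher weights than older ones. That corresponds to the standard recursive update, where each new sample is blended into the running estimate with a fixed smoothing factor and older samples decay geometrically. Below is a minimal sketch of that update; the smoothing factor and the sample signal are illustrative assumptions, not values from the cited work.

```python
def ema(samples, alpha=0.3):
    """Exponential moving average: the newest sample gets weight alpha,
    and earlier samples decay geometrically by (1 - alpha) per step."""
    smoothed = []
    estimate = None
    for x in samples:
        # The first sample initializes the estimate; later samples are blended in.
        estimate = x if estimate is None else alpha * x + (1 - alpha) * estimate
        smoothed.append(estimate)
    return smoothed

# A noisy step signal: the EMA suppresses high-frequency noise (low-pass
# behaviour) while tracking the underlying level with a short lag.
noisy = [0.1, -0.2, 0.05, 1.1, 0.9, 1.2, 0.95, 1.05]
print(ema(noisy))
```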
“…In the experiment, EMA (exponential moving average) Q-learning [11], WoLF-PHC [10], and SARSA (state-action-reward-state-action) [21] are chosen as comparison algorithms. EMA Q-learning and WoLF-PHC are MARL algorithms, while SARSA is a type of single-agent RL algorithm corresponding to centralized learning in the context of multiple agents.…”
Section: Simulations On Stochastic Games (mentioning)
confidence: 99%
“…WoLF policy hill-climbing (WoLF-PHC) [10] only needed to share states and local immediate rewards of each agent, but the convergence property was not guaranteed any more. The exponential moving average (EMA) Q-learning [11] and the weighted policy learner (WPL) [12] empirically converged to a Nash equilibrium in some typical repeated games. To design scalable MARL algorithms that can gain the optimal total sum of reward in fully cooperative games is our motivation.…”
Section: Introduction (mentioning)
confidence: 99%
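For context on the algorithm being cited: EMA Q-learning [11] maintains a standard Q-table per agent and uses an exponential-moving-average step to pull each agent's mixed strategy toward the greedy action of that table. The sketch below only illustrates that general idea; the learning rate, the paper's use of separate gains depending on whether the selected action is currently greedy, and all numeric values here are simplifying assumptions, not the exact update rule of the cited work.

```python
import numpy as np

def ema_policy_step(q_row, policy_row, eta=0.05):
    """Illustrative EMA-style policy update for a single state: nudge the
    mixed strategy toward the indicator vector of the greedy action while
    keeping it a valid probability distribution."""
    greedy = np.zeros_like(policy_row)
    greedy[np.argmax(q_row)] = 1.0
    # Exponential moving average of the old policy and the greedy target.
    new_policy = (1.0 - eta) * policy_row + eta * greedy
    return new_policy / new_policy.sum()

# Example: two actions, Q favours action 1, so probability mass drifts toward it.
q = np.array([0.2, 0.8])
pi = np.array([0.5, 0.5])
for _ in range(3):
    pi = ema_policy_step(q, pi)
print(pi)  # roughly [0.43, 0.57] after three EMA steps
```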
“…In this chapter, we propose two MARL algorithms. The algorithms proposed in this chapter have already been published in [2][3][4]. The first proposed algorithm can successfully converge to Nash equilibrium policies in games that have a pure Nash equilibrium.…”
Section: Introduction (mentioning)
confidence: 99%