Zongzhang Zhang scite author profile

Zongzhang Zhang

5 Publications

90 Citation Statements Received

27 Citation Statements Given

How they've been cited

168

How they cite others

Affiliations

Nanjing University, Soochow University, University of Science and Technology of China

Publications


Weighted Double Q-learning

Zhang, Pan, Kochenderfer 2017


Q-learning is a popular reinforcement learning algorithm, but it can perform poorly in stochastic environments because it overestimates action values. The overestimation comes from using a single estimator that takes the maximum action value as an approximation of the maximum expected action value. To avoid overestimation in Q-learning, the double Q-learning algorithm was recently proposed, which uses the double estimator method. It uses two estimators built from independent sets of experiences, with one estimator determining the maximizing action and the other providing the estimate of its value. Double Q-learning sometimes underestimates the action values. This paper introduces a weighted double Q-learning algorithm, which is based on the construction of the weighted double estimator, with the goal of balancing the overestimation of the single estimator against the underestimation of the double estimator. Empirically, the new algorithm is shown to perform well on several MDP problems.
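The three estimators named in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names are hypothetical, the inputs are assumed to be arrays of sampled returns with one row per action, and the weight `beta` is treated as a fixed, hand-chosen parameter because the abstract does not say how the weight is set.

```python
import numpy as np


def single_estimate(samples):
    """Single estimator: the maximum of the per-action sample means.

    `samples` has shape (n_actions, n_samples). Taking the max of noisy
    means is what tends to overestimate the maximum expected value.
    """
    return float(samples.mean(axis=1).max())


def double_estimate(samples_a, samples_b):
    """Double estimator: set A picks the action, set B supplies its value."""
    best = int(np.argmax(samples_a.mean(axis=1)))
    return float(samples_b.mean(axis=1)[best])


def weighted_double_estimate(samples_a, samples_b, beta):
    """Weighted blend of the two estimates for the action chosen by set A.

    `beta` is an assumed fixed weight in [0, 1] (hypothetical parameter):
    beta = 1 recovers the single estimator on set A,
    beta = 0 recovers the double estimator.
    """
    means_a = samples_a.mean(axis=1)
    means_b = samples_b.mean(axis=1)
    best = int(np.argmax(means_a))
    return float(beta * means_a[best] + (1.0 - beta) * means_b[best])
```

With any two sample sets, sweeping `beta` from 0 to 1 moves the result from the double estimate to the single estimate, which is the balancing role the abstract describes.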


Weighted Double Deep Multiagent Reinforcement Learning in Stochastic Cooperative Environments

Yan, Meng, Hao, et al. 2018


Recently, multiagent deep reinforcement learning (DRL) has received increasingly wide attention. Existing multiagent DRL algorithms are inefficient when facing the non-stationarity that arises because agents update their policies simultaneously in stochastic cooperative environments. This paper extends the recently proposed weighted double estimator to the multiagent domain and proposes a multiagent DRL framework, named weighted double deep Q-network (WDDQN). By utilizing the weighted double estimator and the deep neural network, WDDQN can not only reduce the bias effectively but also be extended to scenarios with raw visual inputs. To achieve efficient cooperation in the multiagent domain, we introduce the lenient reward network and the scheduled replay strategy. Experiments show that WDDQN outperforms the existing DRL and multiagent DRL algorithms, i.e., double DQN and lenient Q-learning, in terms of the average reward and the convergence rate in stochastic cooperative environments.


Deep Q-Learning with Prioritized Sampling

Zhai, Liu, Zhang, et al. 2016


Triple-GAIL: A Multi-Modal Imitation Learning Framework with Generative Adversarial Nets

Cong, Wang, Zhuang, et al. 2020


Generative adversarial imitation learning (GAIL) has shown promising results by taking advantage of generative adversarial nets, especially in the field of robot learning. However, the requirement of isolated single-modal demonstrations limits the scalability of the approach to real-world scenarios such as autonomous vehicles' demand for a proper understanding of human drivers' behavior. In this paper, we propose a novel multi-modal GAIL framework, named Triple-GAIL, that introduces an auxiliary selector and is able to learn skill selection and imitation jointly from both expert demonstrations and continuously generated experiences, the latter serving as data augmentation. We provide theoretical guarantees on the convergence to optima for both the generator and the selector. Experiments on real driver trajectories and real-time strategy game datasets demonstrate that Triple-GAIL can better fit multi-modal behaviors close to the demonstrators and outperforms state-of-the-art methods.


Hierarchical Deep Multiagent Reinforcement Learning with Temporal Abstraction

Tang, Hao, Tangjie, et al. 2018

Preprint


Multiagent reinforcement learning (MARL) is commonly considered to suffer from non-stationary environments and an exponentially increasing policy space. It is even more challenging when rewards are sparse and delayed over long trajectories. In this paper, we study hierarchical deep MARL in cooperative multiagent problems with sparse and delayed reward. With temporal abstraction, we decompose the problem into a hierarchy of different time scales and investigate how agents can learn high-level coordination based on the independent skills learned at the low level. Three hierarchical deep MARL architectures are proposed to learn hierarchical policies under different MARL paradigms. In addition, we propose a new experience replay mechanism to alleviate the issue of the sparse transitions at the high level of abstraction and the non-stationarity of multiagent learning. We empirically demonstrate the effectiveness of our approaches in two domains with extremely sparse feedback: (1) a variety of Multiagent Trash Collection tasks, and (2) a challenging online mobile game, i.e., Fever Basketball Defense. Most previous works learn cooperative policies directly over primitive action spaces and usually perform well in environments with dense reward. However, in many real-world scenarios rewards are …

