Judgmentally adjusted Q-values based on Q-ensemble for offline reinforcement learning

Liu, Wenzhuo; Xiang, Shuying; Zhang, Tao; Han, Yanan; Guo, Xingxing; Zhang, Yahui; Hao, Yue

doi:10.1007/s00521-024-09839-z

Neural Comput & Applic

2024

Cited by 1 publication

References 7 publications

Efficient Jamming Policy Generation Method Based on Multi-Timescale Ensemble Q-Learning

Qian, Zhou, et al. 2024

Remote Sensing

With the advancement of radar technology toward multifunctionality and cognitive capabilities, traditional radar countermeasures are no longer sufficient to counter advanced multifunctional radar (MFR) systems. Rapid and accurate generation of the optimal jamming strategy is one of the key technologies for completing radar countermeasures efficiently. To improve the efficiency and accuracy of jamming policy generation, this paper proposes a method based on multi-timescale ensemble Q-learning (MTEQL). First, the task of generating jamming strategies is framed as a Markov decision process (MDP) by constructing a countermeasure scenario between the jammer and the radar and analyzing the principles of radar operation mode transitions. Then, multiple structure-dependent Markov environments are created based on the real-world adversarial interactions between jammers and radars. Q-learning algorithms are executed concurrently in these environments, and their results are merged through an adaptive weighting mechanism that uses the Jensen–Shannon divergence (JSD). Ultimately, a low-complexity, near-optimal jamming policy is derived. Simulation results indicate that the proposed method outperforms the standard Q-learning algorithm in jamming policy generation, with a shorter jamming decision-making time and a lower average strategy error rate.
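
The abstract does not say how the JSD-based weights are computed, so the following is only a minimal sketch of one plausible reading, not the authors' MTEQL algorithm. It assumes tabular Q-functions of identical shape, converts each to a softmax policy, measures each policy's Jensen–Shannon divergence from the ensemble-mean policy, and gives lower weight to tables that diverge more. The function names (`softmax`, `js_divergence`, `merge_q_tables`) and the `temperature` parameter are hypothetical and introduced here for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between distributions on the last axis."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    m = 0.5 * (p + q)
    kl_pm = np.sum(p * np.log(p / m), axis=-1)
    kl_qm = np.sum(q * np.log(q / m), axis=-1)
    return 0.5 * (kl_pm + kl_qm)


def merge_q_tables(q_tables, temperature=1.0):
    """Merge K Q-tables of shape (S, A) using JSD-derived weights.

    Assumption (not from the source): a table whose softmax policy is
    closer to the ensemble-mean policy receives a larger weight.
    """
    q = np.stack(q_tables)                    # shape (K, S, A)
    policies = softmax(q, axis=-1)            # per-state action distributions
    mean_policy = policies.mean(axis=0)       # shape (S, A)
    # Average JSD over states gives one divergence score per table.
    jsd = np.array([js_divergence(p, mean_policy).mean() for p in policies])
    weights = softmax(-jsd / temperature)     # shape (K,), sums to 1
    merged = np.tensordot(weights, q, axes=1) # weighted sum, shape (S, A)
    return merged, weights


# Usage: two agreeing tables and one that prefers the opposite actions.
q_a = np.array([[1.0, 0.0], [0.0, 1.0]])
q_b = q_a.copy()
q_c = np.array([[0.0, 1.0], [1.0, 0.0]])
merged_q, w = merge_q_tables([q_a, q_b, q_c])
greedy_policy = merged_q.argmax(axis=1)  # greedy action per state
```

In this toy usage the dissenting table `q_c` ends up with the smallest weight, so the merged greedy policy follows the two agreeing tables. A smaller `temperature` would sharpen that preference.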
