Explaining deep reinforcement learning decisions in complex multiagent settings: towards enabling automation in air traffic flow management

Kravaris, Theocharis; Lentzos, Konstantinos; Santipantakis, Georgios M.; Vouros, George A.; Andrienko, Gennady; Andrienko, Natalia; Crook, Ian; García, J.M. Cordero; Martinez, Enrique Iglesias

doi:10.1007/s10489-022-03605-1

Cited by 7 publications

(4 citation statements)

References 40 publications


“…Given a minibatch of states, we calculate the MAE of this minibatch for any action as the mean absolute difference between the Q-values estimated by the mimic learner and the Q-values estimated by the deep Q-network for that action. More formally, for a minibatch of states D_s, the MAE_i of action a_i is denoted as: […] We focus on providing aggregated interpretations, focusing on the contribution of features to local decisions and to the overall policy: This, as suggested by ATM operators, is beneficial towards understanding decisions, helping them to increase their confidence to the solutions proposed, and mastering the inherent complexity in such a multi-agent setting, as solutions may be due to complex phenomena that are hard to be traced [15]. Specifically, in this work, local explainability measures state features' importance on a specific instance (i.e.…”

Section: Evaluation Metrics and Methods (mentioning)

confidence: 99%
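
The per-action MAE described in the statement above can be sketched as follows. This is a minimal illustration of the verbal definition only (the mean, over a minibatch of states, of the absolute difference between the two Q-value estimates for one action); the formula itself is missing from the excerpt. The function name `per_action_mae` and the list-of-lists layout (one row per state, one column per action) are assumptions made for the example, not taken from the cited work.

```python
def per_action_mae(q_mimic, q_dqn):
    """Return one MAE value per action.

    q_mimic, q_dqn: lists of rows, one row per state in the minibatch,
    each row holding one Q-value per action (hypothetical layout).
    """
    if not q_dqn or len(q_mimic) != len(q_dqn):
        raise ValueError("both inputs need the same, non-zero number of states")
    n_states = len(q_dqn)
    n_actions = len(q_dqn[0])
    maes = []
    for i in range(n_actions):
        # Sum |Q_mimic(s, a_i) - Q_dqn(s, a_i)| over all states in the minibatch.
        total = sum(abs(m[i] - d[i]) for m, d in zip(q_mimic, q_dqn))
        maes.append(total / n_states)
    return maes


# Two states, two actions.
mimic = [[1.0, 2.0], [3.0, 4.0]]
dqn = [[1.5, 2.0], [2.0, 6.0]]
print(per_action_mae(mimic, dqn))
```

A value near zero for an action suggests the mimic learner tracks the deep Q-network closely on that action for the sampled states.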

“…The tuple containing all agents' local states is the joint global state. Q-learning [33] agents has been shown to achieve remarkable performance on this task [15]. In our experiments, all agents share parameters and replay buffer and act independently.…”

Section: Real-world Demand-capacity Problem Setting (mentioning)

confidence: 95%
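
The setup in the statement above (agents that share parameters and a replay buffer but act independently on their local states) can be illustrated with a small sketch. It is not the cited implementation: a tabular Q-function stands in for the shared deep Q-network, and the names `SharedReplayBuffer` and `SharedQTable`, along with the learning rate and discount values, are hypothetical choices for the example.

```python
from collections import deque


class SharedReplayBuffer:
    """One buffer that every agent writes its transitions into."""

    def __init__(self, capacity):
        self.transitions = deque(maxlen=capacity)

    def add(self, agent_id, state, action, reward, next_state):
        self.transitions.append((agent_id, state, action, reward, next_state))

    def __len__(self):
        return len(self.transitions)


class SharedQTable:
    """Tabular stand-in (assumption) for a Q-network shared by all agents."""

    def __init__(self, n_actions):
        self.n_actions = n_actions
        self.q = {}

    def values(self, state):
        return self.q.setdefault(state, [0.0] * self.n_actions)

    def greedy_action(self, state):
        # Each agent calls this with its own local state only.
        vals = self.values(state)
        return max(range(self.n_actions), key=lambda a: vals[a])

    def update(self, state, action, reward, next_state, lr=0.5, gamma=0.9):
        # Standard one-step Q-learning target.
        target = reward + gamma * max(self.values(next_state))
        vals = self.values(state)
        vals[action] += lr * (target - vals[action])


buffer = SharedReplayBuffer(capacity=100)
q = SharedQTable(n_actions=2)

# Agent 0 experiences a transition; the shared parameters are updated from the buffer.
buffer.add(agent_id=0, state="s", action=1, reward=1.0, next_state="t")
for _, s, a, r, s2 in buffer.transitions:
    q.update(s, a, r, s2)

# Agent 1, in the same local state, now benefits from agent 0's experience.
print(q.greedy_action("s"))
```

Because the parameters are shared, experience gathered by one agent changes the action choice of every other agent that reaches the same local state, while each agent still selects its action without access to the others' decisions.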

“…Similarly, we consider local rewards and joint (global) rewards. The local reward is related to the cost per minute within a hotspot, the total duration of the flight (agent) in hotspots as well as to the delay that a flight has accumulated up to the simulation timestep [15].…”

Section: Real-world Demand-capacity Problem Setting (mentioning)

confidence: 99%

“…Deep Reinforcement Learning (DRL) has mastered decision making policies in various difficult control tasks [11] [18] [15], games [22] [13] and other real-time applications [14] [37]. Despite the remarkable performance of DRL models, the knowledge of mastering these tasks remains implicit in deep neural networks.…”

Section: Introduction (mentioning)

confidence: 99%

XDQN: Inherently Interpretable DQN through Mimicking

Kontogiannis, Vouros

2023

Preprint

Although deep reinforcement learning (DRL) methods have been successfully applied in challenging tasks, their application in real-world operational settings is challenged by the methods' limited ability to provide explanations. Among the paradigms for explainability in DRL is the interpretable box design paradigm, where interpretable models substitute inner constituent models of the DRL method, thus making the DRL method "inherently" interpretable. In this paper we explore this paradigm and propose XDQN, an explainable variation of DQN, which uses an interpretable policy model trained through mimicking. XDQN is challenged in a complex, real-world operational multi-agent problem, where agents are independent learners solving congestion problems. Specifically, XDQN is evaluated in three MARL scenarios, pertaining to the demand-capacity balancing problem of air traffic management. XDQN achieves performance similar to that of DQN, while its abilities to provide global models' interpretations and interpretations of local decisions are demonstrated.
