Proceedings of the 12th International Conference on Agents and Artificial Intelligence 2020
DOI: 10.5220/0008949905220529

Uncertainty-based Out-of-Distribution Classification in Deep Reinforcement Learning

Abstract: Robustness to out-of-distribution (OOD) data is an important goal in building reliable machine learning systems. Especially in autonomous systems, wrong predictions for OOD inputs can cause safety critical situations. As a first step towards a solution, we consider the problem of detecting such data in a value-based deep reinforcement learning (RL) setting. Modelling this problem as a one-class classification problem, we propose a framework for uncertainty-based OOD classification: UBOOD. It is based on the ef…
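
A note on the truncated abstract above: the citing statement further down this page describes UBOOD as being based on the reducibility of an agent's epistemic uncertainty in its Q-value function. The following is a minimal, illustrative Python sketch of that general idea, using the disagreement of an ensemble of Q-networks as a one-class OOD score; the names (QEnsemble, ood_score), the architecture, and the threshold are assumptions for illustration, not the authors' implementation.

# Illustrative sketch: epistemic uncertainty from a Q-ensemble as a one-class OOD score.
# Assumption-based example; not the UBOOD reference implementation.
import torch
import torch.nn as nn


def make_q_net(obs_dim: int, n_actions: int) -> nn.Module:
    """A small fully connected Q-network."""
    return nn.Sequential(
        nn.Linear(obs_dim, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, n_actions),
    )


class QEnsemble(nn.Module):
    """Ensemble of independently initialised Q-networks.

    The spread of the members' Q-value predictions serves as a proxy for
    epistemic uncertainty: it is reducible (shrinks) on states covered by the
    training distribution and stays large on out-of-distribution states.
    """

    def __init__(self, obs_dim: int, n_actions: int, n_members: int = 5):
        super().__init__()
        self.members = nn.ModuleList(
            [make_q_net(obs_dim, n_actions) for _ in range(n_members)]
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Stack member predictions: shape (n_members, batch, n_actions).
        return torch.stack([m(obs) for m in self.members])

    @torch.no_grad()
    def ood_score(self, obs: torch.Tensor) -> torch.Tensor:
        """Higher score = more likely out-of-distribution (one-class setting)."""
        q_values = self.forward(obs)           # (M, B, A)
        std_per_action = q_values.std(dim=0)   # member disagreement per action
        return std_per_action.mean(dim=-1)     # one scalar score per state


# Usage: calibrate a threshold on in-distribution validation states, then flag
# states whose score exceeds it as OOD.
ensemble = QEnsemble(obs_dim=8, n_actions=4)
scores = ensemble.ood_score(torch.randn(16, 8))
threshold = 0.5  # placeholder; would be calibrated on held-out in-distribution data
is_ood = scores > threshold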

Cited by 11 publications (5 citation statements); references 17 publications. Citation types: 0 supporting, 5 mentioning, 0 contrasting.

Citation statements (ordered by relevance):

“…Other related topics could also provide inspiration for how to learn rules of behavior that generalize to novel situations due to changing sets of agents and tasks in OASYS. These include (1) out-of-distribution learning (e.g., Sedlmeier et al 2020; Haider et al 2023), where agents detect that their current tasks are different from those experienced during training and must adapt their behavior to new situations, (2) lifelong learning (e.g., Thrun and Mitchell 1995; Ammar et al 2014; Chen and Liu 2018; Mendez, van Seijen, and Eaton 2022) where agents learn how to complete future tasks based on knowledge gained from previously learned tasks, and (3) multitask learning (e.g., Tanaka and Yamamura 2003; Andreas, Klein, and Levine 2017; Rajeswaran et al 2017; Sodhani, Zhang, and Pineau 2021) where agents learn how to generalize to complete a given set of tasks, potentially exploiting task similarities and differences to quickly improve performance on the tasks. Recently, Zhang et al (2023) have also studied decision making through multiagent RL when other agents' policies abruptly change during operations, which could be useful for guiding RL under task and type openness.…”
Section: Reinforcement Learning in OASYS (mentioning)
confidence: 99%
“…These include (1) out‐of‐distribution learning (e.g., Sedlmeier et al. 2020; Haider et al. 2023), where agents detect that their current tasks are different from those experienced during training and must adapt their behavior to new situations, (2) lifelong learning (e.g., Thrun and Mitchell 1995; Ammar et al.…”
Section: Decision Making in OASYS (mentioning)
confidence: 99%
“…The differences between scenarios and data sets will change the relative performance of the methods [63,64].
(11) Pre-trains a model on OOD auxiliary outputs and fine-tunes this model with the pseudolabels [65].
(12) Nash equilibria of these games are closer to the ideal OOD solutions than the standard empirical risk minimization (ERM) [66].
(13) Interval bound propagation (IBP) is used to upper bound the maximal confidence in the l∞-ball and minimize this upper bound during training time [67].
(14) …”
Section: Number (mentioning)
confidence: 99%
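
The last item in the statement above refers to interval bound propagation (IBP) being used to upper bound the maximal confidence inside an l∞-ball and to minimize that bound during training. Below is a minimal Python sketch of that idea for a small fully connected classifier; the network layout and the helper names (ibp_linear, confidence_upper_bound) are assumptions for illustration, not the cited paper's code.

# Illustrative sketch: interval bound propagation (IBP) to upper-bound the
# softmax confidence for all inputs within an l_inf-ball of radius eps.
import torch
import torch.nn as nn


def ibp_linear(layer: nn.Linear, lower: torch.Tensor, upper: torch.Tensor):
    """Propagate elementwise interval bounds through an affine layer."""
    center = (upper + lower) / 2
    radius = (upper - lower) / 2
    new_center = layer(center)
    new_radius = radius @ layer.weight.abs().t()
    return new_center - new_radius, new_center + new_radius


def confidence_upper_bound(model: nn.Sequential, x: torch.Tensor, eps: float) -> torch.Tensor:
    """Upper bound on the maximal softmax probability over the eps-ball around x."""
    lower, upper = x - eps, x + eps
    for layer in model:
        if isinstance(layer, nn.Linear):
            lower, upper = ibp_linear(layer, lower, upper)
        elif isinstance(layer, nn.ReLU):
            lower, upper = layer(lower), layer(upper)  # ReLU is monotone
        else:
            raise NotImplementedError(f"unsupported layer: {layer}")
    # softmax_k <= 1 / (1 + sum_{j != k} exp(lower_j - upper_k)) for each class k.
    diff = lower.unsqueeze(-2) - upper.unsqueeze(-1)      # entry (k, j) = lower_j - upper_k
    diff = diff - torch.eye(diff.shape[-1]) * 1e9         # mask out j == k
    bound_per_class = 1.0 / (1.0 + diff.exp().sum(dim=-1))
    return bound_per_class.max(dim=-1).values


# The bound is differentiable, so it could be added as a penalty on OOD inputs
# during training to push the worst-case confidence down.
model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 4))
upper_conf = confidence_upper_bound(model, torch.randn(16, 8), eps=0.1)
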
“…PEOC (Sedlmeier et al., 2020b), for example, uses the policy entropy of an RL agent trained using policy-gradient methods to detect increased epistemic uncertainty in untrained situations. UBOOD (Sedlmeier et al., 2020a), by contrast, is applicable to value-based RL settings and is based on the reducibility of an agent's epistemic uncertainty in its Q-value function. Although the methods differentiate between aleatoric and epistemic uncertainty to detect OOD situations, multimodality is not a focus.…”
Section: Uncertainty-based OOD Detection (mentioning)
confidence: 99%
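
The statement above describes PEOC as using the policy entropy of an agent trained with policy-gradient methods as a signal of increased epistemic uncertainty in untrained situations. A minimal Python sketch of that kind of entropy-based OOD score follows; the network layout, names, and threshold are illustrative assumptions, not the paper's implementation.

# Illustrative sketch: policy-entropy-based OOD scoring in the spirit of PEOC.
import torch
import torch.nn as nn

# Stand-in for a policy network trained with a policy-gradient method
# (4 discrete actions over an 8-dimensional observation).
policy_net = nn.Sequential(
    nn.Linear(8, 64), nn.ReLU(),
    nn.Linear(64, 4),
)


@torch.no_grad()
def policy_entropy(obs: torch.Tensor) -> torch.Tensor:
    """Entropy of the action distribution; high entropy suggests a situation
    the policy was not trained on (potentially out-of-distribution)."""
    dist = torch.distributions.Categorical(logits=policy_net(obs))
    return dist.entropy()


scores = policy_entropy(torch.randn(16, 8))
threshold = 1.2  # placeholder; would be calibrated on in-distribution states
is_ood = scores > threshold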