2021
DOI: 10.3233/aise210096
Inverse Reinforcement Learning Through Max-Margin Algorithm

Abstract: Reinforcement Learning (RL) methods provide a solution for decision-making problems under uncertainty. An agent finds a suitable policy through a reward function by interacting with a dynamic environment. However, for complex and large problems, it is very difficult to specify and tune the reward function. Inverse Reinforcement Learning (IRL) may mitigate this problem by learning the reward function through expert demonstrations. This work exploits an IRL method named Max-Margin Algorithm (MMA) to learn the rew…

Cited by 4 publications (5 citation statements) | References 31 publications
“…The purpose of the margin-based optimization method is to find a reward function under which the expert (example) policy performs better than the learned policy, i.e., the margin between the two policies is maximized [8]. Convergence is declared when the margin decreases below a set threshold.…”
Section: Margin Optimization Methods
confidence: 99%
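The margin-maximization idea quoted above can be sketched in a few lines. The following is a minimal NumPy subgradient sketch, not the paper's exact quadratic program: given the expert's feature expectation and those of a set of candidate policies, it searches for a weight vector that maximizes the worst-case margin. The function name, learning rate, and iteration count are illustrative assumptions.

```python
import numpy as np

def max_margin_weights(mu_expert, mu_policies, lr=0.1, iters=500):
    """Find reward weights w that (approximately) maximize the
    worst-case margin w.(mu_expert - mu_pi) over candidate policies.
    A simple subgradient sketch, not the paper's full QP."""
    w = np.zeros(len(mu_expert))
    for _ in range(iters):
        # candidate policy with the smallest current margin
        margins = [w @ (mu_expert - mu) for mu in mu_policies]
        j = int(np.argmin(margins))
        # subgradient step that increases that worst margin
        w += lr * (mu_expert - mu_policies[j])
        # keep ||w||_2 <= 1 so the margin stays bounded
        norm = np.linalg.norm(w)
        if norm > 1:
            w /= norm
    return w
```

When the returned weights give every candidate policy a positive margin against the expert's feature expectation, the "expert beats learner" condition from the quote holds under the learned reward.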
“…For any trajectory, the margin formula is as follows. Later, Ratliff proposed an improved method, learning to search (LEARCH) [9], which converts the quadratic programming problem into an optimization problem solved via the Hessian matrix and the gradient, thereby addressing the difficulty of high-dimensional continuous-time problems.…”
Section: Margin Optimization Methods
confidence: 99%
“…This class of algorithms interacts directly with the environment (or with an emulator) using trial-and-error schemes to learn the optimal policy. In inverse RL (Shah & Coronato, 2021a, 2021b; Shah, De Pietro, Paragliola and Coronato, 2022), we study an agent's objectives, values, or rewards by employing insights into its behavior. Several methods are available (e.g., Monte Carlo (MC), Temporal Difference (TD), etc.).…”
Section: Reinforcement Learning
confidence: 99%
“…In many practical applications we do not have complete knowledge of the environment (i.e., the transition probabilities are not known); in that case the max-margin IRL technique [47] can be utilized. The max-margin IRL method assumes that the reward function can be represented as a linear function of known basis features Φ_i [48]: R(s) = w^T Φ(s) = Σ_i w_i Φ_i(s)…”
Section: B Inverse Reinforcement Learning (IRL)
confidence: 99%
“…where the weight vector w (with ‖w‖₁ ≤ 1) minimizes the Euclidean distance ‖μ(π) − μ_E‖₂ between the expert feature expectation μ_E and the estimated feature expectation μ(π) [48].…”
Section: B Inverse Reinforcement Learning (IRL)
confidence: 99%
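The feature expectations μ_E and μ(π) in the quote above are discounted sums of basis features along sampled trajectories. The following is a minimal sketch under assumed toy data: the basis features, discount factor, and the one-step weight update (the direction μ_E − μ(π), normalized) are illustrative, not the exact procedure from [47] or [48].

```python
import numpy as np

def feature_expectations(trajectories, phi, gamma=0.9):
    """Monte Carlo estimate of mu(pi) = E[sum_t gamma^t phi(s_t)]
    from sampled trajectories (each a list of states)."""
    mus = []
    for traj in trajectories:
        mu = sum((gamma ** t) * phi(s) for t, s in enumerate(traj))
        mus.append(mu)
    return np.mean(mus, axis=0)

# hypothetical 1-D states with two polynomial basis features
phi = lambda s: np.array([s, s ** 2])

# toy expert and learner trajectories (states only)
mu_E = feature_expectations([[1.0, 1.0], [1.0, 0.0]], phi)
mu_pi = feature_expectations([[0.0, 0.0]], phi)

# direction that best separates expert from current policy,
# normalized here to ||w||_2 = 1 (the quote bounds ||w||_1 <= 1)
w = (mu_E - mu_pi) / np.linalg.norm(mu_E - mu_pi)
```

Minimizing ‖μ(π) − μ_E‖₂ over policies then amounts to matching the learner's discounted feature counts to the expert's, which is the core of the feature-matching view of max-margin IRL.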