2021
DOI: 10.3233/aise210096
Inverse Reinforcement Learning Through Max-Margin Algorithm

Abstract: Reinforcement Learning (RL) methods provide a solution for decision-making problems under uncertainty. An agent finds a suitable policy through a reward function by interacting with a dynamic environment. However, for complex and large problems, it is very difficult to specify and tune the reward function. Inverse Reinforcement Learning (IRL) may mitigate this problem by learning the reward function through expert demonstrations. This work exploits an IRL method named Max-Margin Algorithm (MMA) to learn the rew…

Cited by 4 publications (5 citation statements) | References 31 publications
“…The purpose of the margin-based optimization method is to find a reward function under which the expert (example) policy performs better than the learned policy, i.e., the margin between the two policies is maximized [8]. Convergence is declared when the margin decreases below a set threshold.…”
Section: Margin Optimization Methods
confidence: 99%
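The margin-maximization idea quoted above can be sketched in a few lines. The following is a minimal NumPy subgradient sketch, not the paper's exact quadratic program: given the expert's feature expectation and those of a set of candidate policies, it searches for a weight vector that maximizes the worst-case margin. The function name, learning rate, and iteration count are illustrative assumptions.

```python
import numpy as np

def max_margin_weights(mu_expert, mu_policies, lr=0.1, iters=500):
    """Find reward weights w that (approximately) maximize the
    worst-case margin w.(mu_expert - mu_pi) over candidate policies.
    A simple subgradient sketch, not the paper's full QP."""
    w = np.zeros(len(mu_expert))
    for _ in range(iters):
        # candidate policy with the smallest current margin
        margins = [w @ (mu_expert - mu) for mu in mu_policies]
        j = int(np.argmin(margins))
        # subgradient step that increases that worst margin
        w += lr * (mu_expert - mu_policies[j])
        # keep ||w||_2 <= 1 so the margin stays bounded
        norm = np.linalg.norm(w)
        if norm > 1:
            w /= norm
    return w
```

When the returned weights give every candidate policy a positive margin against the expert's feature expectation, the "expert beats learner" condition from the quote holds under the learned reward.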
“…For any trajectory, the margin formula is as follows. Later, Ratliff proposed an improved method, learning to search (LEARCH) [9], which converts the quadratic programming problem into an optimization problem solved via the Hessian matrix and the gradient, thereby addressing the difficulty of high-dimensional continuous-time problems.…”
Section: Margin Optimization Methods
confidence: 99%
“…This class of algorithms interacts directly with the environment (or with an emulator) using trial-and-error schemes to learn the optimal policy. In inverse RL (Shah & Coronato, 2021a, 2021b; Shah, De Pietro, Paragliola and Coronato, 2022), we study an agent's objectives, values, or rewards by employing insights into its behavior. Several methods are available (e.g., Monte Carlo (MC), Temporal Difference (TD), etc.).…”
Section: Reinforcement Learning
confidence: 99%
“…In many practical applications we do not have complete knowledge of the environment (i.e., the transition probabilities are not known); in that case the max-margin IRL technique [47] can be utilized. The max-margin IRL method assumes that the reward function can be represented as a linear function of known basis features Φ_i [48]: R(s) = w^T Φ(s) = Σ_i w_i Φ_i(s)…”
Section: B Inverse Reinforcement Learning (IRL)
confidence: 99%
“…where the weight vector w (with ‖w‖₁ ≤ 1) minimizes the Euclidean distance ‖μ(π) − μ_E‖₂ between the expert feature expectation μ_E and the estimated feature expectation μ(π) [48].…”
Section: B Inverse Reinforcement Learning (IRL)
confidence: 99%
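The feature expectations μ_E and μ(π) in the quote above are discounted sums of basis features along sampled trajectories. The following is a minimal sketch under assumed toy data: the basis features, discount factor, and the one-step weight update (the direction μ_E − μ(π), normalized) are illustrative, not the exact procedure from [47] or [48].

```python
import numpy as np

def feature_expectations(trajectories, phi, gamma=0.9):
    """Monte Carlo estimate of mu(pi) = E[sum_t gamma^t phi(s_t)]
    from sampled trajectories (each a list of states)."""
    mus = []
    for traj in trajectories:
        mu = sum((gamma ** t) * phi(s) for t, s in enumerate(traj))
        mus.append(mu)
    return np.mean(mus, axis=0)

# hypothetical 1-D states with two polynomial basis features
phi = lambda s: np.array([s, s ** 2])

# toy expert and learner trajectories (states only)
mu_E = feature_expectations([[1.0, 1.0], [1.0, 0.0]], phi)
mu_pi = feature_expectations([[0.0, 0.0]], phi)

# direction that best separates expert from current policy,
# normalized here to ||w||_2 = 1 (the quote bounds ||w||_1 <= 1)
w = (mu_E - mu_pi) / np.linalg.norm(mu_E - mu_pi)
```

Minimizing ‖μ(π) − μ_E‖₂ over policies then amounts to matching the learner's discounted feature counts to the expert's, which is the core of the feature-matching view of max-margin IRL.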