2021 17th International Workshop on Cellular Nanoscale Networks and Their Applications (CNNA)
DOI: 10.1109/cnna49188.2021.9610789
An Application of Control-Tutored Reinforcement Learning to the Herding Problem

Cited by 3 publications (2 citation statements)
References 14 publications
“…The computational aspect (both for learning and control) is a challenge in its own and different methods need to be rigorously benchmarked along this dimension. Approximation results exist that allow to reduce the computational burden and we highlight a perhaps less explored direction to integrate data-driven and model-based technologies so that they tutor each other [177,180,181]. Finally, we believe that the ultimate challenge will be to deploy the algorithms underpinned by the methods presented here in applications where reliable models are intrinsically probabilistic and/or hard/expensive to find.…”
Section: Concluding Discussion
Mentioning confidence: 97%
“…Following [26] and [27], we employ an RL solution to automatically identify an acceptable policy for a given initial condition x0 and to do so without the need of knowing the dynamics f. Namely, let r : X × X × U → R be a reward function, so that r(x′, x, u) is the reward obtained by the agent when taking action u in state x and arriving at the new state x′ at the next time instant.…”
Section: Using Reinforcement Learning To Find Acceptable Control Poli...
Mentioning confidence: 99%
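The excerpt above defines the reward over full transitions, r(x′, x, u), rather than over states alone, and stresses that the dynamics f need not be known. As a minimal sketch of how such a transition-level reward feeds a model-free learner, the loop below runs tabular Q-learning on a toy 5-state chain; the states, actions, dynamics, goal, and hyperparameters are all illustrative assumptions, not the herding setup of the cited paper.

```python
import random
from collections import defaultdict

# Illustrative stand-in for the unknown dynamics f: a 5-state chain
# with a goal state. None of this comes from the cited paper.
ACTIONS = (-1, 0, 1)
GOAL = 4

def step(x, u):
    """Environment transition; the agent only samples it, never inspects it."""
    return min(max(x + u, 0), GOAL)

def reward(x_next, x, u):
    """Transition reward r(x', x, u): +1 on reaching the goal,
    minus a small control-effort penalty (a hypothetical choice)."""
    return (1.0 if x_next == GOAL else 0.0) - 0.01 * abs(u)

# Tabular Q-learning: a standard model-free update in which the reward
# is observed per transition, matching the r(x', x, u) signature above.
Q = defaultdict(float)
alpha, gamma, eps = 0.1, 0.95, 0.1

for episode in range(500):
    x = 0  # initial condition x0
    for _ in range(50):
        # epsilon-greedy action selection
        if random.random() < eps:
            u = random.choice(ACTIONS)
        else:
            u = max(ACTIONS, key=lambda a: Q[(x, a)])
        x_next = step(x, u)
        r = reward(x_next, x, u)  # reward for this transition
        best_next = max(Q[(x_next, a)] for a in ACTIONS)
        Q[(x, u)] += alpha * (r + gamma * best_next - Q[(x, u)])
        x = x_next
        if x == GOAL:
            break
```

Note that the update uses only sampled transitions and rewards, consistent with the quoted requirement that an acceptable policy be found without knowing f.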