2021 International Conference on Computer, Control and Robotics (ICCCR)
DOI: 10.1109/icccr49711.2021.9349369

Learning Ball-Balancing Robot through Deep Reinforcement Learning

Abstract: The ball-balancing robot (ballbot) is a good platform for testing the effectiveness of a balancing controller. For balancing control, conventional model-based feedback control methods have been widely used. However, contacts and collisions are difficult to model and often lead to failure in balancing control, especially when the ballbot tilts to a large angle. To explore the maximum initial tilting angle of the ballbot, the balancing control is interpreted as a recovery task using Reinforcement Learning (RL)…
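The record does not reproduce the paper's code, but the recovery-task formulation can be illustrated with a minimal sketch: a Gym-style training loop in which episodes begin from a perturbed state and the agent learns to return the system to upright. Pendulum-v1 is used here purely as a stand-in for the authors' ballbot simulator, which is not public; PPO, the timestep budget, and the rollout length are illustrative assumptions, not the paper's settings.

```python
# Minimal sketch of balancing-as-recovery with RL (assumption: Gymnasium's
# Pendulum-v1 stands in for the ballbot, since the authors' simulator is
# not public; episodes start from a randomized initial angle).
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("Pendulum-v1")

# Train a policy that drives the system back to the upright equilibrium.
model = PPO("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=100_000)

# Evaluate: roll out one episode with the trained policy.
obs, _ = env.reset()
for _ in range(200):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    if terminated or truncated:
        break
```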

Cited by 11 publications (6 citation statements). References 11 publications.
“…In another line of research, latent action representations are often learned to exploit the structure in the action space in reinforcement learning [3,6,7,48,49]. In particular, Deffayet et al [7] use a Variational AutoEncoder (VAE) model to pre-train latent slate space from logged data to improve recommendations.…”
Section: Related Work (mentioning)
Confidence: 99%
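As a concrete illustration of the latent-action idea this statement attributes to Deffayet et al [7], the sketch below pre-trains a small VAE on logged actions so that a policy can later act in the compact latent space and decode back to a full action. The dimensions, the placeholder training data, and the decode step are illustrative assumptions, not the cited paper's implementation.

```python
# Hedged sketch: pre-train a VAE on logged (slate) actions so an RL policy
# can act in a low-dimensional latent space. All sizes are illustrative.
import torch
import torch.nn as nn

class ActionVAE(nn.Module):
    def __init__(self, action_dim: int, latent_dim: int):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(action_dim, 64), nn.ReLU())
        self.mu = nn.Linear(64, latent_dim)
        self.logvar = nn.Linear(64, latent_dim)
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, action_dim))

    def forward(self, a):
        h = self.enc(a)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return self.dec(z), mu, logvar

vae = ActionVAE(action_dim=50, latent_dim=8)
opt = torch.optim.Adam(vae.parameters(), lr=1e-3)
logged_actions = torch.rand(256, 50)  # placeholder for real logged data
for _ in range(100):
    recon, mu, logvar = vae(logged_actions)
    recon_loss = ((recon - logged_actions) ** 2).mean()
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
    opt.zero_grad()
    (recon_loss + kl).backward()
    opt.step()

# Downstream, the policy outputs an 8-D latent z and the frozen decoder
# maps it back to a full action: action = vae.dec(z).
```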
“…Early methods pinpoint the core issue in offline RL as extrapolation error [13] and suggest using policy constraints to ensure that the learned policy remains close to the behavior policy. These constraints include adding behavior cloning (BC) loss [46] in policy training [12], using the divergence between the behavior policy and the learned policy [13], [14], [25], applying advantage-weighted constraints to balance BC and advantages [39], penalizing the prediction-error of a variational auto-encoder [41], and learning latent actions from the offline data [55]. While policy-constraint methods excel in performance on datasets derived from expert behavior policies, they struggle to discover optimal policies when confronted with datasets featuring suboptimal policies.…”
Section: Related Work (mentioning)
Confidence: 99%
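The simplest policy constraint listed in this statement, adding a behavior cloning loss to policy training [12], [46], can be sketched as a TD3+BC-style actor update. The network sizes, the placeholder batch, and the weighting constant are illustrative, not the cited papers' exact settings.

```python
# Sketch of a BC-constrained actor update in the style of TD3+BC: maximize Q
# while penalizing deviation from the dataset actions. Shapes are placeholders.
import torch
import torch.nn as nn

actor = nn.Sequential(nn.Linear(17, 256), nn.ReLU(), nn.Linear(256, 6), nn.Tanh())
critic = nn.Sequential(nn.Linear(17 + 6, 256), nn.ReLU(), nn.Linear(256, 1))
opt = torch.optim.Adam(actor.parameters(), lr=3e-4)

# One batch from the offline dataset (random placeholders for logged data).
states = torch.randn(256, 17)
dataset_actions = torch.rand(256, 6) * 2 - 1

pi = actor(states)
q = critic(torch.cat([states, pi], dim=-1))
# lam rescales the Q term so the BC term stays on a comparable scale.
lam = 2.5 / q.abs().mean().detach()
actor_loss = -lam * q.mean() + ((pi - dataset_actions) ** 2).mean()
opt.zero_grad()
actor_loss.backward()
opt.step()
```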
“…The modular design simplifies the locomotion control problem with a fixed gait and allows for individual gait analysis [73]. To learn gait patterns, we used a learned action space that maps the output of the high-level policy to a distribution of gait parameters [74][75][76]. The generative model was trained with known gait parameters [6,13].…”
Section: Comparison of Different Architectures (mentioning)
Confidence: 99%
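A hedged sketch of the learned action space this statement describes: a pre-trained decoder maps the high-level policy's latent output to a distribution over gait parameters. The latent size, the number of gait parameters, and the Gaussian form are assumptions for illustration, not the cited work's architecture.

```python
# Sketch: a decoder mapping a latent command z to a distribution over gait
# parameters (e.g., frequency, amplitude, phase offsets). Sizes illustrative.
import torch
import torch.nn as nn

class GaitDecoder(nn.Module):
    """Maps a latent command to a Gaussian over gait parameters."""
    def __init__(self, latent_dim=4, n_gait_params=8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU())
        self.mean = nn.Linear(64, n_gait_params)
        self.log_std = nn.Linear(64, n_gait_params)

    def forward(self, z):
        h = self.net(z)
        return torch.distributions.Normal(self.mean(h), self.log_std(h).exp())

decoder = GaitDecoder()            # pre-trained offline on known gait parameters
z = torch.randn(1, 4)              # output of the high-level policy
gait_params = decoder(z).sample()  # one gait for the fixed-gait controller
```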
“…Existing works have proposed using generative models such as Variational Autoencoders (VAEs) [75,76] or a normalizing flow [74] to transform the action distribution into a different, possibly multi-modal, distribution. Wenxuan et al [75] and Allshire et al [76] proposed to pre-train generative models with existing motion data for higher sample efficiency.…”
Section: Acknowledgments (mentioning)
Confidence: 99%
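Of the two generative-model options this statement mentions, the normalizing-flow variant [74] can be sketched with a single RealNVP-style affine coupling layer that warps a Gaussian base sample into an action. Real implementations stack several such layers; all sizes below are illustrative assumptions.

```python
# Sketch: one affine coupling layer (RealNVP-style) transforming a Gaussian
# base sample into an action, returning the sample and log|det J|.
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    def __init__(self, dim=6):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half, 64), nn.ReLU(),
            nn.Linear(64, 2 * (dim - self.half)))

    def forward(self, x):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(x1).chunk(2, dim=-1)  # scale and shift from x1
        y2 = x2 * s.exp() + t                 # affine transform of x2
        return torch.cat([x1, y2], dim=-1), s.sum(dim=-1)

flow = AffineCoupling()
base = torch.randn(32, 6)      # samples from the base Gaussian
actions, log_det = flow(base)  # transformed, potentially richer, actions
```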