2019
DOI: 10.48550/arxiv.1907.04514
Preprint

DOB-Net: Actively Rejecting Unknown Excessive Time-Varying Disturbances

Abstract: This paper presents an observer-integrated Reinforcement Learning (RL) approach, called Disturbance OBserver Network (DOB-Net), for robots operating in environments where disturbances are unknown, time-varying, and may frequently exceed robot control capabilities. The DOB-Net integrates a disturbance dynamics observer network and a controller network. Originating from classical DOB mechanisms, the observer is built and enhanced via Recurrent Neural Networks (RNNs), encoding estimation of past values and pred…

Cited by 4 publications (11 citation statements)
References 26 publications
“…However, this is not the case when the disturbances are considered as the unobservable parts of the state space, since it is difficult to formulate a transition function that predicts the next disturbances from the current state (including disturbances) and action alone. Both the history window approach [31] and the recurrent policy [32] attempt to resolve this issue by characterizing the disturbed system transition as a multi-step MDP and assuming the unobservable disturbance waveforms are encoded in the robot's motion history. The difference lies in how the history data are used: the history window approach directly takes the most recent state-action pairs as additional input to the policy, while the recurrent policy employs an RNN to explore past experience in order to learn an optimal embedding of the history data.…”
Section: A. Reinforcement Learning in Partially Observable Markov Decision Processes
confidence: 99%
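The contrast between the two approaches can be made concrete with a minimal sketch (PyTorch is assumed; the state/action dimensions, window length, and layer sizes are illustrative assumptions, not values from [31] or [32]):

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, WINDOW = 8, 2, 10  # assumed, for illustration

class HistoryWindowPolicy(nn.Module):
    """History window approach [31]: the most recent state-action
    pairs are fed to the policy directly as a flat additional input."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(WINDOW * (STATE_DIM + ACTION_DIM), 64),
            nn.ReLU(),
            nn.Linear(64, ACTION_DIM),
        )

    def forward(self, history):  # history: (batch, WINDOW, STATE_DIM + ACTION_DIM)
        return self.net(history.flatten(1))

class RecurrentPolicy(nn.Module):
    """Recurrent policy [32]: an RNN learns its own embedding of an
    arbitrary-length history before the action head."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(STATE_DIM + ACTION_DIM, 64, batch_first=True)
        self.head = nn.Linear(64, ACTION_DIM)

    def forward(self, history):  # history: (batch, T, STATE_DIM + ACTION_DIM)
        _, h = self.rnn(history)  # h[-1]: learned embedding of the whole history
        return self.head(h[-1])
```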
“…Previous work [32] has demonstrated that an RNN can directly learn to control a dynamical system with unobservable disturbances in an end-to-end manner, where the past motion history is mapped to the control action. In contrast, inspired by [33], this work applies a modular learning procedure that explicitly decouples the process into disturbance identification and motion control.…”
Section: Modular Network Design
confidence: 99%
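A minimal sketch of that modular split, assuming a GRU-based observer and an MLP controller (the class names, dimensions, and layer sizes here are hypothetical illustrations, not the DOB-Net architecture itself):

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, DIST_DIM = 8, 2, 3  # assumed, for illustration

class DisturbanceObserver(nn.Module):
    """Disturbance identification module: maps past motion history
    to an estimate of the current (unobservable) disturbance."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(STATE_DIM + ACTION_DIM, 32, batch_first=True)
        self.head = nn.Linear(32, DIST_DIM)

    def forward(self, history):  # history: (batch, T, STATE_DIM + ACTION_DIM)
        _, h = self.rnn(history)
        return self.head(h[-1])  # disturbance estimate

class Controller(nn.Module):
    """Motion control module: maps the current state plus the
    observer's estimate to a control action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + DIST_DIM, 64),
            nn.ReLU(),
            nn.Linear(64, ACTION_DIM),
        )

    def forward(self, state, dist_est):
        return self.net(torch.cat([state, dist_est], dim=-1))
```

One appeal of this decoupling is that the observer can be trained or inspected on the disturbance-identification task independently of the control objective, which an end-to-end recurrent policy does not permit.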
“…In [16], ILC is used to generate a correction signal for the DOB to enhance disturbance attenuation when the major component of the disturbance is repetitive. In addition, neural networks have been introduced to enhance the DOB's performance [17][18][19][20]. For example, in [17], a radial basis function NN is combined with a DOB to deal with both unknown dynamics and external disturbances; in [20], the conventional DOB is enhanced via Recurrent Neural Networks for disturbance estimation and prediction.…”
Section: Introduction
confidence: 99%
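For context, the classical DOB mechanism that these works enhance can be sketched on a toy discrete-time first-order plant (the plant parameters, disturbance waveform, and filter gain below are assumptions chosen for illustration, not taken from any cited paper):

```python
import numpy as np

# Toy plant: x[k+1] = a*x[k] + b*(u[k] + d[k]), with an assumed
# sinusoidal input disturbance d that the DOB must estimate and cancel.
a, b = 0.9, 0.5
T = 200
d = 0.8 * np.sin(0.1 * np.arange(T))

x, d_hat = 0.0, 0.0
alpha = 0.5  # low-pass filter gain (the role of the classical Q-filter)
for k in range(T):
    u = -d_hat                      # feedforward cancellation of the estimate
    x_next = a * x + b * (u + d[k])
    # Invert the nominal plant to recover the total effective input,
    # subtract the commanded input, and low-pass filter the residual.
    d_raw = (x_next - a * x) / b - u
    d_hat = (1 - alpha) * d_hat + alpha * d_raw
    x = x_next
```

The `(x_next - a * x) / b` step is exactly the plant-inverse dependence criticized in the next citation: if the nominal `a` and `b` are inaccurate, the estimate degrades accordingly.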
“…Moreover, the performance of a conventional DOB depends heavily on an accurate plant inverse, which is usually unavailable or very sensitive to uncertainties; this significantly limits the DOB's performance. Recently, deep learning techniques have been developed and applied to high-level decision making (e.g., [21][22][23]) and low-level trajectory planning and tracking (e.g., [20], [24][25][26]). Since the drone delivery scenarios considered in this paper are relatively structured, here we leverage convolutional neural network (CNN) and long short-term memory (LSTM) techniques to incorporate image-based perception into the DOB framework, aiming to improve the DOB's performance.…”
Section: Introduction
confidence: 99%
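A rough sketch of such a CNN-plus-LSTM perception module (the architecture, layer sizes, and the name `PerceptionDOB` are illustrative assumptions, not the authors' implementation):

```python
import torch
import torch.nn as nn

class PerceptionDOB(nn.Module):
    """A CNN encodes each camera frame; an LSTM over the frame
    sequence predicts the disturbance acting on the vehicle."""
    def __init__(self, dist_dim=3):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # -> (batch, 32)
        )
        self.lstm = nn.LSTM(32, 64, batch_first=True)
        self.head = nn.Linear(64, dist_dim)

    def forward(self, frames):  # frames: (batch, T, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        _, (h, _) = self.lstm(feats)
        return self.head(h[-1])  # one disturbance prediction per sequence
```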