We propose and analyze an alternative approach to off-policy multi-step temporal difference learning, in which off-policy returns are corrected with the current Q-function in terms of rewards, rather than with the target policy in terms of transition probabilities. We prove that such approximate corrections are sufficient for off-policy convergence in both policy evaluation and control, under certain conditions. These conditions relate the distance between the target and behavior policies, the eligibility trace parameter, and the discount factor, and they formalize an underlying tradeoff in off-policy TD(λ). We illustrate this theoretical relationship empirically on a continuous-state control task.
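To make the idea of a reward-level correction concrete, the following is a minimal sketch of one plausible reading of the abstract, not the authors' stated implementation. It assumes the corrected return is the current estimate Q(x_0, a_0) plus a (γλ)-discounted sum of TD errors, where each TD error bootstraps from the expected Q-value under the target policy. The function name, argument names, and list-based trajectory format are hypothetical.

```python
def q_corrected_return(rewards, q_taken, expected_q_next, gamma, lam):
    """Sketch of a Q-corrected multi-step return (assumed form, not from the source).

    rewards[t]         : reward observed at step t under the behavior policy
    q_taken[t]         : current Q(x_t, a_t) for the action actually taken
    expected_q_next[t] : sum_a pi(a | x_{t+1}) * Q(x_{t+1}, a) under the target policy
    gamma, lam         : discount factor and eligibility trace parameter
    """
    total = q_taken[0]          # start from the current estimate
    weight = 1.0                # (gamma * lam) ** t
    for r, q, eq_next in zip(rewards, q_taken, expected_q_next):
        # The correction happens here: the expectation under the target
        # policy replaces an importance weight on transition probabilities.
        delta = r + gamma * eq_next - q
        total += weight * delta
        weight *= gamma * lam
    return total


if __name__ == "__main__":
    print(q_corrected_return([1.0, 1.0], [0.0, 0.0], [0.0, 0.0], gamma=0.5, lam=1.0))
```

With `lam = 0` the sketch reduces to a one-step target, and larger `lam` values lean more heavily on the off-policy trajectory, which is where the tradeoff mentioned in the abstract would arise.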
Real-time detection of multiple stance events, specifically initial contact (IC), foot flat (FF), heel off (HO), and toe off (TO), could greatly benefit neurorobotic (NR) and neuroprosthetic (NP) control. Three real-time threshold-based algorithms have been developed that detect these events from kinematic data in combination with a biomechanical model. Data from seven subjects walking at three speeds on an instrumented treadmill, a total of 558 steps, were used to validate the presented algorithms. The reference for the gait events was obtained using marker and force plate data. All algorithms had excellent precision and no false positives were observed. Timing delays of the presented algorithms were similar to those of current state-of-the-art algorithms for the detection of IC and TO, whereas smaller delays were achieved for the detection of FF. Our results indicate that, based on their high precision and low delays, these algorithms can be used for the control of an NR/NP, with the exception of the HO event. Kinematic data is used in most NR/NP control schemes and is thus available at no additional cost, resulting in a minimal computational burden. The presented methods can also be applied to screening for pathological gait, or to gait analysis in general, inside or outside the laboratory.
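As a rough illustration of what a threshold-based detector can look like, the sketch below flags the sample at which a signal first rises to or above a threshold. It is a generic example under stated assumptions: the abstract does not specify which kinematic signals or threshold values are used, and the biomechanical model is not represented here. The function name, the input signal, and the threshold are hypothetical.

```python
def detect_threshold_events(samples, threshold):
    """Return the indices at which the signal crosses the threshold upward.

    samples   : iterable of scalar signal values, in time order (hypothetical signal)
    threshold : scalar detection threshold (hypothetical value)
    """
    events = []
    above = False  # whether the previous sample was at or above the threshold
    for i, value in enumerate(samples):
        now_above = value >= threshold
        if now_above and not above:
            events.append(i)  # report only the first sample of each crossing
        above = now_above
    return events


if __name__ == "__main__":
    print(detect_threshold_events([0.0, 0.2, 0.6, 0.7, 0.1, 0.9], threshold=0.5))
```

Because the loop only looks at the current and previous sample, the same logic can run sample by sample on a live stream, which is the property a real-time detector needs.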
Accurate and reliable event prediction is imperative for supporting movement with an exoskeleton. Two events are important during a sit-to-stand movement: seat-off, the event at which the subject leaves the chair, and start-of-assistance for hip and knee, the earliest time at which assistance may be provided. This paper analyzes two methods to predict and detect these events. Both methods use only joint encoder data as input. The model-based method uses probabilistic principal component analysis with a Kalman filter. Based on a statistically learned model, a joint trajectory is predicted. The seat-off event is predicted using its correlation with maximum hip angle. Since the start-of-assistance event has no clear correlation with joint trajectories, it cannot be detected with this method. The model-free method is a feed-forward neural network that learns a mapping between inputs and events directly. It is applied to both seat-off prediction and start-of-assistance detection. The methods have been evaluated on 311 lab-recorded movements. For the seat-off event, the model-based method is more reliable than the model-free method. For the start-of-assistance event, the model-free method performs well, except in an outlier case for one subject. Both methods allow accurate and reliable event prediction using only joint encoder data as input.