Reinforcement learning consists of finding policies that maximize an expected cumulative long-term reward in a Markov decision process with unknown transition probabilities and instantaneous rewards. In this paper we consider the problem of finding such optimal policies while assuming they are continuous functions belonging to a reproducing kernel Hilbert space (RKHS). To learn the optimal policy we introduce a stochastic policy gradient ascent algorithm with three novel features: (i) the stochastic estimates of policy gradients are unbiased; (ii) the variance of the stochastic gradients is reduced by drawing on ideas from numerical differentiation; (iii) policy complexity is controlled using sparse RKHS representations. Feature (i) is instrumental in proving convergence to a stationary point of the expected cumulative reward. Feature (ii) facilitates reasonable convergence times. Feature (iii) is a necessity in practical implementations, which we show can be satisfied without sacrificing convergence guarantees. Numerical examples on standard problems illustrate successful learning of policies with low-complexity representations that are close to stationary points of the expected cumulative reward.
In the effort to develop medications to combat addiction, researchers have developed models that attempt to describe the neurobiological process of cocaine dependence. It has not, however, yet been determined which of these models, if any, best fits the behaviors and experiences of patients. This project retrospectively evaluated changes in patients' experiences with cocaine over time in order to clarify the model that best fits clinical observations. In 2005 and 2007, 100 treatment-seeking, long-term cocaine users were recruited from an urban university-based treatment center in Philadelphia, PA, United States. Each participant was administered the "Cocaine History Questionnaire," which asked them to describe the initiation and escalation of their cocaine usage, changing reward perceptions, and effects of intoxication at certain points in their drug use careers. These data were then analyzed using repeated measures, examining within-subject differences in reported information across the time points. We found evidence that as the amount of drug used increases, self-reported euphoria decreases and negative symptoms associated with cocaine use increase. The data provide preliminary evidence for the hedonic dysregulation model of addiction. Limitations and implications of the study are discussed in the conclusion.
Internet of Things (IoT) devices, mobile phones, and robotic systems are often denied the power of deep learning algorithms due to their limited computing power. However, to provide time-critical services such as emergency response, home assistance, and surveillance, these devices often need real-time analysis of their camera data. This paper offers a viable approach to integrating high-performance deep-learning-based computer vision algorithms with low-resource, low-power devices by leveraging the computing power of the cloud. By offloading the computation to the cloud, no dedicated hardware is needed to enable deep neural networks on existing devices with limited computing power. A Raspberry Pi-based robot, Cloud Chaser, is built to demonstrate the power of using cloud computing to perform real-time vision tasks. Furthermore, to reduce latency and improve real-time performance, compression algorithms are proposed and evaluated for streaming real-time video frames to the cloud.
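The bandwidth/latency trade-off behind streaming compressed frames can be illustrated with a minimal sketch. This is not the paper's pipeline: it uses lossless zlib on a synthetic grayscale frame purely as a stand-in for the video codecs (e.g., JPEG or H.264) a real system would use, and the frame dimensions are assumed for illustration. The compressed payload is what a device-side client would send over the network to a cloud inference server.

```python
import time
import zlib

# Synthetic 8-bit grayscale frame (640x480) with large uniform regions,
# standing in for a camera capture on a low-power device.
W, H = 640, 480
frame = bytes((x // 80 * 30) % 256 for x in range(W)) * H

t0 = time.perf_counter()
payload = zlib.compress(frame, level=6)  # lossless stand-in for JPEG/H.264
dt_ms = (time.perf_counter() - t0) * 1e3

# Fewer bytes on the wire means lower transmission latency per frame.
print(f"raw: {len(frame)} B, compressed: {len(payload)} B, {dt_ms:.1f} ms")
```

On the cloud side, `zlib.decompress(payload)` recovers the exact frame before inference; a lossy codec would trade some fidelity for a much smaller payload and faster round trips.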
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.