We address the problem of learning robot control by model-free reinforcement learning (RL). We adopt the probabilistic model of Vlassis and Toussaint (2009) for model-free RL, and we propose a Monte Carlo EM algorithm (MCEM) for control learning that searches directly in the space of controller parameters using information obtained from randomly generated robot trajectories. MCEM is related to, and generalizes, the PoWER algorithm of Kober and Peters (2009). In the finite-horizon case MCEM reduces precisely to PoWER, but MCEM can also handle the discounted infinite-horizon case. An interesting result is that the infinite-horizon case can be viewed as a 'randomized' version of the finite-horizon case, in the sense that the length of each sampled trajectory is a random draw from an appropriately constructed geometric distribution. We provide some preliminary experiments demonstrating the effects of fixed (PoWER) vs randomized (MCEM) horizon length in two simulated and one real robot control tasks.
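The 'randomized horizon' view can be illustrated with a minimal sketch. It assumes one standard construction, which may differ in detail from the one used in the paper: a trajectory continues at each step with probability gamma, so its length H satisfies P(H > t) = gamma^t, and the expected undiscounted return over the random horizon then equals the discounted return. The function names and the reward sequence are hypothetical and chosen only for illustration.

```python
import random


def discounted_return(rewards, gamma):
    """Discounted sum over a fixed reward sequence: sum_t gamma**t * r_t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def sample_horizon(gamma, rng):
    """Draw a trajectory length H >= 1 with P(H > t) = gamma**t.

    Assumption: the trajectory continues with probability gamma at each
    step, i.e. H is geometric with success probability 1 - gamma.
    """
    h = 1
    while rng.random() < gamma:
        h += 1
    return h


def randomized_horizon_return(rewards, gamma, rng):
    """Undiscounted return of a trajectory cut at a random horizon."""
    h = min(sample_horizon(gamma, rng), len(rewards))
    return sum(rewards[:h])


def monte_carlo_estimate(rewards, gamma, n_samples, seed=0):
    """Average the randomized-horizon return over many sampled horizons."""
    rng = random.Random(seed)
    total = sum(
        randomized_horizon_return(rewards, gamma, rng)
        for _ in range(n_samples)
    )
    return total / n_samples


if __name__ == "__main__":
    rewards = [1.0] * 200  # hypothetical reward sequence
    gamma = 0.9
    print("discounted return:       ", discounted_return(rewards, gamma))
    print("randomized-horizon mean: ", monte_carlo_estimate(rewards, gamma, 50000))
```

In this sketch no discount factor appears inside the sampled return; the discounting is carried entirely by the distribution of the trajectory length.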
Thermostats are widely used in temperature regulation of indoor spaces and have a direct impact on energy use and occupant thermal comfort. Existing guidelines make recommendations for properly selecting set points to reduce energy use, but there is little or no information regarding the actual achieved thermal comfort of the occupants. While dry-bulb air temperature measured at the thermostat location is sometimes a good proxy, there is less understanding of whether thermal comfort targets are actually met. To this end, we have defined an experimental simulation protocol involving two office buildings; the buildings have contrasting geometrical and construction characteristics, as well as different building services systems for meeting heating and cooling demands. A parametric analysis is performed for combinations of controlled variables and boundary conditions. In all cases, occupant thermal comfort is estimated using the Fanger index, as defined in ISO 7730. The results of the parametric study suggest that simple bounds on the dry-bulb air temperature are not sufficient to ensure comfort, and in many cases, more detailed considerations, taking into account building characteristics as well as the types of building heating and cooling services, are required. The implication is that the calculation or estimation of detailed comfort indices, or even the use of personalised comfort models, is key towards a more human-centric approach to building design and operation.
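As a small illustration of what a comfort-index check looks like, the sketch below converts a Fanger PMV value into the predicted percentage dissatisfied (PPD) using the PMV-to-PPD relation given in ISO 7730. It does not compute PMV itself, which requires the full ISO 7730 calculation from air temperature, mean radiant temperature, air speed, humidity, clothing and metabolic rate. The PMV inputs, the function names and the default limit of 0.5 are assumptions made for illustration and are not taken from the study.

```python
import math


def ppd_from_pmv(pmv):
    """Predicted percentage dissatisfied (%) from a PMV value (ISO 7730 relation)."""
    return 100.0 - 95.0 * math.exp(-0.03353 * pmv ** 4 - 0.2179 * pmv ** 2)


def within_comfort_band(pmv, limit=0.5):
    """Check a PMV value against a symmetric comfort band.

    Assumption: limit=0.5 is used here as an illustrative default;
    the appropriate band depends on the comfort category targeted.
    """
    return abs(pmv) < limit


if __name__ == "__main__":
    # Hypothetical PMV values, e.g. as produced by a building simulation.
    for pmv in (-1.0, -0.5, 0.0, 0.5, 1.0):
        print(f"PMV={pmv:+.1f}  PPD={ppd_from_pmv(pmv):5.1f}%  "
              f"within band: {within_comfort_band(pmv)}")
```

The check operates on PMV rather than on dry-bulb air temperature, so two spaces held at the same set point can fall on different sides of the band.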