A large class of problems of sequential decision making under uncertainty, in which the underlying probability structure is a Markov process, can be modeled as stochastic dynamic programs (referred to, in general, as Markov decision problems or MDPs). However, the computational complexity of the classical MDP algorithms, such as value iteration and policy iteration, is prohibitive and can grow intractably with the size of the problem and its related data. Furthermore, these techniques require, for each action, the one-step transition probability and reward matrices, which are often unrealistic to obtain for large and complex systems. Recently, there has been much interest in a simulation-based stochastic approximation framework called reinforcement learning (RL) for computing near-optimal policies for MDPs. RL has been successfully applied to very large problems, such as elevator scheduling and dynamic channel allocation in cellular telephone systems. In this paper, we extend RL to a more general class of decision tasks referred to as semi-Markov decision problems (SMDPs). In particular, we focus on SMDPs under the average-reward criterion. We present a new model-free RL algorithm called SMART (Semi-Markov Average Reward Technique). We present a detailed study of this algorithm on a combinatorially large problem of determining the optimal preventive maintenance schedule of a production inventory system. Numerical results from both the theoretical model and the RL algorithm are presented and compared.

Keywords: semi-Markov decision processes (SMDP), reinforcement learning, average reward, preventive maintenance
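To make the average-reward SMDP setting concrete, the following is a minimal sketch of a SMART-style model-free update on a toy semi-Markov problem. The two-state toy SMDP, its reward and sojourn-time distributions, and the learning and exploration rates are illustrative assumptions, not the production inventory setup studied in the paper; the essential ingredients are that each decision epoch returns a reward and a sojourn time, the reward is penalized by the estimated average reward rate times the sojourn time, and the rate estimate is refreshed from cumulative reward over cumulative time.

```python
import random

# Toy two-state SMDP (illustrative assumption, not the paper's test problem).
STATES, ACTIONS = [0, 1], [0, 1]

def simulate_step(s, a):
    """Return (reward, sojourn_time, next_state) for the toy SMDP."""
    reward = random.gauss(1.0 + s - 0.5 * a, 0.1)   # illustrative reward model
    sojourn = random.expovariate(1.0 + a)           # illustrative sojourn-time model
    next_state = random.choice(STATES)
    return reward, sojourn, next_state

q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
rho = 0.0                       # running estimate of the average reward rate
total_reward, total_time = 0.0, 0.0
alpha, epsilon = 0.1, 0.1       # learning and exploration rates (assumed)

s = 0
for _ in range(50_000):
    greedy_a = max(ACTIONS, key=lambda act: q[(s, act)])
    a = random.choice(ACTIONS) if random.random() < epsilon else greedy_a

    r, tau, s_next = simulate_step(s, a)

    # Relative-value update: the reward is penalized by rho * sojourn time, so
    # q approximates relative values under the average-reward criterion.
    target = r - rho * tau + max(q[(s_next, act)] for act in ACTIONS)
    q[(s, a)] += alpha * (target - q[(s, a)])

    # Refresh the reward-rate estimate only on greedy (non-exploratory) steps.
    if a == greedy_a:
        total_reward += r
        total_time += tau
        rho = total_reward / total_time

    s = s_next

print("estimated average reward rate:", round(rho, 3))
print("greedy policy:", {st: max(ACTIONS, key=lambda act: q[(st, act)]) for st in STATES})
```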
Purpose
Social intervention strategies to mitigate COVID-19 are examined using an agent-based simulation model. The outbreak in a large urban region, Miami-Dade County, Florida, USA, is used as a case study. Results are intended to serve as a planning guide for decision makers.
Methods
The simulation model mimics the daily social mixing behavior of susceptible and infected individuals that generates the spread. Data representing the demographics of the region, the virus epidemiology, and the social interventions shape the model's behavior. Results include daily counts of infected, reported, hospitalized, and dead.
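For illustration only, the following is a minimal sketch of the kind of agent-based daily-mixing dynamic described above. The population size, contacts per day, per-contact transmission probability, infectious period, and seeding are hypothetical placeholders rather than the calibrated Miami-Dade County inputs, and the sketch omits the reporting, hospitalization, mortality, and intervention components of the full model.

```python
import random

POPULATION = 10_000            # hypothetical population size
CONTACTS_PER_DAY = 10          # assumed average daily contacts per person
TRANSMISSION_PROB = 0.03       # assumed per-contact infection probability
INFECTIOUS_DAYS = 10           # assumed infectious period in days
INITIAL_INFECTED = 20          # assumed number of seed infections
DAYS = 120                     # simulated horizon

# Agent states: 'S' susceptible, 'I' infected, 'R' recovered
state = ['S'] * POPULATION
days_infected = [0] * POPULATION
for i in random.sample(range(POPULATION), INITIAL_INFECTED):
    state[i] = 'I'

for day in range(DAYS):
    currently_infected = [i for i in range(POPULATION) if state[i] == 'I']

    # Daily social mixing: each infected agent contacts a random set of people
    # and may transmit to susceptible contacts.
    for i in currently_infected:
        for j in random.sample(range(POPULATION), CONTACTS_PER_DAY):
            if state[j] == 'S' and random.random() < TRANSMISSION_PROB:
                state[j] = 'I'

    # Disease progression: agents recover after the infectious period.
    for i in currently_infected:
        days_infected[i] += 1
        if days_infected[i] >= INFECTIOUS_DAYS:
            state[i] = 'R'

    if day % 10 == 0:
        print(f"day {day:3d}: infected={state.count('I'):5d}  recovered={state.count('R'):5d}")
```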
Results
Results show that early implementation of a complete stay-at-home order is effective in flattening and reversing the infection growth curve in a short period of time. In contrast, using Florida's Phase II plan alone could result in 75% of the population becoming infected and the pandemic ending via herd immunity. Universal use of face masks reduced the number infected by 20%. A further reduction of 66% was achieved by adding contact tracing with a target of identifying 50% of the asymptomatic and pre-symptomatic.
Conclusions
In the absence of a vaccine, a strict stay-at-home order, though effective in curbing a pandemic outbreak, leaves a large proportion of the population susceptible. Hence, there should be a strong follow-up plan of social distancing, use of face masks, contact tracing, testing, and isolation of the infected to minimize the chances of a large-scale resurgence of the disease. However, as the economic cost of a complete stay-at-home order is very high, it can perhaps be used only as an emergency first response, and the authorities should be prepared to activate a strong follow-up plan as soon as possible. The target level for contact tracing was shown to have a nonlinear impact on the reduction of the percentage of the population infected; increasing the contact tracing target from 20% to 30% appeared to provide the largest incremental benefit.