A Markov decision process (MDP) framework is adopted to represent ensemble control of devices with cyclic energy consumption patterns, e.g., thermostatically controlled loads. Specifically, we utilize and develop the class of MDP models previously coined linearly solvable MDPs, which describe the optimal dynamics of the probability distribution of an ensemble of many cycling devices. Two principally different settings are discussed. First, we consider the optimal strategy of the ensemble aggregator, which balances minimization of the cost of operations against minimization of the ensemble welfare penalty, where the latter is represented as a KL divergence between the actual and normal probability distributions of the ensemble. Second, we shift to the demand response setting, modeling the aggregator's task to minimize the welfare penalty under the condition that the aggregated consumption matches the targeted time-varying consumption requested by the system operator. We discuss a modification of both settings aimed at encouraging or constraining transitions between particular states. The dynamic programming feature of the resulting modified MDPs is always preserved; however, 'linear solvability' is lost fully or partially, depending on the type of modification. We also report numerical experiments, limited in scope, using the formulations of the first setting. We conclude by discussing future generalizations and applications.
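To make the notion of 'linear solvability' concrete, the following is a minimal sketch of a standard finite-horizon linearly solvable MDP recursion in the style of Todorov, not the paper's own formulation. All names (`P_bar` for the passive/normal transition matrix, `q` for the per-step state cost, `T` for the horizon) are illustrative assumptions. The KL penalty between the controlled and normal transition probabilities is what renders the Bellman recursion linear in the desirability function `z`.

```python
import numpy as np

def solve_lmdp(P_bar, q, T):
    """Finite-horizon linearly solvable MDP (illustrative sketch).

    P_bar : (n, n) passive ("normal") transition matrix, rows sum to 1
    q     : (n,) state cost incurred per step
    T     : horizon length
    Returns a list of T optimal controlled transition matrices, step 0 first.
    """
    n = P_bar.shape[0]
    z = np.ones(n)  # terminal desirability, assuming zero terminal cost
    policies = []
    for _ in range(T):
        # Optimal controlled transitions: p*(j|i) proportional to P_bar[i, j] * z[j]
        p_star = P_bar * z[None, :]
        p_star /= p_star.sum(axis=1, keepdims=True)
        policies.append(p_star)
        # Linear Bellman recursion: z_t = exp(-q) * (P_bar @ z_{t+1})
        z = np.exp(-q) * (P_bar @ z)
    return policies[::-1]

# Toy 3-state cycle mimicking a cycling device (e.g., on -> off -> standby -> on);
# state 0 is the "consuming" state and is assigned the highest cost.
P_bar = np.array([[0.1, 0.8, 0.1],
                  [0.1, 0.1, 0.8],
                  [0.8, 0.1, 0.1]])
q = np.array([1.0, 0.2, 0.0])
policies = solve_lmdp(P_bar, q, T=5)
```

Note the design consequence of the KL penalty: with zero state cost (`q = 0`) the optimal controlled dynamics coincide with the normal dynamics `P_bar`, and a nonzero cost tilts transitions away from expensive states while keeping each row a valid probability distribution.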