Abstract-Reinforcement learning (RL) agents mostly assume that the environment is stationary, an assumption that does not hold in most real-world problems. Most RL approaches adapt to slow changes by forgetting the previous dynamics of the environment. Reinforcement learning-context detection (RL-CD) is a technique that helps detect changes in the environment's dynamics, which provides the agent with the capability to learn the different dynamics of a non-stationary environment. In this study, we propose an autonomous agent that learns a dynamic environment by taking advantage of hierarchical reinforcement learning (HRL), and we present how the hierarchical structure can be integrated into RL-CD to speed up the convergence of a policy.
Index Terms-Reinforcement learning, autonomous agent, hierarchical reinforcement learning, non-stationary environment, betweenness centrality, prioritized sweeping.
Sequential data generated from various sources in a multi-mode industrial production system provide valuable information on the current mode of the system and make it possible to build a model for each individual operating mode. Using these models in a multi-mode system, one may distinguish the modes of the system and, furthermore, detect whether the current mode is a (normal or faulty) mode known from historical data or a new mode. In this work, we model each individual mode by a probabilistic suffix tree (PST), a structure used to implement variable order Markov models (VOMMs), and propose a novel unsupervised PST matching algorithm that compares the tree models by a matching cost once they are constructed. The matching cost we define comprises a subsequence dissimilarity cost and a probability cost. Our tree matching method makes it possible to compare two PSTs in linear time with one concurrent top-down pass. We use this matching cost as a similarity measure for k-medoid clustering and cluster the PSTs obtained from system modes according to their matching costs. The overall approach yields promising results for unsupervised identification of modes on data obtained from a physical factory demonstrator. Notably, we can distinguish modes on two levels of granularity, both corresponding to human expert labels, with a Rand score of up to 73 % compared to a baseline of at most 42 %.
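The idea of comparing two PSTs in a single concurrent top-down pass can be illustrated with a minimal Python sketch. Everything specific here is an assumption for illustration and is not taken from the work itself: the `PSTNode` layout, the use of total variation distance as the probability cost, and a flat `mismatch_penalty` per unmatched node as the subsequence dissimilarity cost. The cost actually defined by the authors may differ.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PSTNode:
    """Hypothetical PST node: one context with its next-symbol distribution."""
    probs: Dict[str, float]
    children: Dict[str, "PSTNode"] = field(default_factory=dict)


def subtree_size(node: PSTNode) -> int:
    """Number of nodes in the subtree rooted at `node`."""
    return 1 + sum(subtree_size(c) for c in node.children.values())


def matching_cost(a: PSTNode, b: PSTNode, mismatch_penalty: float = 1.0) -> float:
    """Walk both trees together from the root and accumulate a cost.

    Assumed cost model (illustrative only):
      - nodes present in both trees contribute the total variation
        distance between their next-symbol distributions;
      - a subtree present in only one tree contributes
        `mismatch_penalty` per node.
    Each node of either tree is visited once, so the pass is linear
    in the total number of nodes.
    """
    symbols = set(a.probs) | set(b.probs)
    cost = 0.5 * sum(
        abs(a.probs.get(s, 0.0) - b.probs.get(s, 0.0)) for s in symbols
    )
    for sym in set(a.children) | set(b.children):
        child_a = a.children.get(sym)
        child_b = b.children.get(sym)
        if child_a is not None and child_b is not None:
            cost += matching_cost(child_a, child_b, mismatch_penalty)
        else:
            only = child_a if child_a is not None else child_b
            cost += mismatch_penalty * subtree_size(only)
    return cost
```

Under these assumptions the cost is symmetric and zero for identical trees, so a matrix of pairwise costs could be passed as a precomputed dissimilarity to a k-medoid clustering routine.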