We present a framework to address a class of sequential decision-making problems. Our framework features learning the optimal control policy with robustness to noisy data, determining the unknown state and action parameters, and performing sensitivity analysis with respect to problem parameters. We consider two broad categories of sequential decision-making problems, modeled as infinite-horizon Markov Decision Processes (MDPs) with and without an absorbing state. The central idea underlying our framework is to quantify exploration in terms of the Shannon entropy of the trajectories under the MDP and to determine the stochastic policy that maximizes it while guaranteeing a low value of the expected cost along a trajectory. The resulting policy enhances the quality of exploration early in the learning process, and consequently allows faster convergence and robust solutions even in the presence of noisy data, as demonstrated in our comparisons with popular algorithms such as Q-learning, Double Q-learning, and entropy-regularized Soft Q-learning. The framework extends to the class of parameterized MDP and RL problems, where states and actions are parameter-dependent and the objective is to determine the optimal parameters along with the corresponding optimal policy. Here, the associated cost function can be non-convex with multiple poor local minima. Simulations on a 5G small cell network problem demonstrate successful determination of the communication routes and the small cell locations. We also obtain sensitivity measures with respect to problem parameters and demonstrate robustness to noisy environment data.
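The trade-off between trajectory entropy and expected cost described above can be illustrated with the standard entropy-regularized (soft) Bellman recursion. The sketch below is not the authors' algorithm: the toy MDP, the names `ACTIONS`, `GOAL`, and `soft_value_iteration`, and the inverse-temperature parameter `beta` are all assumptions made for illustration. Transitions are deterministic and the state graph is acyclic, so the recursion converges in a few sweeps.

```python
import math

# Hypothetical toy MDP (not from the paper): state -> list of (next_state, cost).
# State 2 is the absorbing goal state with zero cost-to-go.
ACTIONS = {
    0: [(1, 1.0), (2, 4.0)],
    1: [(2, 1.0), (2, 5.0)],
}
GOAL = 2


def _log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def soft_value_iteration(beta, sweeps=50):
    """Return (V, policy) for the soft cost-to-go at inverse temperature beta.

    Large beta emphasizes low cost; small beta emphasizes trajectory entropy.
    """
    V = {0: 0.0, 1: 0.0, GOAL: 0.0}
    for _ in range(sweeps):
        for s, acts in ACTIONS.items():
            # Soft-min backup: V(s) = -(1/beta) * log sum_a exp(-beta * (c + V(s')))
            logits = [-beta * (c + V[s2]) for s2, c in acts]
            V[s] = -_log_sum_exp(logits) / beta
    policy = {}
    for s, acts in ACTIONS.items():
        # Gibbs policy: pi(a|s) proportional to exp(-beta * (c + V(s')))
        logits = [-beta * (c + V[s2]) for s2, c in acts]
        z = _log_sum_exp(logits)
        policy[s] = [math.exp(l - z) for l in logits]
    return V, policy
```

In this toy problem, a large `beta` (for example 10) concentrates the policy on the cheapest route 0 → 1 → 2. A very small `beta` makes the three possible trajectories nearly equally likely, so state 0 moves to state 1 with probability close to 2/3 rather than 1/2, because two of the three trajectories pass through state 1.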
This work presents a maximum-entropy-principle-based algorithm for solving the minimum multiway k-cut problem defined over static and dynamic digraphs. A multiway k-cut problem requires partitioning the set of nodes in a graph into k subsets such that each subset contains one prespecified node and the corresponding total cut weight is minimized. These problems arise in many applications and are computationally complex (NP-hard). In the static setting, this article presents an approach that uses a relaxed multiway k-cut cost function; we show that the resulting algorithm converges to a local minimum. This iterative algorithm is designed to avoid poor local minima, and its run-time complexity is approximately O(kIN^3), where N is the number of vertices and I is the number of iterations. In the dynamic setting, the edge-weight matrix has associated dynamics, and some of the edges in the graph can be influenced by an external input. The objective is to design the dynamics of the controllable edges so that the multiway k-cut value remains small (or decreases) as the graph evolves under the dynamics. It is also required to determine the time-varying partition that defines the minimum multiway k-cut value. Our approach is to choose a relaxation of the multiway k-cut value, derived using the maximum entropy principle, and treat it as a control Lyapunov function to design control laws that affect the weight dynamics. Simulations on practical examples, namely interactive foreground-background segmentation, minimum multiway k-cut optimization for non-planar graphs, and dynamically evolving graphs, demonstrate the efficacy of the algorithm. arXiv:1907.08720v1 [math.OC]
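To make the cut objective concrete, the sketch below evaluates the multiway cut weight of a hard partition on a small digraph, together with one possible relaxation in which each node belongs to each subset with some probability. The relaxation shown (expected cut weight under independent soft assignments) is an assumption for illustration; the abstract does not state the paper's relaxed cost. The graph `W` and the function names are hypothetical.

```python
def cut_weight(W, labels):
    """Total weight of directed edges whose endpoints lie in different subsets."""
    n = len(W)
    return sum(
        W[i][l] for i in range(n) for l in range(n) if labels[i] != labels[l]
    )


def relaxed_cut_weight(W, P):
    """Expected cut weight when node i joins subset j independently with
    probability P[i][j] (an assumed relaxation, not necessarily the paper's)."""
    n, k = len(W), len(P[0])
    total = 0.0
    for i in range(n):
        for l in range(n):
            if i == l:
                continue
            # Probability that nodes i and l land in the same subset.
            same = sum(P[i][j] * P[l][j] for j in range(k))
            total += W[i][l] * (1.0 - same)
    return total


# Hypothetical 4-node digraph; W[i][l] is the weight of edge i -> l.
W = [
    [0, 3, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 3],
    [1, 0, 0, 0],
]
labels = [0, 0, 1, 1]                      # nodes 0 and 3 act as the k = 2 terminals
one_hot = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
uniform = [[0.5, 0.5] for _ in range(4)]
```

With one-hot assignments the relaxed cost coincides with the hard cut weight (3 for this graph), while the uniform assignment gives half of the total edge weight (4.5). A relaxation of this kind is differentiable in the assignment probabilities, which is what makes it usable in an iterative scheme.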
Clustering algorithms typically provide clustering solutions with a prespecified number of clusters. The lack of a priori knowledge of the true number of underlying clusters in a dataset makes it important to have a metric for comparing clustering solutions with different numbers of clusters. This article quantifies a notion of persistence of clustering solutions that enables such a comparison. Persistence relates to the range of data-resolution scales over which a clustering solution persists; it is quantified in terms of the maximum over the two-norms of all the associated cluster-covariance matrices. Thus we associate a persistence value with each element in a set of clustering solutions with different numbers of clusters. We show that, for datasets where the natural clusters are known a priori, the clustering solutions that identify the natural clusters are the most persistent; in this way, this notion can be used to identify solutions with the true number of clusters. Detailed experiments on a variety of standard and synthetic datasets demonstrate that the proposed persistence-based indicator outperforms existing approaches, such as the gap-statistic method, the X-means, G-means, PG-means, and dip-means algorithms, and the information-theoretic method, in accurately identifying the clustering solutions with the true number of clusters. Interestingly, our method can be explained in terms of the phase-transition phenomenon in the deterministic annealing algorithm, where the number of distinct cluster centers changes (bifurcates) with respect to an annealing parameter.
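The quantity underlying the persistence measure, the maximum two-norm over the cluster-covariance matrices, can be computed as sketched below. This covers only that ingredient: the abstract does not give the full persistence formula, so none is attempted here. The population (divide-by-n) covariance, the power-iteration routine, and all function names are assumptions for illustration.

```python
import math


def covariance(points):
    """Population covariance matrix (divides by n) of a list of d-dim points."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    return [
        [sum((p[a] - mean[a]) * (p[b] - mean[b]) for p in points) / n
         for b in range(d)]
        for a in range(d)
    ]


def spectral_norm(C, iters=200):
    """Two-norm (largest eigenvalue) of a symmetric PSD matrix via power iteration."""
    d = len(C)
    v = [1.0 / (i + 1) for i in range(d)]  # arbitrary non-zero starting vector
    lam = 0.0
    for _ in range(iters):
        w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0  # zero matrix, or start vector lies in the null space
        v = [x / norm for x in w]
        lam = norm
    return lam


def max_cluster_covariance_norm(clusters):
    """Maximum over clusters of the two-norm of each cluster's covariance."""
    return max(spectral_norm(covariance(c)) for c in clusters)


# Hypothetical two-cluster solution in the plane.
clusters = [
    [(0.0, 0.0), (2.0, 0.0)],      # covariance [[1, 0], [0, 0]], two-norm 1
    [(10.0, 10.0), (10.0, 14.0)],  # covariance [[0, 0], [0, 4]], two-norm 4
]
```

For these clusters the function returns 4.0, the spread of the widest cluster along its principal direction. In the deterministic annealing view mentioned above, a quantity of this kind governs when a cluster center bifurcates as the annealing parameter changes.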