This paper examines the convergence of no-regret learning in games with continuous action sets. For concreteness, we focus on learning via "dual averaging", a widely used class of no-regret learning schemes where players take small steps along their individual payoff gradients and then "mirror" the output back to their action sets. In terms of feedback, we assume that players can only estimate their payoff gradients up to a zero-mean error with bounded variance. To study the convergence of the induced sequence of play, we introduce the notion of variational stability, and we show that stable equilibria are locally attracting with high probability whereas globally stable equilibria are globally attracting with probability 1. We also discuss some applications to mixed-strategy learning in finite games, and we provide explicit estimates of the method's convergence speed.
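As a rough illustration of the dual averaging loop described above, the sketch below runs two players on the action set [0, 1]: each accumulates noisy payoff-gradient steps in a score variable and then maps that score back to the action set. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy payoff u_i(x) = x_i(1 - x_i - x_j), the Euclidean projection used as the "mirror" step, the 1/n step size, and the Gaussian noise model.

```python
import random


def project(y, lo=0.0, hi=1.0):
    """Mirror step for the Euclidean regularizer: clip the score to [lo, hi]."""
    return min(hi, max(lo, y))


def payoff_gradient(i, x):
    """Gradient of the assumed toy payoff u_i(x) = x_i * (1 - x_i - x_j) in x_i."""
    j = 1 - i
    return 1.0 - 2.0 * x[i] - x[j]


def dual_averaging(steps=5000, noise_std=0.5, seed=0):
    """Run two-player dual averaging with zero-mean noisy gradient feedback."""
    rng = random.Random(seed)
    y = [0.0, 1.0]                      # score (dual) variables
    x = [project(v) for v in y]         # actions actually played
    for n in range(1, steps + 1):
        gamma = 1.0 / n                 # vanishing step size (an assumption)
        # Each player sees its gradient only up to zero-mean, bounded-variance noise.
        noisy = [payoff_gradient(i, x) + rng.gauss(0.0, noise_std) for i in range(2)]
        # Aggregate the gradient step in the dual space, then mirror back.
        y = [y[i] + gamma * noisy[i] for i in range(2)]
        x = [project(v) for v in y]
    return x


if __name__ == "__main__":
    print(dual_averaging())
```

In this toy game the unique equilibrium is x_1 = x_2 = 1/3, so the printed actions should settle near that point despite the noise. Other regularizers (for example, an entropic one on a simplex) would change only the `project` step.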
Hypertension is a common disorder and the leading risk factor for cardiovascular disease and premature death worldwide. Genome-wide association studies (GWASs) in the European population have identified multiple chromosomal regions associated with blood pressure, but the identified loci together explain only a small fraction of the variance in blood pressure. Differences in environmental exposures and genetic background between Chinese and European populations suggest that the pathways of blood pressure regulation may also differ. To identify novel genetic variants affecting blood pressure variation, we conducted a meta-analysis of GWASs of blood pressure and hypertension in 11,816 subjects, followed by replication studies including 69,146 additional individuals. We identified genome-wide significant (P < 5.0 × 10⁻⁸) associations with blood pressure, which included variants at three new loci (CACNA1D, CYP21A2, and MED13L) and a newly discovered variant near SLC4A7. We also replicated 14 previously reported loci, 8 (CASZ1, MOV10, FGF5, CYP17A1, SOX6, ATP2B1, ALDH2, and JAG1) at genome-wide significance, and 6 (FIGN, ULK4, GUCY1A3, HFE, TBX3-TBX5, and TBX3) at a suggestive level of P = 1.81 × 10⁻³ to 5.16 × 10⁻⁸. These findings provide new mechanistic insights into the regulation of blood pressure and potential targets for treatments.
Recent deep networks are capable of memorizing the entire training data even when the labels are completely random. To overcome overfitting on corrupted labels, we propose a novel technique of learning another neural network, called MentorNet, to supervise the training of the base deep network, namely, StudentNet. During training, MentorNet provides a curriculum (sample weighting scheme) for StudentNet to focus on the samples whose labels are probably correct. Unlike existing curricula, which are usually predefined by human experts, MentorNet learns a data-driven curriculum dynamically with StudentNet. Experimental results demonstrate that our approach can significantly improve the generalization performance of deep networks trained on corrupted training data. Notably, to the best of our knowledge, we achieve the best published result on WebVision, a large benchmark containing 2.2 million images with real-world noisy labels. The code is available at https://github.com/google/mentornet.
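To make the idea of a curriculum as a sample weighting scheme concrete, the sketch below applies per-sample weights to a batch of losses. It is not the learned MentorNet: a hand-written threshold rule (in the spirit of a predefined curriculum) stands in for the mentor, and the function names and the threshold value are hypothetical.

```python
def threshold_weights(losses, threshold):
    """Stand-in for a mentor: weight 1 for low-loss samples, 0 for the rest.

    The assumption is that samples with a small loss are more likely to
    carry a correct label; a learned mentor would replace this rule.
    """
    return [1.0 if loss < threshold else 0.0 for loss in losses]


def weighted_mean_loss(losses, weights):
    """Average the per-sample losses using the curriculum weights."""
    total_weight = sum(weights)
    if total_weight == 0:
        return 0.0  # every sample was filtered out of this batch
    return sum(w * l for w, l in zip(weights, losses)) / total_weight


if __name__ == "__main__":
    batch_losses = [0.1, 0.2, 3.0]   # the last sample looks mislabeled
    weights = threshold_weights(batch_losses, threshold=1.0)
    print(weights, weighted_mean_loss(batch_losses, weights))
```

The student would then be trained on the weighted loss, so that samples given zero weight do not contribute to the gradient for that batch.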
In many settings, a decision-maker wishes to learn a rule, or policy, that maps from observable characteristics of an individual to an action. Examples include selecting offers, prices, advertisements, or emails to send to consumers, as well as the problem of determining which medication to prescribe to a patient. In this paper, we study the offline multi-action policy learning problem with observational data, where the policy may need to respect budget constraints or belong to a restricted policy class such as decision trees. We build on the theory of efficient semi-parametric inference in order to propose and implement a policy learning algorithm that achieves asymptotically minimax-optimal regret. To the best of our knowledge, this is the first result of this type in the multi-action setup, and it provides a substantial performance improvement over the existing learning algorithms. We then consider additional computational challenges that arise in implementing our method for the case where the policy is restricted to take the form of a decision tree. We propose two different approaches, one using a mixed integer program formulation and the other using a tree-search based algorithm.
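The computational question of fitting a tree-shaped policy can be illustrated with the smallest possible case. The sketch below is neither the mixed integer program nor the authors' tree-search algorithm: it is a brute-force search over depth-1 trees (one split, one action per side), and it assumes that a matrix `scores[i][a]` of estimated rewards for giving action `a` to individual `i` has already been computed by some other means. All names are hypothetical.

```python
def best_action(indices, scores, num_actions):
    """Return (action, total score) of the best single action for a group."""
    totals = [sum(scores[i][a] for i in indices) for a in range(num_actions)]
    action = max(range(num_actions), key=totals.__getitem__)
    return action, totals[action]


def best_stump(features, scores):
    """Exhaustively search depth-1 tree policies.

    Returns (value, feature_index, threshold, left_action, right_action),
    where individuals with feature <= threshold receive left_action.
    """
    n = len(features)
    num_actions = len(scores[0])
    best = None
    for j in range(len(features[0])):
        # Candidate thresholds are the observed values of feature j.
        for t in sorted({row[j] for row in features}):
            left = [i for i in range(n) if features[i][j] <= t]
            right = [i for i in range(n) if features[i][j] > t]
            left_action, left_value = best_action(left, scores, num_actions)
            right_action, right_value = best_action(right, scores, num_actions)
            value = left_value + right_value
            if best is None or value > best[0]:
                best = (value, j, t, left_action, right_action)
    return best


if __name__ == "__main__":
    features = [[0.1], [0.2], [0.8], [0.9]]
    scores = [[1, 0], [1, 0], [0, 1], [0, 1]]
    print(best_stump(features, scores))
```

The cost of this search grows quickly with tree depth and the number of features, which is the kind of difficulty that motivates more careful formulations.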
A multiplayer reach-avoid game is a differential game between an attacking team with N_A attackers and a defending team with N_D defenders playing on a compact domain with obstacles. The attacking team aims to send M of the N_A attackers to some target location, while the defending team aims to prevent that by capturing attackers or indefinitely delaying attackers from reaching the target. Although the analysis of this game plays an important role in many applications, the optimal solution to this game is computationally intractable when N_A > 1 or N_D > 1. In this paper, we present two approaches for the N_A = N_D = 1 case to determine pairwise outcomes, and a graph theoretic maximum matching approach to merge these pairwise outcomes for an N_A, N_D > 1 solution that provides guarantees on the performance of the defending team. We will show that the four-dimensional Hamilton-Jacobi-Isaacs approach allows for real-time updates to the maximum matching, and that the two-dimensional "path defense" approach is considerably more scalable with the number of players while maintaining defender performance guarantees.
Index Terms: Agents and autonomous systems, cooperative control, game theory, computational methods, nonlinear systems.
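The step of merging pairwise outcomes through a maximum matching can be sketched as follows. The code assumes the one-on-one outcomes are already known and are given as, for each defender, the list of attackers it is guaranteed to beat; the matching itself uses a standard augmenting-path method, which is a generic choice and not necessarily the one used in the paper. The final check encodes one reading of the guarantee: if every matched attacker is stopped by its own defender, at most N_A minus the matching size can reach the target. Function names are hypothetical.

```python
def maximum_matching(defender_wins, num_attackers):
    """Maximum bipartite matching between defenders and attackers.

    defender_wins[d] lists the attackers that defender d beats one-on-one.
    Returns (matching size, list giving the defender assigned to each attacker).
    """
    match_of_attacker = [None] * num_attackers

    def try_assign(d, visited):
        # Look for an augmenting path starting at defender d.
        for a in defender_wins[d]:
            if a in visited:
                continue
            visited.add(a)
            current = match_of_attacker[a]
            if current is None or try_assign(current, visited):
                match_of_attacker[a] = d
                return True
        return False

    size = 0
    for d in range(len(defender_wins)):
        if try_assign(d, set()):
            size += 1
    return size, match_of_attacker


def defense_guaranteed(matching_size, num_attackers, m_required):
    """True if fewer than m_required attackers are left unmatched."""
    return num_attackers - matching_size < m_required


if __name__ == "__main__":
    wins = [[0, 1], [0], [2]]   # hypothetical pairwise outcomes
    size, assignment = maximum_matching(wins, num_attackers=3)
    print(size, assignment, defense_guaranteed(size, 3, m_required=1))
```

Because pairwise outcomes change as the players move, the matching would be recomputed whenever they are updated.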