2017
DOI: 10.1109/tsp.2017.2750109
An Online Convex Optimization Approach to Proactive Network Resource Allocation

Abstract: Existing approaches to online convex optimization (OCO) make sequential one-slot-ahead decisions, which lead to (possibly adversarial) losses that drive subsequent decision iterates. Their performance is evaluated by the so-called regret, which measures the difference in losses between the online solution and the best fixed solution in hindsight. The present paper deals with online convex optimization involving adversarial loss functions and adversarial constraints, where the constraints are…
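As a minimal illustration of these notions (not the paper's own algorithm), the sketch below runs standard online projected gradient descent, committing to a decision one slot ahead of each revealed loss, and then evaluates the static regret against a fixed comparator. All function and variable names are hypothetical.

```python
import numpy as np

def online_gradient_descent(loss_grads, x0, eta=0.1, radius=1.0):
    """One-slot-ahead decisions via online projected gradient descent.

    loss_grads: a list of callables; loss_grads[t](x) returns the gradient of the
    (possibly adversarial) loss revealed at slot t, after x_t has been committed.
    """
    x = np.asarray(x0, dtype=float)
    iterates = []
    for grad in loss_grads:
        iterates.append(x.copy())            # commit x_t before the loss is revealed
        x = x - eta * grad(x)                # descend on the revealed gradient
        norm = np.linalg.norm(x)
        if norm > radius:                    # project back onto the feasible ball
            x *= radius / norm
    return iterates

def static_regret(losses, iterates, x_star):
    """Cumulative loss of the online iterates minus that of a fixed comparator x_star."""
    return sum(f(x_t) - f(x_star) for f, x_t in zip(losses, iterates))
```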

Cited by 209 publications (221 citation statements) | References 40 publications
“…which is a system of nonlinear equations of Q* ∈ ℝ^(|S|×|X|). Switching the goal from (16) to the fixed point of the Bellman optimality equation (19), a classical yet popular approach is the so-termed Q-learning algorithm [98]: S1) At slot t, select the decision x_t by…”
Section: Reinforcement Learning for Interactive IoT Environments (mentioning)
confidence: 99%
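The excerpt above treats Q-learning as a stochastic fixed-point iteration on the Bellman optimality equation. A minimal tabular sketch is given below; the environment interface (reset/step/sample_action) and all hyperparameters are illustrative assumptions, not taken from the cited works.

```python
import numpy as np

def q_learning(env, num_states, num_actions, episodes=500,
               alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning with epsilon-greedy exploration.

    Q holds one row per state and one column per decision, i.e. Q in R^(|S| x |X|).
    """
    Q = np.zeros((num_states, num_actions))
    for _ in range(episodes):
        s = env.reset()                       # hypothetical env API: returns the initial state
        done = False
        while not done:
            # S1) at slot t, select the decision x_t (epsilon-greedy on the current Q)
            if np.random.rand() < epsilon:
                a = env.sample_action()       # explore
            else:
                a = int(np.argmax(Q[s]))      # exploit
            # S2) observe reward and next state, then take one Bellman update step
            s_next, r, done = env.step(a)
            td_target = r + gamma * np.max(Q[s_next]) * (not done)
            Q[s, a] += alpha * (td_target - Q[s, a])
            s = s_next
    return Q
```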
“…In addition to value iteration-based methods such as Q-learning, approaches based on direct policy search, such as policy gradients and actor-critic methods, are also prevalent nowadays, e.g., [83], [91], [108]. The key idea behind policy gradient is to update the θ-parametrized policy π_θ using the gradient of the discounted objective (16) with respect to the policy parameters [91]. Convergence of the policy gradient with deep neural networks or kernel-based function approximators is now better understood than Q-learning, along with the limitations of policy gradient-based methods that arise from their high variance.…”
Section: Reinforcement Learning for Interactive IoT Environments (mentioning)
confidence: 99%
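To make the policy-gradient idea concrete, here is a hedged REINFORCE-style sketch: a θ-parametrized softmax policy is updated along the gradient of the discounted return. The tabular parametrization and step sizes are illustrative assumptions, not the specific methods of [83], [91], or [108].

```python
import numpy as np

def softmax_policy(theta, s):
    """pi_theta(a | s): softmax over the action scores theta[s, :]."""
    z = theta[s] - np.max(theta[s])           # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def reinforce_update(theta, trajectory, alpha=0.01, gamma=0.95):
    """One REINFORCE step: ascend the gradient of the discounted objective w.r.t. theta.

    trajectory: list of (state, action, reward) tuples from one episode;
    theta: array of shape (num_states, num_actions).
    """
    G = 0.0
    for s, a, r in reversed(trajectory):
        G = r + gamma * G                     # discounted return from this step onward
        p = softmax_policy(theta, s)
        grad_log = -p                         # gradient of log pi_theta(a | s) w.r.t. theta[s]
        grad_log[a] += 1.0
        theta[s] += alpha * G * grad_log      # gradient ascent step on the policy parameters
    return theta
```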
“…Another related line of work concerns online convex optimization with constraints (Mahdavi et al., 2012, 2013; Chen et al., 2017; Neely and Yu, 2017; Chen and Giannakis, 2018). Their setting differs from ours in several important respects.…”
(mentioning)
confidence: 94%
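For readers unfamiliar with this line of work, the sketch below shows a generic online primal-dual update of the kind studied for OCO with constraints: a primal descent step on an instantaneous Lagrangian followed by a dual ascent step on the multiplier. It is a simplified illustration under assumed callables, not the specific algorithm of any of the cited papers.

```python
import numpy as np

def online_primal_dual(loss_grads, cons_vals, cons_grads, x0,
                       eta=0.1, mu=0.1, radius=1.0):
    """Online primal-dual updates for losses f_t and constraints g_t(x) <= 0.

    loss_grads[t](x), cons_vals[t](x), cons_grads[t](x) are callables revealed at slot t.
    """
    x = np.asarray(x0, dtype=float)
    lam = 0.0                                 # dual variable (constraint multiplier)
    iterates = []
    for grad_f, g, grad_g in zip(loss_grads, cons_vals, cons_grads):
        iterates.append(x.copy())             # commit x_t before the slot's loss is revealed
        x = x - eta * (grad_f(x) + lam * grad_g(x))   # primal descent on the Lagrangian
        norm = np.linalg.norm(x)
        if norm > radius:
            x *= radius / norm                # project onto the feasible ball
        lam = max(0.0, lam + mu * g(x))       # dual ascent; keep the multiplier nonnegative
    return iterates
```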
“…These measures can be related to the rate of change of the function values or minimizers over time [26]. Dynamic regret methods for constrained online optimization problems are studied in [27]. All these methods focus on centralized optimization problems.…”
Section: Introduction (mentioning)
confidence: 99%