Best arm identification (or pure exploration) in multi-armed bandits is a fundamental problem in machine learning. In this paper we study the distributed version of this problem, where multiple agents want to learn the best arm collaboratively. We want to quantify the power of collaboration under limited interaction (or communication steps), as interaction is expensive in many settings. We measure the running time of a distributed algorithm by its speedup over the best centralized algorithm, where there is only one agent. We give almost tight round-speedup tradeoffs for this problem; along the way we develop several new techniques for proving lower bounds on the number of communication steps under time or confidence constraints.

* Chao Tao is supported in part by NSF IIS-1633215. Qin Zhang is supported in part by NSF IIS-1633215 and CCF-1844234.

The computation proceeds in rounds. In each round each agent pulls a (multi)set of arms without communication. At any time step, based on the indices and outcomes of all its previous pulls, all the messages it has received, and the randomness of the algorithm (if any), each agent that is not in the wait mode takes one of the following actions: (1) makes the next pull; (2) requests a communication step and enters the wait mode; (3) terminates and outputs the answer. A communication step starts once all non-terminated agents are in the wait mode. After a communication step, all non-terminated agents exit the wait mode and start a new round. During each communication step, each agent can broadcast a message to every other agent. While we do not restrict the size of the message, in practice it will not be too large: the information of all pull outcomes of an agent can be described by an array of size at most n, with each coordinate storing a pair (c_i, sum_i), where c_i is the number of pulls on the i-th arm and sum_i is the sum of the rewards of those c_i pulls. Once terminated, an agent takes no further actions.
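As a concrete illustration, the round structure above can be sketched in a few lines of Python. This is only a toy simulation under assumed parameters (Bernoulli arms, a fixed number of uniformly chosen pulls per agent per round); the function name and all parameters are hypothetical, but the broadcast message follows the per-arm (c_i, sum_i) array format described above, and the communication step merges all agents' statistics:

```python
import random

def simulate_collaborative_rounds(n_arms=4, n_agents=3, n_rounds=2,
                                  pulls_per_agent_per_round=10, seed=0):
    """Toy simulation of the round-based collaborative model (hypothetical
    parameters): each round, every agent pulls arms locally with no
    communication; a communication step then broadcasts each agent's
    per-arm (c_i, sum_i) statistics, which all agents merge."""
    rng = random.Random(seed)
    means = [rng.random() for _ in range(n_arms)]  # hidden Bernoulli means

    # merged[i] = (c_i, sum_i): pull count and reward sum over all agents
    merged = [(0, 0.0) for _ in range(n_arms)]
    round_costs = []  # t_r = max #pulls by any single agent in round r

    for _ in range(n_rounds):
        messages = []
        for _agent in range(n_agents):
            local = [(0, 0.0) for _ in range(n_arms)]
            for _ in range(pulls_per_agent_per_round):
                i = rng.randrange(n_arms)  # choose the next arm to pull
                reward = 1.0 if rng.random() < means[i] else 0.0
                c, s = local[i]
                local[i] = (c + 1, s + reward)
            messages.append(local)
        # communication step: every agent broadcasts its array and merges all
        for msg in messages:
            merged = [(c0 + c1, s0 + s1)
                      for (c0, s0), (c1, s1) in zip(merged, msg)]
        # here every agent makes the same number of pulls, so t_r is that number
        round_costs.append(pulls_per_agent_per_round)

    T = sum(round_costs)  # running time: sum over rounds of max pulls
    best = max(range(n_arms),
               key=lambda i: merged[i][1] / max(merged[i][0], 1))
    return best, T, merged
```

In this simplified sketch all agents pull the same number of arms per round; in a real algorithm the per-round pull counts differ across agents, and t_r is the maximum among them.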
The algorithm terminates when all agents terminate. When the algorithm terminates, all agents should agree on the same best arm; otherwise we say the algorithm fails. The number of rounds of computation, denoted by R, is the number of communication steps plus one. Our goal in the collaborative learning model is to minimize the number of rounds R and the running time T = Σ_{r∈[R]} t_r, where t_r is the maximum number of pulls made among the K agents in round r. The motivation for minimizing R is that initiating a communication step always incurs a large time overhead (due to network bandwidth, latency, and protocol handshaking) and energy consumption (e.g., think of robots exploring in the deep sea or on Mars). Round-efficiency is one of the major concerns in all parallel/distributed computational models, such as the BSP model [42] and MapReduce [16]. The total cost of the algorithm is a weighted sum of R and T, where the coefficients depend on the concrete application. We are thus interested in the best round-time tradeoffs for collaborative best arm identification.

Speedu...