2018
DOI: 10.1109/tsp.2018.2841822

Multi-objective Contextual Multi-armed Bandit With a Dominant Objective

Abstract: In this paper, we propose a new multi-objective contextual multi-armed bandit (MAB) problem with two objectives, where one of the objectives dominates the other objective. Unlike single-objective MAB problems in which the learner obtains a random scalar reward for each arm it selects, in the proposed problem, the learner obtains a random reward vector, where each component of the reward vector corresponds to one of the objectives and the distribution of the reward depends on the context that is provided to the…
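As a rough illustration of the setting the abstract describes (and not the authors' algorithm), a minimal environment stub for context-dependent vector rewards with a dominant objective might look as follows; the names `pull` and `W` and the linear-plus-noise reward form are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
n_arms, d = 5, 3
# Hypothetical per-arm weights, one row per objective (unknown to the learner).
W = rng.random((n_arms, 2, d))

def pull(arm: int, context: np.ndarray) -> np.ndarray:
    """Return a 2-dimensional random reward vector whose mean depends on
    the context; component 0 is the dominant objective."""
    mean = W[arm] @ context                        # shape (2,)
    return np.clip(mean + 0.1 * rng.standard_normal(2), 0.0, 1.0)

reward_vec = pull(arm=2, context=rng.random(d))    # a length-2 reward vector
```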


Cited by 32 publications (21 citation statements)
References 30 publications
“…There is already a significant amount of attention given to supervised and unsupervised learning research, but relatively less progress has been made for reinforcement learning [6,7]. The main goal of our study is to demonstrate that quantum neural networks can be used to solve problems in reinforcement learning, adding a quantum solution to the rich collection of classical methods such as ε-greedy, upper confidence bounds (UCB), and Thompson sampling [22][23][24].…”
Section: Contextual Multi-armed Bandit Problem (mentioning)
confidence: 99%
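For concreteness, the ε-greedy baseline named in this excerpt reduces to a few lines; this is the generic textbook rule for a finite-armed bandit (ignoring context), not the quantum method the citing paper proposes:

```python
import numpy as np

rng = np.random.default_rng(1)

def eps_greedy_choice(counts: np.ndarray, sums: np.ndarray, eps: float = 0.1) -> int:
    """Try each arm once, then explore uniformly with probability eps,
    otherwise exploit the arm with the best empirical mean."""
    if counts.min() == 0:
        return int(np.argmin(counts))            # untried arm first
    if rng.random() < eps:
        return int(rng.integers(len(counts)))    # explore
    return int(np.argmax(sums / counts))         # exploit

counts, sums = np.zeros(5), np.zeros(5)  # pulls and cumulative reward per arm
arm = eps_greedy_choice(counts, sums)
# after observing reward r: counts[arm] += 1; sums[arm] += r
```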
“…CMAB is widely used in information services to address the cold-start problem. Existing works on CMAB can be divided into three categories according to the content of the context and the relation between the context and the arm reward [27].…”
Section: B Contextual Multi-armed Bandit (mentioning)
confidence: 99%
“…By using the conditional probability formula, we can derive equation (27). According to Lemma 2, we have $\Pr\{T(j) = t \mid \bar{Q}^j_t \ge \mu_j + \epsilon_j\} \le e^{-2\epsilon_j^2 t}$, so the above equation can be further bounded as:…”
Section: Regret Analysis (mentioning)
confidence: 99%
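The displayed bound has the shape of Hoeffding's inequality. Assuming the rewards $X_s$ of objective $j$ are i.i.d. on $[0,1]$ with mean $\mu_j$, and $\bar{Q}^j_t$ is their empirical mean after $t$ samples, the step presumably being invoked is:

```latex
\Pr\bigl\{\bar{Q}^j_t \ge \mu_j + \epsilon_j\bigr\}
  = \Pr\Bigl\{\tfrac{1}{t}\textstyle\sum_{s=1}^{t} X_s - \mu_j \ge \epsilon_j\Bigr\}
  \le e^{-2\epsilon_j^2 t}.
```

Summed over $t$, the right-hand side yields a convergent series, which is how terms of this form are typically absorbed into a finite regret bound.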
“…Specifically, we adopt the upper confidence bound (UCB) algorithm [21] to enable an MTD to learn the matching preferences and maximize long-term performance while maintaining a well-balanced tradeoff between exploitation and exploration. UCB was originally developed to solve the multi-armed bandit (MAB) problem [22], which involves sequential decision making based on only local information. It was designed for the single-player scenario and thereby inevitably leads to selection conflicts in the multi-player scenario, where multiple MTDs are prone to select the same channel [23].…”
Section: Introduction (mentioning)
confidence: 99%
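For reference, the single-player UCB1 index this excerpt starts from is the empirical mean plus an exploration bonus; a minimal sketch of the standard rule (not the multi-MTD matching scheme the citing paper develops):

```python
import numpy as np

def ucb1_choice(counts: np.ndarray, sums: np.ndarray, t: int) -> int:
    """UCB1: play every arm once, then pick the arm maximizing
    empirical mean + sqrt(2 ln t / n_pulls)."""
    if counts.min() == 0:
        return int(np.argmin(counts))            # initialization phase
    bonus = np.sqrt(2.0 * np.log(t) / counts)    # exploration term
    return int(np.argmax(sums / counts + bonus))
```

The bonus shrinks as an arm's pull count grows, which produces the exploitation-exploration tradeoff the excerpt describes; running this rule independently at each player is exactly what causes the selection conflicts noted in [23].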