2011
DOI: 10.1007/978-3-642-24455-1_33

Value-Difference Based Exploration: Adaptive Control between Epsilon-Greedy and Softmax

Abstract: This paper proposes "Value-Difference Based Exploration combined with Softmax action selection" (VDBE-Softmax) as an adaptive exploration/exploitation policy for temporal-difference learning. The advantage of the proposed approach is that exploration actions are only selected in situations when the knowledge about the environment is uncertain, which is indicated by fluctuating values during learning. The method is evaluated in experiments having deterministic rewards and a mixture of both deterministic…
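
To make the adaptive-exploration idea concrete, here is a minimal Python sketch of the per-state exploration-rate update that this line of work (Tokic, 2010) builds on. All names are ours and the snippet is illustrative, not the paper's implementation:

```python
import math

def vdbe_update(eps_s, value_diff, n_actions, sigma=1.0):
    """Adapt the per-state exploration rate eps(s) from the magnitude
    of the last TD update of Q(s, a) (sketch after Tokic, 2010).

    value_diff : |Q_new(s, a) - Q_old(s, a)| from the last learning step.
    sigma      : sensitivity; smaller sigma reacts more strongly to
                 the same value difference.
    """
    # Boltzmann-like squashing of the value difference into [0, 1):
    # large fluctuations (uncertain knowledge) push f toward 1.
    x = math.exp(-abs(value_diff) / sigma)
    f = (1.0 - x) / (1.0 + x)
    delta = 1.0 / n_actions  # common choice for the mixing rate
    # Exponential moving average: eps(s) rises while values fluctuate
    # and decays toward 0 as learning converges.
    return delta * f + (1.0 - delta) * eps_s
```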

Cited by 176 publications (108 citation statements)
References 9 publications
“…One of the most challenging tasks in RL can be found in balancing between exploration and exploitation (Tokic & Palm, 2011). An often used approach to this tradeoff is the ε-greedy method (Watkins, 1989).…”
Section: Exploration Policy (mentioning)
confidence: 99%
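
For reference, a minimal ε-greedy sketch (illustrative Python; names are ours):

```python
import random

def epsilon_greedy(q_values, epsilon):
    """Explore uniformly with probability epsilon; otherwise exploit
    the currently highest-valued (greedy) action."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))                  # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit
```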
“…Tokic et al. propose the Value-Difference Based Exploration with Softmax action selection (VDBE-Softmax) policy (Tokic, 2010; Tokic & Palm, 2011). With VDBE-Softmax, the ε-greedy and the Softmax policies are combined in a way that exploration is performed with probability ε, using the Softmax probabilities defined in Equation (4).…”
Section: Exploration Policy (mentioning)
confidence: 99%
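
The combination described in this quote can be sketched as follows (illustrative Python; `softmax_probs` stands in for the paper's Equation (4), and all names are ours):

```python
import math
import random

def softmax_probs(q_values, tau):
    """Boltzmann/Softmax probabilities over Q-values at temperature tau
    (stands in for Equation (4) referenced in the quote)."""
    m = max(q_values)                                  # for numerical stability
    exps = [math.exp((q - m) / tau) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]

def vdbe_softmax_select(q_values, eps_s, tau=1.0):
    """VDBE-Softmax selection (sketch): with probability eps(s) explore
    by sampling from the Softmax distribution, else act greedily."""
    if random.random() < eps_s:
        weights = softmax_probs(q_values, tau)
        return random.choices(range(len(q_values)), weights=weights)[0]
    return max(range(len(q_values)), key=q_values.__getitem__)
```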
“…With this method, the probability ε of selecting an exploratory action is distributed uniformly among all the actions. – The second one is the Value-Difference Based Exploration with Softmax action selection policy (VDBE-Softmax) [9]: the client selects random actions using the Softmax probabilities when a uniformly drawn random number ξ < ε, and it chooses the greedy action otherwise. The Softmax probabilities are determined through the Boltzmann distribution proposed by Tokic [9], using a normalization of the Q-values into the interval [−1, 0] and a temperature parameter of T = 0.01.…”
Section: Influence of the Exploration Policy (mentioning)
confidence: 99%
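
As a sketch of the normalization step mentioned in this quote (the exact scaling is not given here; min-max scaling of the Q-values into [−1, 0] is an assumption):

```python
import math

def normalized_boltzmann(q_values, tau=0.01):
    """Boltzmann probabilities after scaling Q-values into [-1, 0],
    with the temperature T = 0.01 quoted above (sketch; min-max
    scaling is assumed, not taken from the source)."""
    q_min, q_max = min(q_values), max(q_values)
    span = (q_max - q_min) or 1.0          # guard against identical Q-values
    scaled = [(q - q_max) / span for q in q_values]  # best -> 0, worst -> -1
    exps = [math.exp(s / tau) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]
```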
“…A good example of a possible alternative approach would be the Softmax algorithm [19], which uses the Boltzmann distribution to define action-selection probabilities:…”
Section: Defining Actions (mentioning)
confidence: 99%
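
The Boltzmann action-selection probabilities referred to here have the standard textbook form (temperature τ; this is not a formula reproduced from [19]):

```latex
\pi(a) = \frac{e^{Q(a)/\tau}}{\sum_{b} e^{Q(b)/\tau}}
```

High temperatures make the distribution nearly uniform (more exploration); as τ → 0 it concentrates on the greedy action.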