Q-learning (Watkins & Dayan, 1992)
DOI: 10.1023/a:1022676722315

Abstract: Q-learning (Watkins, 1989) is a simple way for agents to learn how to act optimally in controlled Markovian domains. It amounts to an incremental method for dynamic programming which imposes limited computational demands. It works by successively improving its evaluations of the quality of particular actions at particular states. This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989). We show that Q-learning converges to the optimum acti…
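To make the incremental update described in the abstract concrete, here is a minimal tabular Q-learning sketch in Python. The environment interface (reset()/step()), the learning rate alpha, the exploration rate epsilon, and the default values are illustrative assumptions, not details taken from the paper.

import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning sketch; env is a hypothetical interface with reset()/step()."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy exploration over the current Q estimates
            if np.random.rand() < epsilon:
                a = np.random.randint(n_actions)
            else:
                a = int(np.argmax(Q[s]))
            s_next, r, done = env.step(a)
            # incremental update toward the one-step lookahead target
            target = r + gamma * (0.0 if done else np.max(Q[s_next]))
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q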

Citations: cited by 1,404 publications (191 citation statements).
References: 12 publications.
“…delayed Q-learning [13] would be a better option if speed were an issue). We use an off-the-shelf implementation of Q-learning, as explained in [18] and [14]. We use the description of cell contents as a state.…”
Section: An AI Agent: Q-learning (mentioning)
confidence: 99%
“…In this paper we use one of these tests, a prototype based on the anytime intelligence test presented in [5] and the environment class introduced in [4], to evaluate one easily accessible biological system (Homo sapiens) and one off-the-shelf AI system, a popular reinforcement algorithm known as Q-learning [18]. In order to do the comparison we use the same environment class for both types of systems and we design hopefully non-biased interfaces for both.…”
Section: Introduction (mentioning)
confidence: 99%
“…Both paradigms use the TD error to update the state value. Q-learning is based on the TD algorithm, and optimizes the long-term value of performing a particular action in a given state by generating and updating a state-action value function Q (Sutton and Barto 1998; Watkins and Dayan 1992). This model assigns a Q-value for each action-state pair (rather than simply for each state as in standard TD).…”
Section: Q-learning Algorithm and the Actor-Critic Model (mentioning)
confidence: 99%
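The distinction the excerpt draws between a state-value update and a state-action-value update can be written out explicitly. In standard notation (the learning rate \alpha and the symbols below are notational assumptions, not taken from the quoted text):

\[ V(s_t) \leftarrow V(s_t) + \alpha \left[ r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \right] \qquad \text{(TD state-value update)} \]
\[ Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right] \qquad \text{(Q-learning)} \]

Both use a TD error (the bracketed term), but Q-learning maintains a value for every state-action pair and bootstraps from the best action in the next state.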
“…Q-learning finds the Q-value by iteratively approximating the Q-function using the difference between the predicted value and the actual value as the estimation error [38]. γ ∈ [0, 1] is the discount factor and if γ is high, the system gives a higher weight to the Q-value of the new state by the action than the reward of the past action.…”
Section: Dynamic Sensing Parameter Control Using Q-learning (mentioning)
confidence: 99%
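A small numerical illustration of the discount factor's role (the numbers are hypothetical): with immediate reward r = 1 and best next-state value \max_{a'} Q(s', a') = 10, the one-step target is

\[ r + \gamma \max_{a'} Q(s', a') = 1 + 0.9 \cdot 10 = 10 \quad (\gamma = 0.9), \qquad 1 + 0.1 \cdot 10 = 2 \quad (\gamma = 0.1), \]

so a high \gamma lets the Q-value of the new state dominate the immediate reward, as the excerpt describes.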