2011 IEEE Congress of Evolutionary Computation (CEC)
DOI: 10.1109/cec.2011.5949796
Reinforcement learning with adaptive Kanerva coding for Xpilot game AI

Cited by 12 publications (4 citation statements) · References 7 publications
“…One way to avoid the one-step delay between updating and utilizing the Q-value is to exploit similarities between adjacent states in the Q-table. Because nearby states are similar, we can approximate an unvisited state's Q-values with the average of neighboring Q-values [10]. Approximation extends Q-values of visited states to rarely visited states, avoiding the poor outcomes from choosing actions randomly.…”
Section: Improving the Learning Mechanism
confidence: 99%
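The neighbor-averaging approximation described in this statement can be sketched roughly as below. The discrete grid-tuple state encoding, the `q_table` dictionary, and the `approximate_q` helper are illustrative assumptions, not code from the cited papers.

```python
import itertools

def approximate_q(q_table, state, n_actions):
    """Estimate Q-values for an unvisited state by averaging the
    Q-value vectors of its immediate neighbors in the state grid.

    q_table maps discrete state tuples to per-action Q-value lists.
    If no neighbor has been visited, a zero vector is returned and
    the caller can fall back to its usual exploration policy.
    """
    if state in q_table:
        return q_table[state]

    sums, count = [0.0] * n_actions, 0
    # Neighbors differ by -1, 0, or +1 in each state dimension.
    for offset in itertools.product((-1, 0, 1), repeat=len(state)):
        if all(o == 0 for o in offset):
            continue
        neighbor = tuple(s + o for s, o in zip(state, offset))
        if neighbor in q_table:
            for a in range(n_actions):
                sums[a] += q_table[neighbor][a]
            count += 1

    return [s / count for s in sums] if count else [0.0] * n_actions
```

A greedy policy can then take the argmax over the returned vector instead of acting randomly in states the table has never recorded.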
“…On-chip regulators are able to switch in every interval (∼100ns), while off-chip regulators can only switch every 200 intervals (10-100µs), which causes many energy saving opportunities to be lost. The ability to identify sufficiently fine-grained intervals with on-chip regulators therefore provides a greater opportunity to exploit energy-delay tradeoffs. (The experimental setup is discussed in Section 6.)…”
Section: Opportunities For Nanosecond DVFS
confidence: 99%
“…The table size can be exponentially large when each attribute has multiple values, and this poses a significant challenge to reinforcement learning algorithms. Kanerva Coding [1] is an approximation approach to reduce the complexity of high dimensionality, and is used in this work. As shown in Figure 6 (c), the State-Action Mapping Table is replaced with a Prototype-Action Mapping Table that incorporates only a subset of the states.…”
Section: Adaptive Kanerva Coding
confidence: 99%
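A minimal, non-adaptive sketch of the Kanerva coding scheme this statement describes is given below, assuming binary state vectors, a Hamming-distance activation radius, and one value per prototype and action. The class and parameter names are illustrative, and the adaptive prototype deletion/splitting that gives the cited paper its name is omitted.

```python
import random

class KanervaQ:
    """Sketch of Kanerva coding for Q-value approximation.

    A fixed set of binary prototype vectors replaces the full
    state-action table; a state activates every prototype within a
    Hamming radius, and Q(s, a) is the sum of the activated
    prototypes' per-action values.
    """

    def __init__(self, n_prototypes, state_bits, n_actions, radius, alpha=0.1):
        self.prototypes = [tuple(random.randint(0, 1) for _ in range(state_bits))
                           for _ in range(n_prototypes)]
        self.theta = [[0.0] * n_actions for _ in range(n_prototypes)]
        self.radius = radius
        self.alpha = alpha

    def _active(self, state):
        # Indices of prototypes within the Hamming radius of the state.
        return [i for i, p in enumerate(self.prototypes)
                if sum(a != b for a, b in zip(state, p)) <= self.radius]

    def q(self, state, action):
        return sum(self.theta[i][action] for i in self._active(state))

    def update(self, state, action, target):
        # Spread the TD error equally over the activated prototypes.
        active = self._active(state)
        if not active:
            return
        error = target - self.q(state, action)
        for i in active:
            self.theta[i][action] += self.alpha * error / len(active)
```

Because only the prototype set is stored, the memory cost scales with the number of prototypes rather than with the exponential number of attribute combinations mentioned in the statement.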
“…However, the impact of voltage regulator efficiency losses on the overall energy consumption still matters, and in fact becomes more prominent. For example, at low voltages, LDOs lose 1/7 of the total energy when switching from 0.7V to 0.6V, but only 1/10 of the total energy when switching from 1.0V to 0.9V.…”
Section: Dynamic Voltage and Frequency Scaling
confidence: 99%
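The quoted fractions are consistent with the standard linear-regulator (LDO) model, in which the regulator drops the difference between input and output voltage at the load current, so the lost fraction of energy is (Vin − Vout)/Vin. The helper below is only a sketch under that assumption; it is not taken from the cited paper.

```python
def ldo_loss_fraction(v_in, v_out):
    """Fraction of input energy dissipated by an ideal LDO, which drops
    (v_in - v_out) across itself at the load current: loss = 1 - Vout/Vin."""
    return (v_in - v_out) / v_in

# Reproduces the fractions quoted above under the assumed model:
print(ldo_loss_fraction(0.7, 0.6))  # ~0.143, i.e. 1/7
print(ldo_loss_fraction(1.0, 0.9))  # 0.100, i.e. 1/10
```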