2019
DOI: 10.1609/aaai.v33i01.33011691

Tackling Sparse Rewards in Real-Time Games with Statistical Forward Planning Methods

Abstract: One of the issues general AI game players must deal with is the variety of reward systems across the games they are expected to play at a high level. Some games present plentiful rewards that agents can use to guide their search for the best solution, whereas others feature sparse reward landscapes that provide little information. The work presented in this paper focuses on the latter case, with which most agents struggle. Thus, modifications are proposed for …

Cited by 14 publications (11 citation statements). References 16 publications.
“…A different type of information was used by Gaina et al. [39] to dynamically adjust the length of the individuals in RHEA: the flatness of the fitness landscape is used to shorten or lengthen the individuals so that the algorithm better deals with sparse reward environments (using longer rollouts to identify rewards that are further away) while not harming performance in dense reward games (using shorter rollouts to focus on immediate rewards). However, this had a detrimental effect in RHEA, while boosting MCTS results.…”
Section: Evolutionary Methods
confidence: 99%
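The adaptive-length mechanism summarised in the statement above lends itself to a small illustration. The Python sketch below shows one plausible way rollout/individual length could be driven by the flatness of recent fitness evaluations; the class name, window size, threshold and step sizes are assumptions made for illustration, not the implementation evaluated in the cited paper.

```python
# Hypothetical sketch (not the authors' code): adapting RHEA individual length
# from the flatness of the recent fitness landscape.
from collections import deque

class AdaptiveLengthController:
    """Lengthens rollouts when fitness values are flat (sparse rewards),
    shortens them when fitness varies (dense rewards)."""

    def __init__(self, min_len=5, max_len=25, step=2, window=30, flat_eps=1e-6):
        self.min_len = min_len            # shortest individual allowed
        self.max_len = max_len            # longest individual allowed
        self.step = step                  # how much to change the length at a time
        self.flat_eps = flat_eps          # variance below this counts as "flat"
        self.recent_fitness = deque(maxlen=window)
        self.length = min_len

    def record(self, fitness):
        """Store the fitness of an evaluated individual."""
        self.recent_fitness.append(fitness)

    def update_length(self):
        """Adjust the individual length based on recent fitness variance."""
        if len(self.recent_fitness) < self.recent_fitness.maxlen:
            return self.length
        mean = sum(self.recent_fitness) / len(self.recent_fitness)
        var = sum((f - mean) ** 2 for f in self.recent_fitness) / len(self.recent_fitness)
        if var < self.flat_eps:
            # Flat landscape: look further ahead to find distant rewards.
            self.length = min(self.max_len, self.length + self.step)
        else:
            # Informative landscape: shorter plans keep focus on nearby rewards.
            self.length = max(self.min_len, self.length - self.step)
        return self.length
```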
“…MADRL with delayed rewards faces the challenge of high dimension (C.3) due to the large state-action spaces. Training performance can be enhanced by: (a) enabling agents to receive rewards at each training step, including a dense reward function that produces reward values for the majority of transitions so that agents receive rewards in almost every time step, particularly at the early stage of learning [95], for achieving optimal accumulated reward; (b) tailor-made reward functions designed by experts to assign rewards to behaviors that lead to the optimal goal with faster learning speed (O.2); and (c) credit assignment (or reward shaping [94]), which assigns credit to the particular action that triggers a reward [92]. Overall, properly designed reward functions ensure a higher convergence speed (O.2) and accumulated reward (P.2).…”
Section: Enhancing Training Performance in MADRL Using Delayed Rewards
confidence: 99%
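As a concrete illustration of point (c) in the statement above, the sketch below shows potential-based reward shaping, one standard way to turn a sparse or delayed reward into a per-step learning signal; the potential function and the dictionary-based state representation are placeholders assumed for illustration, not taken from the cited survey.

```python
# Minimal sketch of potential-based reward shaping: densifies a sparse/delayed
# reward without changing the optimal policy.

def potential(state):
    # Placeholder potential: negative distance to the goal. In practice this
    # encodes domain knowledge about progress toward the goal.
    return -abs(state.get("distance_to_goal", 0.0))

def shaped_reward(reward, state, next_state, gamma=0.99):
    """Return r + gamma * phi(s') - phi(s), providing a signal at every step."""
    return reward + gamma * potential(next_state) - potential(state)

# Usage: replace the environment reward with the shaped one during training.
r = 0.0  # sparse environment reward for a non-terminal step
s, s_next = {"distance_to_goal": 10.0}, {"distance_to_goal": 9.0}
print(shaped_reward(r, s, s_next))  # positive, since the agent moved closer
```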
“…individual length, mutation rate), but also the very structure of the algorithm (keeping the population evolved from one game tick to the next with a shift buffer, including or excluding evolutionary operators, adding Monte Carlo rollouts at the end of the individual when evaluating, etc.). These options are all collected from past literature [23], [41], [44], [45], for a resulting EA with a parameter search space of size 1.741E12.…”
Section: B. Planning Module
confidence: 99%
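To make the idea of a parameterised EA configuration space concrete, the following Python sketch enumerates a toy space over options of the kind listed above (individual length, mutation rate, shift buffer, Monte Carlo rollouts); the parameter names and value ranges are illustrative assumptions, and this toy space is far smaller than the 1.741E12 configurations of the cited agent.

```python
# Illustrative sketch of a parameterised RHEA-style configuration space; the
# options and values are assumptions, not the cited agent's actual search space.
from itertools import product

CONFIG_SPACE = {
    "individual_length": [5, 10, 15, 20],
    "mutation_rate":     [0.1, 0.3, 0.5],
    "population_size":   [1, 5, 10, 20],
    "use_shift_buffer":  [False, True],   # keep the population across game ticks
    "use_crossover":     [False, True],   # include or exclude evolutionary operators
    "mc_rollout_length": [0, 5, 10],      # Monte Carlo rollout appended when evaluating
}

def all_configs(space):
    """Enumerate every configuration (Cartesian product of all options)."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

total = sum(1 for _ in all_configs(CONFIG_SPACE))
print(f"{total} configurations in this toy space")  # the real space is ~1.741e12
```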