2018
DOI: 10.1109/tciaig.2017.2651887

Mastering 2048 With Delayed Temporal Coherence Learning, Multistage Weight Promotion, Redundant Encoding, and Carousel Shaping

Abstract: 2048 is an engaging single-player nondeterministic video puzzle game which, thanks to its simple rules and hard-to-master gameplay, has gained massive popularity in recent years. As 2048 can be conveniently embedded into the discrete-state Markov decision process framework, we treat it as a testbed for evaluating existing and new methods in reinforcement learning. With the aim of developing a strong 2048-playing program, we employ temporal difference learning with systematic n-tuple networks. We show …
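To make the abstract's core technique concrete, here is a minimal sketch of how an n-tuple network evaluates a 2048 board and how a plain TD(0) step updates it. All names (`N_VALUES`, `TUPLES`, the two sample tuples) are illustrative assumptions, not the paper's actual configuration; the paper's systematic networks use many more tuples.

```python
# Sketch: n-tuple network value function for 2048 (exponent encoding:
# 0 = empty cell, k = tile 2**k). Illustrative, not the paper's setup.

N_VALUES = 16  # assumed number of distinct cell values

# Each n-tuple is a fixed list of board positions (row, col).
TUPLES = [
    [(0, 0), (0, 1), (0, 2), (0, 3)],  # a horizontal 4-tuple
    [(0, 0), (1, 0), (2, 0), (3, 0)],  # a vertical 4-tuple
]

# One lookup table (weight vector) per tuple, one weight per value combination.
luts = [[0.0] * (N_VALUES ** len(t)) for t in TUPLES]

def tuple_index(board, positions):
    """Encode the cell values under one tuple as a single LUT index."""
    idx = 0
    for (r, c) in positions:
        idx = idx * N_VALUES + board[r][c]
    return idx

def value(board):
    """The board's value is the sum of the weights the tuples point at."""
    return sum(lut[tuple_index(board, t)] for lut, t in zip(luts, TUPLES))

def td_update(board, target, alpha=0.1):
    """Plain TD(0) step: move each active weight toward the target."""
    delta = target - value(board)
    for lut, t in zip(luts, TUPLES):
        lut[tuple_index(board, t)] += alpha * delta / len(TUPLES)
```

A single update with `alpha=1.0` drives the board's value exactly onto the target, since the error is split evenly across the active weights.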


Cited by 25 publications (35 citation statements); references 30 publications.
“…Here we do not want to go in that direction, but are interested in the performance of a general-purpose TD-n-tuple agent. The comparison is only meant as a correctness check that our implementation is comparable to [28]. We note in passing that Jaskowski [28] reports for a plain TD(0.5) agent (0-ply, with no 2048-specific extension) similar scores after 6.6e8 learning actions, even lower, in the range of 80,000, where our agent achieves 131,000.…”
Section: Results 2048 (mentioning)
confidence: 83%
“…• To be successful with nondeterministic games (like 2048), it is important to have appropriate nondeterministic structures in the agents as well: these are Expectimax-N (in contrast to Max-N), the Expectimax layers in MCTSE (in contrast to MCTS), and the afterstate mechanism [28] in the TD-n-tuple agents. • The general-purpose TD-n-tuple agent is successful in a quite diverse set of games: 2048, Connect-4, and the scalable game Hex for various board sizes from 2×2 to 7×7.…”
Section: Discussion (mentioning)
confidence: 99%
“…This approach was first introduced to Game 2048 by Szubert and Jaśkowski [25], and several studies were then based on it. The state-of-the-art computer player developed by Jaśkowski [11] combined several techniques to improve NTN-based players, and achieved an average score of 609,104 within a time limit of 1 second per move.…”
Section: Introduction (mentioning)
confidence: 99%
“…We also tested supervised learning of value networks (both an NTN and a smaller variant of tjwei's network) with the same set of training data, but we failed to obtain good players (Section 6).

Author | Network and learning method | #Weights | Avg. score
Szubert and Jaśkowski [25] | 17×4-tuples, TD learning | 860,625 | 51,320
Szubert and Jaśkowski [25] | 2×4-tuples & 2×6-tuples, TD learning | 22,882,500 | 99,916
Wu et al. [29], [32] | 4×6-tuples, TD learning, 3 stages | 136,687,500 | 143,958
Oka and Matsuzaki [20] | 40×6-tuples, TD learning | 671,088,640 | 210,476
Oka and Matsuzaki [20] | 10×7-tuples, TD learning | 2,684,354,560 | 234,136
Matsuzaki [14] | 8×6-tuples, TD learning | 134,217,728 | 226,958
Matsuzaki [14] | 8×7-tuples, TD learning | 2,147,483,648 | 255,198
Matsuzaki [15] | 4×6-tuples, backward TC learning, 8 stages | 536,870,912 | 232,262
Jaśkowski [11] | 5×6-tuples, TC learning, 16 stages, redundant encoding, etc. | 1,347,551,232 | 324,710
This work (comparison) | 5×4-tuples, TC learning, 3 stages | 983,040 | 50,120
Guei et al. [9] | 2 convolution (2×2), 2 full-connect, TD learning | N/A | ≈11,400
Guei et al. [9] | 3 convolution (3×3), 2 full-connect, TD learning | N/A | ≈5,300
tjwei [26] | 2 …

The rest of the paper is organized as follows.…”
Section: Introduction (mentioning)
confidence: 99%
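Several of the strongest configurations cited above rely on temporal coherence (TC) learning. A minimal sketch of the plain TC rule for a single weight follows; the class name is illustrative, and the paper's delayed variant additionally batches these updates, which is not shown here:

```python
# Sketch: temporal coherence (TC) learning for one weight. Each weight
# tracks the sum of its TD errors and the sum of their absolute values;
# its effective step size shrinks once the errors start to oscillate.

class TCWeight:
    def __init__(self):
        self.w = 0.0
        self.e = 0.0  # running sum of errors seen by this weight
        self.a = 0.0  # running sum of absolute errors

    def update(self, error, alpha=1.0):
        # Coherence ratio |sum(errors)| / sum(|errors|) in [0, 1]:
        # near 1 while errors agree in sign, near 0 when they cancel.
        rate = abs(self.e) / self.a if self.a > 0 else 1.0
        self.w += alpha * rate * error
        self.e += error
        self.a += abs(error)
```

While all errors push in the same direction the weight moves at the full rate; as soon as an error of the opposite sign arrives, the coherence ratio drops and subsequent steps shrink, which is what lets TC learning run without a hand-tuned learning-rate schedule.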