We also tested supervised learning of value networks (both an NTN and a smaller variant of tjwei's network) with the same set of training data, but we failed to obtain good players (Section 6).

| Reference | Type | Network and training method | #weights | Score |
|---|---|---|---|---|
| Szubert and Jaśkowski [25] | N-tuple network | 17×4-tuples, TD learning | 860,625 | 51,320 |
| Szubert and Jaśkowski [25] | N-tuple network | 2×4-tuples & 2×6-tuples, TD learning | 22,882,500 | 99,916 |
| Wu et al. [29], [32] | N-tuple network | 4×6-tuples, TD learning, 3 stages | 136,687,500 | 143,958 |
| Oka and Matsuzaki [20] | N-tuple network | 40×6-tuples, TD learning | 671,088,640 | 210,476 |
| Oka and Matsuzaki [20] | N-tuple network | 10×7-tuples, TD learning | 2,684,354,560 | 234,136 |
| Matsuzaki [14] | N-tuple network | 8×6-tuples, TD learning | 134,217,728 | 226,958 |
| Matsuzaki [14] | N-tuple network | 8×7-tuples, TD learning | 2,147,483,648 | 255,198 |
| Matsuzaki [15] | N-tuple network | 4×6-tuples, backward TC learning, 8 stages | 536,870,912 | 232,262 |
| Jaśkowski [11] | N-tuple network | 5×6-tuples, TC learning, 16 stages, redundant encoding, etc. | 1,347,551,232 | 324,710 |
| This work (comparison) | N-tuple network | 5×4-tuples, TC learning, 3 stages | 983,040 | 50,120 |
| Guei et al. [9] | Neural network | 2 convolution (2×2), 2 full-connect, TD learning | N/A | ≈11,400 |
| Guei et al. [9] | Neural network | 3 convolution (3×3), 2 full-connect, TD learning | N/A | ≈5,300 |
| tjwei [26] | Neural network | 2 … | | |

The rest of the paper is organized as follows.
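The #weights column for the n-tuple networks is consistent with a simple counting rule: stages × tuples × values^tuple_length, where the number of distinct values a cell index can take is 15 or 16 depending on the implementation. The sketch below (an assumption for illustration; the function name and parameters are not from the paper) recomputes a few table rows under that rule.

```python
def ntuple_weights(num_tuples, tuple_len, values_per_cell=16, stages=1):
    """Total lookup-table entries for an n-tuple network:
    one weight per (stage, tuple, cell-value combination)."""
    return stages * num_tuples * values_per_cell ** tuple_len

# Recomputing some rows of the comparison table:
print(ntuple_weights(17, 4, values_per_cell=15))           # Szubert & Jaśkowski: 860,625
print(ntuple_weights(4, 6, values_per_cell=15, stages=3))  # Wu et al.: 136,687,500
print(ntuple_weights(40, 6))                               # Oka & Matsuzaki: 671,088,640
print(ntuple_weights(8, 7))                                # Matsuzaki: 2,147,483,648
print(ntuple_weights(5, 4, stages=3))                      # This work: 983,040
```

Note that the earlier entries match only with 15 values per cell, while the later ones use 16, reflecting different choices for the maximum encodable tile.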