In this paper, we propose several techniques for learning game state evaluation functions by reinforcement. The first is a generalization of tree bootstrapping (tree learning): it is adapted to the context of reinforcement learning without knowledge, based on non-linear functions. With this technique, no information is lost during the reinforcement learning process. The second is a modification of minimax with unbounded depth that extends the best sequences of actions to the terminal states. This modified search is intended to be used during the learning process. The third is to replace the classic gain of a game (+1 / −1) with a reinforcement heuristic. We study particular reinforcement heuristics such as quick wins and slow defeats; scoring; mobility or presence. The fourth is another variant of unbounded minimax, which plays the safest action instead of the best action. This modified search is intended to be used after the learning process. The fifth is a new action selection distribution. The conducted experiments suggest that these techniques improve the level of play. Finally, we apply these techniques to design program-players for the game of Hex (sizes 11 and 13) that surpass the level of Mohex 2.0 using reinforcement learning from self-play without knowledge. At Hex size 11 (without swap), the program-player reaches the level of Mohex 3HNN.
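As an illustration of the third technique, the idea of replacing the ±1 gain with a reinforcement heuristic that rewards quick wins and slow defeats can be sketched as follows. This is a minimal sketch under stated assumptions, not the formula used in the paper: the function name `depth_heuristic`, its parameters, and the linear dependence on the number of moves played are all hypothetical choices made for illustration.

```python
def depth_heuristic(first_player_won: bool, moves_played: int, max_moves: int) -> int:
    """Terminal value from the first player's point of view.

    Hypothetical formula (an assumption, not taken from the paper):
    the magnitude of the value decreases linearly with the length of the game,
    so a win is worth more when it comes early and a loss costs less when it
    comes late.
    """
    if not 0 <= moves_played <= max_moves:
        raise ValueError("moves_played must lie in [0, max_moves]")
    # The magnitude is at least 1, so a win is always positive and a loss always negative.
    magnitude = max_moves - moves_played + 1
    return magnitude if first_player_won else -magnitude


# Example on an 11x11 board, where a game lasts at most 121 moves.
quick_win = depth_heuristic(True, 10, 121)
slow_win = depth_heuristic(True, 50, 121)
slow_loss = depth_heuristic(False, 50, 121)
quick_loss = depth_heuristic(False, 10, 121)
```

With such a terminal value, a learner that maximizes the expected outcome prefers the quick win to the slow win and the slow loss to the quick loss, while the sign still distinguishes wins from losses.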
The aim of this paper is to gather several results concerning the enumeration of specific classes of polycubes. We first consider two classes of $3$-dimensional vertically-convex directed polycubes: the plateau polycubes and the parallelogram polycubes. An expression of the generating function is provided for the former class, as well as an asymptotic result for the number of polycubes of each class with respect to volume and width. We also consider three classes of $d$-dimensional polycubes $(d\geq 3)$ and we state asymptotic results for the number of polycubes of each class with respect to volume and width.
Relying on the recently introduced multi-algebras, we present a general approach for reasoning about temporal sequences of qualitative information that is generally more efficient than existing techniques. Applying our approach to the specific case of sequences of topological information about constant-size regions, we show that the resulting formalism has a complete procedure for deciding consistency, and we identify its three maximal tractable subclasses containing all basic relations.