1994
DOI: 10.1007/bf00993346

Toward an ideal trainer

Abstract: This paper demonstrates how the nature of the opposition during training affects learning to play two-person, perfect information board games. It considers different kinds of competitive training, the impact of trainer error, appropriate metrics for post-training performance measurement, and the ways those metrics can be applied. The results suggest that teaching a program by leading it repeatedly through the same restricted paths, albeit high quality ones, is overly narrow preparation for the variati…

Citations: cited by 25 publications (21 citation statements)
References: 15 publications
“…The first problem is that it is likely to get stuck on a self-consistent but a non-optimal strategy [13]. Secondly, there is no guarantee that the portions of the strategy space searched are the most significant ones [14]. These problems are addressed by ensuring that the population diversity is adequate to avoid local minima and to cover a larger search space.…”
Section: Competitive Environments (mentioning)
confidence: 99%
“…Because a novice cannot always capitalize appropriately on its own good patterns or exploit the opposition's poor ones, the learner may initially make incorrect associations, only to find them contradicted later when it plays better (Epstein, 1994c). Our learning algorithm therefore employs a confidence parameter to revalue responses in the face of disagreeing evidence.…”
Section: Managing Inconsistency (mentioning)
confidence: 99%
“…It learned lose tic-tac-toe and five men's morris, however, with a behavioral standard of 20 and lesson and practice training (Epstein, 1994c). In this environment (unnecessary for the easier game of tic-tac-toe), the program cycles between lessons (a set of two contests against the expert) and practice (a set of seven contests against itself).…”
Section: Correct Reflection Concept (mentioning)
confidence: 99%
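
The lesson and practice cycle described in the statement above can be sketched as a simple opponent schedule. This is only an illustration, not the cited implementation: the function names `lesson_and_practice_schedule` and `first_contests` and the labels "expert" and "self" are hypothetical, and the "behavioral standard of 20" is not modeled because the excerpt does not define it. Only the 2:7 contest ratio comes from the quoted text.

```python
from itertools import islice


def lesson_and_practice_schedule(lessons_per_cycle=2, practice_per_cycle=7):
    """Yield the opponent for each successive contest, cycling forever.

    Hypothetical sketch: each cycle is a block of lessons (contests
    against the expert) followed by a block of practice (contests
    against the learner itself). Defaults follow the 2 and 7 in the quote.
    """
    while True:
        for _ in range(lessons_per_cycle):
            yield "expert"
        for _ in range(practice_per_cycle):
            yield "self"


def first_contests(n, **kwargs):
    """Return the opponents for the first n contests of the schedule."""
    return list(islice(lesson_and_practice_schedule(**kwargs), n))


if __name__ == "__main__":
    # Print one full cycle plus the start of the next.
    for number, opponent in enumerate(first_contests(11), start=1):
        print(number, opponent)
```

A generator keeps the schedule separate from the game-playing and learning code, so the ratio of lessons to practice can be varied without touching either.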
“…Experiments with Hoyle, for example, found that playing against a perfect player (a program that always makes an optimal move) was too narrow (Epstein, 1994b). An expert game player, after all, should hold its own against opponents of any strength.…”
Section: Modeling Expertise (mentioning)
confidence: 99%