A critical look at experimental evaluations of EBL

Segre, Alberto M.; Elkan, Charles; Russell, Alexander

doi:10.1007/bf00114163

Cited by 22 publications

(16 citation statements)

References 13 publications

“…We applied both tests to the speedup learning data set taken from Etzioni (1990a) and have shown that most of the differences observed are statistically significant (see, in particular, the results of our extended signed rank test in table 6). We believe that this approach helps to allay the concerns regarding the use of resource bounds raised by Segre et al (1991). Finally, although we have focused on speedup learning data, we note that our methodology can be used to analyze any quantitative comparison between two systems on a common set of problems.…”

Section: Results (mentioning)

confidence: 99%

“…In a recent paper, Segre et al (1991) argue that the experimenter's choice of time bound can influence the results of the experiment. Segre et al illustrate this point with a hypothetical example reproduced in tables 1 and 2.…”

Section: Motivation (mentioning)

confidence: 99%

“…Often, the experimenter imposes a bound on the CPU time the problem solver is allowed to spend on any individual problem. Segre et al (1991) argue that the experimenter's choice of time bound can bias the results of the experiment. To address this problem, we present statistical hypothesis tests specifically designed to analyze speedup data and eliminate this bias.…”

mentioning

confidence: 99%

Statistical methods for analyzing speedup learning experiments

Etzioni

1994

Mach Learn

Abstract. Speedup learning systems are typically evaluated by comparing their impact on a problem solver's performance. The impact is measured by running the problem solver, before and after learning, on a sample of problems randomly drawn from some distribution. Often, the experimenter imposes a bound on the CPU time the problem solver is allowed to spend on any individual problem. Segre et al. (1991) argue that the experimenter's choice of time bound can bias the results of the experiment. To address this problem, we present statistical hypothesis tests specifically designed to analyze speedup data and eliminate this bias. We apply the tests to the data reported by Etzioni (1990a) and show that most (but not all) of the speedups observed are statistically significant.

Keywords. speedup learning, statistics, explanation-based learning, experimental methodology

Motivation

Speedup learning systems are systems that automatically generate search-control knowledge (e.g., Etzioni, 1990b; Knoblock, 1990; Minton, 1988a; Mooney, 1989; O'Rorke, 1989; Shavlik, 1990). The effectiveness of a speedup learning system is typically evaluated by comparing the performance of a problem solver, guided by the learned knowledge, with the performance of the problem solver given no control knowledge, or given control knowledge acquired by a different learning system. The problem solver is run on a sample of problems randomly drawn from some distribution. In many experiments, the problem solver requires an inordinately long time to solve one or more of the problems due to the combinatorial nature of its search. To allow the experiments to complete in reasonable time, the experimenter imposes a bound on the CPU time that the problem solver is allowed to spend on any individual problem. When that bound is exceeded, the problem is marked "unsolved" and the problem solver moves on to the next problem.

The same time bound

The statistical tests described in this article are encoded as COMMON LISP routines. The routines, and the data analyzed in the article, are available by sending mail to ETZIONI@CS.WASHINGTON.EDU. We hope that other researchers will use the routines to validate their own speedup learning experiments.

“…Similar comparisons in the past have been done using cumulative time graphs [45], but Segre et al [55] argue that such comparisons could be misleading because changing the time limit can change the results. To avoid this problem, the total time expended solving all of the problems is graphed against the CPU time bound.…”

mentioning

confidence: 86%

Automatically generating abstractions for planning

Knoblock

1994

Artificial Intelligence

178
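The quoted idea of graphing total time against the CPU time bound can be sketched as follows. This is a minimal illustration under stated assumptions, not Knoblock's code: `total_time_by_bound`, the two systems, and their timings are all hypothetical, and each problem's true solution time is assumed to be known up to the largest bound considered.

```python
def total_time_by_bound(times, bounds):
    """For each bound, sum the time charged per problem, capped at that bound.

    A problem that needs longer than the bound is charged exactly the bound,
    which is what a run cut off at that limit would have cost.
    """
    return [(b, sum(min(t, b) for t in times)) for b in bounds]

# Hypothetical per-problem CPU seconds for two configurations.
system_a = [5, 5, 5, 200]    # fast on most problems, very slow on one
system_b = [30, 30, 30, 30]  # uniformly moderate
bounds = [10, 50, 250]

curve_a = total_time_by_bound(system_a, bounds)
curve_b = total_time_by_bound(system_b, bounds)
print(curve_a)
print(curve_b)
```

With these invented numbers, system A looks cheaper at the two smaller bounds and more expensive at the largest one, which is the kind of bound-dependent reversal that reporting the whole curve, rather than a single bound, makes visible.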

“…We used the number of problems solved within the assigned resource limit as an additional measure of performance. As Segre et al (1991) point out, comparisons based on these measures may be sensitive to the values of the resource limits, and inconclusive when some of the problems are not solved within the assigned resource limits. Figure 4 shows FAILSAFE-2's performance in the blocks world domain.…”

Section: Testing the Effectiveness Hypothesis (mentioning)

confidence: 99%

On-line learning from search failures

Bhatnagar, Mostow

1994

Mach Learn

Abstract. Learning by explaining failures and avoiding similar ones thereafter is an attractive way to speed up problem solving. However, previous methods for explanation-based learning from failure can take too long to detect failures, explain them, or test the learned rules. This expense is especially critical for adaptive search, in which control knowledge acquired while solving an individual problem instance must be learned quickly enough to speed up its solution.

We present an adaptive search technique that speeds up state-space search by learning heuristic censors while searching. The censors speed up search by pruning away more and more of the space until a solution is found in the pruned space. Censors are learned by explaining dead ends and other search failures. To learn quickly, the technique overgeneralizes by assuming that certain constraints are preservable, i.e., remain true along at least one solution path. A recovery mechanism detects violations of this assumption and selectively relaxes learned censors. The technique, implemented in an adaptive problem solver named FAILSAFE-2, learns useful heuristics that cannot be learned by other reported methods.

We present experimental evidence that FAILSAFE-2 is effective (learns useful rules, even in recursive domains where PRODIGY and STATIC do not), adaptive (learns fast enough to pay off even within a single problem), and general (speeds up diverse problem solvers, even initially strong ones).
