Exploiting Learned Policies in Focal Search

Araneda

Baier

2022

SOCS

Machine learning allows learning accurate but inadmissible heuristics for hard combinatorial puzzles like the 15-puzzle, the 24-puzzle, and Rubik's cube. In this paper, we investigate how to exploit these learned heuristics in the context of heuristic search with suboptimality guarantees. Specifically, we study how Focal Search (FS), a well-known bounded-suboptimal search algorithm, can be modified to better exploit inadmissible learned heuristics. We propose to use Focal Discrepancy Search (FDS) in the context of learned heuristics, which uses a discrepancy function, instead of the learned heuristic, to sort the focal list. In our empirical evaluation, we evaluate FS and FDS using DeepCubeA, an effective learned heuristic for the 15-puzzle. We show that FDS substantially outperforms FS. This suggests that in some domains, when a highly accurate heuristic is available, one should always consider using discrepancies for better search.

Section: Focal Discrepancy Search For Learned Heuristics (mentioning)

confidence: 99%

Section: Introduction (mentioning)

confidence: 99%


Focal Discrepancy Search for Learned Heuristics (Extended Abstract)

Araneda

Baier

2022

SOCS

“…Focal Discrepancy Search (FDS) (Araneda, Greco, and Baier 2021; Greco, Araneda, and Baier 2022) is a version of Focal Search which sorts FOCAL by the discrepancy associated with the path of each state. More formally, if s is a state in FOCAL, at any point during the execution of FS, its priority is given by disc(path(s)), which is the number of times along the path where the state with the best heuristic value was not selected for expansion.…”

Section: Focal Discrepancy Search (mentioning)

confidence: 99%
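The definition of disc(path(s)) quoted above can be illustrated with a minimal Python sketch. It rests on one reading of the quote that is an assumption here, not something the source confirms: a step counts as a discrepancy when the child taken has a strictly worse heuristic value than the best sibling generated from the same parent, and a child tied for the best value does not count. The names `path_discrepancy`, `GRAPH`, and `H` are hypothetical and exist only for this illustration.

```python
from typing import Callable, Hashable, Iterable, Sequence

State = Hashable


def path_discrepancy(
    path: Sequence[State],
    successors: Callable[[State], Iterable[State]],
    h: Callable[[State], float],
) -> int:
    """Count the steps on `path` that did not follow a best-h successor."""
    disc = 0
    for parent, child in zip(path, path[1:]):
        best = min(h(c) for c in successors(parent))
        # Assumption: a child tied for the best value is not a discrepancy.
        if h(child) > best:
            disc += 1
    return disc


# Hypothetical toy graph and heuristic values, for illustration only.
GRAPH = {"a": ["b", "c"], "b": ["d", "e"], "c": ["f"]}
H = {"a": 3, "b": 2, "c": 1, "d": 3, "e": 1, "f": 0}

# The step a -> b ignores the better sibling c; the step b -> e is a best move.
print(path_discrepancy(["a", "b", "e"], GRAPH.__getitem__, H.__getitem__))  # prints 1
```

In a focal-search implementation the count would more likely be maintained incrementally (the parent's discrepancy plus 0 or 1) than recomputed from the whole path; the full recomputation is used here only to keep the definition visible.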

Avoiding Errors in Learned Heuristics in Bounded-Suboptimal Search

Baier

2022

SOCS

Despite being very effective, learned heuristics in bounded-suboptimal search can produce heuristic plateaus or move the search to zones of the state space that do not lead to a solution. In addition, they produce inadmissible cost-to-go estimates; therefore, they cannot be exploited with classical algorithms like WA* to produce w-optimal solutions. In this paper, we present two ways in which Focal Search can be modified to exploit a learned heuristic in a bounded-suboptimal search: Focal Discrepancy Search, which, to evaluate each state, uses a discrepancy score based on the best-predicted heuristic value; and K-Focal Search, which expands more than one node from the FOCAL list in each expansion cycle. Both algorithms return w-optimal solutions and explore different zones of the state space than the ones that focal search, using the learned heuristic to sort the FOCAL list, would explore.

“…BWAS does not provide suboptimality guarantees, as a consequence of a second drawback of neural-net heuristics: they are not admissible, thus they cannot be directly used by the many search algorithms that exploit admissibility to provide quality guarantees. Recently, Spies et al. [2019], Araneda, Greco, and Baier [2021], and Greco, Araneda, and Baier [2022] proposed the use of neural-net heuristics in combination with admissible heuristics in Focal Search (FS). However, these approaches do not address the problem of slow heuristic computation.…”

Section: Introduction (mentioning)

confidence: 99%
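The w-optimality claim in the abstract above follows from the standard Focal Search argument, sketched here as a short derivation. The notation is assumed rather than taken from the source: h is an admissible heuristic, g(n) is the cost of the best path found so far to n, C* is the optimal solution cost, and w ≥ 1 is the suboptimality bound.

```latex
\begin{align*}
  f(n) &= g(n) + h(n), \qquad
  f_{\min} = \min_{n \in \mathrm{OPEN}} f(n), \\
  \mathrm{FOCAL} &= \{\, n \in \mathrm{OPEN} : f(n) \le w \cdot f_{\min} \,\}. \\
  \intertext{Before termination, some node $n^{*}$ on an optimal path is in
  OPEN with $g(n^{*}) = g^{*}(n^{*})$, so admissibility of $h$ gives}
  f_{\min} &\le f(n^{*}) = g^{*}(n^{*}) + h(n^{*}) \le C^{*}. \\
  \intertext{If a goal node $n_{g}$ (with $h(n_{g}) = 0$) is selected from FOCAL, then}
  g(n_{g}) &= f(n_{g}) \le w \cdot f_{\min} \le w \cdot C^{*}.
\end{align*}
```

The bound depends only on membership in FOCAL, not on how FOCAL is ordered, which is why the ordering can be delegated to an inadmissible learned heuristic or to a discrepancy score without losing the guarantee.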

K-Focal Search for Slow Learned Heuristics (Extended Abstract)

Toro

Hernández-Ulloa

et al. 2022

SOCS

Learned heuristics, though inadmissible, can provide very good guidance for bounded-suboptimal search. Given a single search state s and a learned heuristic h, evaluating h(s) is typically very slow relative to expansion time, since state-of-the-art learned heuristics are implemented as neural networks. However, by using a Graphics Processing Unit (GPU), it is possible to compute heuristics using batched computation. Existing approaches to batched heuristic computation are specific to satisficing search and have not studied the problem in the context of bounded-suboptimal search. In this paper, we present K-Focal Search, a bounded-suboptimal search algorithm that in each iteration expands K nodes from the FOCAL list and computes the learned heuristic values of the successors using a GPU. We experiment over the Rubik's Cube domain using DeepCubeA, a very effective inadmissible learned heuristic. Our results show that K-Focal Search benefits both from batched computation and from the diversity in the search introduced by its expansion strategy. Over standard FS, it improves runtime by a factor of 6, expansions by up to three orders of magnitude, and finds better solutions, keeping the theoretical guarantees of Focal Search.
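The expansion strategy described in this abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: OPEN is a plain set scanned linearly instead of a priority queue, edge costs are assumed positive, `h_adm` is assumed admissible with value 0 at goals, and `h_learned_batch` is a hypothetical stand-in for a batched neural-network call (for example, one GPU forward pass). All function and parameter names are invented for this sketch.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Sequence, Set, Tuple

State = Hashable


def k_focal_search(
    start: State,
    is_goal: Callable[[State], bool],
    successors: Callable[[State], Iterable[Tuple[State, float]]],
    h_adm: Callable[[State], float],
    h_learned_batch: Callable[[Sequence[State]], Sequence[float]],
    w: float = 1.5,
    k: int = 4,
) -> Optional[Tuple[List[State], float]]:
    """Return (path, cost bound) with cost bound <= w * optimal, or None."""
    g: Dict[State, float] = {start: 0.0}
    parent: Dict[State, Optional[State]] = {start: None}
    h_hat: Dict[State, float] = {start: h_learned_batch([start])[0]}
    open_set: Set[State] = {start}

    while open_set:
        # FOCAL holds the OPEN nodes whose f-value is within w of the minimum.
        f_min = min(g[s] + h_adm(s) for s in open_set)
        focal = [s for s in open_set if g[s] + h_adm(s) <= w * f_min]
        focal.sort(key=lambda s: h_hat[s])  # order FOCAL by the learned heuristic

        pending: List[State] = []  # newly generated states awaiting h_hat
        seen_pending: Set[State] = set()
        for s in focal[:k]:  # expand up to k nodes per iteration
            if is_goal(s):
                path: List[State] = []
                node: Optional[State] = s
                while node is not None:
                    path.append(node)
                    node = parent[node]
                # g[s] is an upper bound on the cost of the returned path.
                return path[::-1], g[s]
            open_set.discard(s)
            for t, cost in successors(s):
                new_g = g[s] + cost
                if t not in g or new_g < g[t]:
                    g[t] = new_g
                    parent[t] = s
                    open_set.add(t)  # also re-opens states reached more cheaply
                    if t not in h_hat and t not in seen_pending:
                        pending.append(t)
                        seen_pending.add(t)

        if pending:  # one batched call for all new successors of this iteration
            for t, value in zip(pending, h_learned_batch(pending)):
                h_hat[t] = value
    return None


# Hypothetical toy problem: the optimal path S -> A -> B -> G costs 3.
EDGES = {"S": [("A", 1.0), ("B", 4.0)], "A": [("G", 5.0), ("B", 1.0)], "B": [("G", 1.0)], "G": []}
LEARNED = {"S": 3.0, "A": 2.0, "B": 1.0, "G": 0.0}


def toy_batch(states: Sequence[State]) -> List[float]:
    return [LEARNED[s] for s in states]


print(k_focal_search("S", lambda s: s == "G", EDGES.__getitem__, lambda s: 0.0, toy_batch, w=1.0, k=2))
```

With k = 1 the loop reduces to ordinary Focal Search ordered by the learned heuristic. Every node taken from FOCAL in an iteration satisfies f ≤ w · f_min for the f_min computed at the start of that iteration, so returning a goal found among the k selected nodes preserves the w-suboptimality bound in this sketch.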