Abstract: The problem we consider is a stochastic shortest path problem in the presence of a dynamic learning capability. Specifically, a spatial arrangement of possible obstacles needs to be traversed as swiftly as possible, and the status of the obstacles may be disambiguated (at a cost) en route. No efficiently computable optimal policy is known, and many similar problems have been proven intractable. In this article, we adapt a policy which is optimal for a related problem and prove that this policy is indeed also optimal for a restricted class of instances of our problem. Otherwise, this policy is generally suboptimal but, nonetheless, it is both effective and efficiently computable. Examples/simulations are provided in a mine countermeasures application. Of central use is the Tangent Arc Graph, a polynomially sized topological superimposition of exponentially many visibility graphs.

© 2011 Wiley Periodicals, Inc. Naval Research Logistics 58: 389-399, 2011

Keywords: mine countermeasures; probabilistic path planning; random disambiguation path; tangent arc graph; Markov decision process
THE DISAMBIGUATION PROBLEM

A disambiguation problem instance is a tuple $(s, t, \mathcal{A}, \rho, c)$, where $s$ and $t$ are points in $\mathbb{R}^2$, $\mathcal{A}$ is a finite set of open discs in $\mathbb{R}^2$, $\rho$ is a function $\mathcal{A} \to (0, 1]$, and $c$ is a function $\mathcal{A} \to \mathbb{R}_{\ge 0}$. An agent wants to traverse from $s$ to $t$ through $\mathbb{R}^2$ along a continuous curve which is as short as possible in the sense of arclength. However, the discs of $\mathcal{A}$ are potential obstacles; for each $A \in \mathcal{A}$, the probability that $A$ is an obstacle is $\rho(A)$, independently of the other discs in $\mathcal{A}$. If $\rho(A) < 1$ then we say $A$ is ambiguous, and if $\rho(A) = 1$ then $A$ is definitely an obstacle. The traversing agent cannot enter discs which are obstacles or ambiguous but, if and when the agent is located on the boundary $\partial A$ of any $A \in \mathcal{A}$, the agent has the option to disambiguate the disc $A$ at a cost $c(A)$, which is added to the traversal arclength; the agent then learns whether or not $A$ is actually an obstacle. The status of a disc never changes; if $A$ is revealed to be an obstacle then the traversing agent may never enter $A$, and if $A$ is not an obstacle then $A$ may be entered at any time thereafter. The central issue is how to direct the agent's traversal to make optimal use of this disambiguation capability; that is, to find a policy for the agent which minimizes the expected length of the agent's $s$, $t$ traversal.

Correspondence to: C.E. Priebe (cep@jhu.edu)

An example of a disambiguation problem instance is shown in Fig. 1; suppose the values of $\rho(A_i)$, for $i = 1, 2, 3, 4, 5$, are 0.6, 0.4, 0.9, 0.8, 0.7, respectively, and suppose $c(A_i) = 1.1$ for all $i$. One particular traversal policy is illustrated in Fig. 1; from $s$ the agent proceeds to the red bullet labeled 1, at which point $A_1$ is disambiguated. If $A_1$ is traversable then the agent is to continue to the red bullet labeled 2, at which point $A_2$ is disambiguated. Then the agent is to proceed to $t$ through $A_2$ or clockwise around $A_2$, according to whether $A_2$ is traversable or not. If $A_1$ was not traversa...
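To make the expected-length objective of this section concrete, the following is a minimal Python sketch, not the authors' implementation. It models a disc with its obstacle probability and disambiguation cost, and evaluates the expected traversal length of the simplest kind of policy: walk to a disc boundary, disambiguate once, then follow one of two fixed continuations. The names `Disc` and `expected_length_one_disambiguation`, the requirement that the radius be positive, and all path lengths in the usage lines are illustrative assumptions; only the values 0.6 and 1.1 are taken from the example instance.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Disc:
    """A potential obstacle (hypothetical container, not from the article)."""
    center: Tuple[float, float]
    radius: float
    rho: float   # probability that the disc is an obstacle, in (0, 1]
    cost: float  # disambiguation cost c(A), nonnegative

    def __post_init__(self) -> None:
        # Enforce the ranges stated in the instance definition.
        if not (0.0 < self.rho <= 1.0):
            raise ValueError("rho must lie in (0, 1]")
        if self.cost < 0.0:
            raise ValueError("disambiguation cost must be nonnegative")
        if self.radius <= 0.0:  # assumption: discs are nondegenerate
            raise ValueError("radius must be positive")

    @property
    def is_ambiguous(self) -> bool:
        # rho < 1: ambiguous; rho == 1: definitely an obstacle.
        return self.rho < 1.0


def expected_length_one_disambiguation(
    len_to_boundary: float,
    disc: Disc,
    len_if_clear: float,
    len_if_obstacle: float,
) -> float:
    """Expected arclength of a policy with exactly one disambiguation.

    The agent walks len_to_boundary to reach the disc boundary, pays the
    disambiguation cost, and then continues along a path of length
    len_if_clear (probability 1 - rho) or len_if_obstacle (probability rho).
    """
    return (
        len_to_boundary
        + disc.cost
        + (1.0 - disc.rho) * len_if_clear
        + disc.rho * len_if_obstacle
    )


# rho and cost match A_1 in the example; the geometry and the three
# path lengths are made up for illustration.
a1 = Disc(center=(0.0, 0.0), radius=1.0, rho=0.6, cost=1.1)
expected = expected_length_one_disambiguation(2.0, a1, 3.0, 5.0)
print(round(expected, 6))  # 2.0 + 1.1 + 0.4 * 3.0 + 0.6 * 5.0 = 7.3
```

A policy with several disambiguations, such as the one described for Fig. 1, can be evaluated the same way by letting each continuation length itself be the expected length of the remaining sub-policy.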