We study sequential interdiction when the interdictor has incomplete initial information about the network, and the evader has complete knowledge of the network, including its structure and arc costs. In each time period, the interdictor blocks at most k arcs from the network observed up to that period, after which the evader travels along a shortest path between two (fixed) nodes in the interdicted network. By observing the evader's actions, the interdictor learns about the network structure and arc costs, and adjusts its actions so as to maximize the cumulative cost incurred by the evader. A salient feature of our work is that the feedback in each period is deterministic and adversarial. In addition to studying the regret minimization problem, we also discuss time-stability of a policy, which is the number of time periods until the interdictor's actions match those of an oracle interdictor with prior knowledge of the network. We propose a class of simple interdiction policies that have a finite regret and detect when the instantaneous regret reaches zero in real time. More importantly, we establish that this class of policies belongs to the set of efficient policies.
We present a framework for a class of sequential decision-making problems in the context of general interdiction problems, where a leader and a follower repeatedly interact. At each period, the leader allocates resources to disrupt the performance of the follower (e.g., as in defender-attacker or network interdiction problems), who in turn minimizes some cost function over a set of activities that depends on the leader's decision. While the follower has complete knowledge of his problem, the leader has only partial information, and needs to learn about the cost parameters, available resources, and the follower's activities from the feedback generated by the follower's actions. We measure policy performance in terms of its time-stability, defined as the number of periods it takes for the leader to match the actions of an oracle with complete information. In particular, we propose a class of greedy and robust policies and show that these policies are weakly optimal, eventually match the oracle's actions, and provide a real-time certificate of optimality. We also study a lower bound on any policy performance based on the notion of a semi-oracle. Our numerical experiments demonstrate that the proposed policies consistently outperform a reasonable benchmark, and perform fairly close to the semi-oracle.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.