2020
DOI: 10.1609/aaai.v34i02.5494
End-to-End Game-Focused Learning of Adversary Behavior in Security Games

Abstract: Stackelberg security games are a critical tool for maximizing the utility of limited defense resources to protect important targets from an intelligent adversary. Motivated by green security, where the defender may only observe an adversary's response to defense on a limited set of targets, we study the problem of learning a defense that generalizes well to a new set of targets with novel feature values and combinations. Traditionally, this problem has been addressed via a two-stage approach where an adversary…

Cited by 19 publications (13 citation statements); references 14 publications.
“…This includes specific problems such as Markowitz portfolio optimization (Bengio 1997) or finding physically feasible state transitions (de Avila Belbute-Peres et al. 2018), as well as larger classes which exhibit properties like convexity or submodularity. Problem classes used in end-to-end training include polynomial-time solvable frameworks like quadratic programs (Amos and Kolter 2017) and linear programs (Wilder, Dilkina, and Tambe 2019), as well as zero-sum games (Ling, Fang, and Kolter 2018; Perrault et al. 2019). In addition, Wilder, Dilkina, and Tambe (2019) propose a solution for problems encoded as submodular optimization problems.…”
Section: Related Work
“…Differentiable optimization. Amos et al. [2] propose using a quadratic program as a differentiable layer and embedding it into a deep learning pipeline, and Agrawal et al. [1] extend their work to convex programs. Decision-focused learning [6, 34] builds on the predict-then-optimize [4, 8] framework by embedding an optimization layer into the training pipeline, where the optimization layers can be convex [6], linear [21, 34], or non-convex [25, 32]. Unfortunately, these techniques are of limited utility for sequential decision problems because their formulations grow linearly in the number of states and actions, and differentiating through them thus quickly becomes infeasible.…”
Section: Related Work
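The passage above describes embedding an optimization problem as a differentiable layer so that gradients of a downstream decision loss reach the predictive model. A minimal sketch of the idea, using an unconstrained quadratic program (a simplifying assumption on our part; the cited works handle constrained convex programs) whose solution x*(θ) = Q⁻¹θ has the closed-form Jacobian ∂x*/∂θ = Q⁻¹:

```python
import numpy as np

# Fixed positive-definite cost matrix of the inner problem:
#   x*(theta) = argmin_x 0.5 * x^T Q x - theta^T x  =  Q^{-1} theta
Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])
Q_inv = np.linalg.inv(Q)

def solve_qp(theta):
    """Forward pass: closed-form solution of the unconstrained QP."""
    return Q_inv @ theta

def backward_qp(grad_x):
    """Backward pass: chain rule through x*(theta) = Q^{-1} theta,
    whose Jacobian with respect to theta is Q^{-1}."""
    return Q_inv.T @ grad_x

# theta plays the role of the model's prediction; x_target is the
# decision we would ideally induce (both hypothetical values).
theta = np.array([1.0, 2.0])
x_star = solve_qp(theta)                    # decision induced by theta
x_target = np.array([0.0, 1.0])

# Decision-focused loss: squared distance between induced and ideal decisions.
grad_x = 2.0 * (x_star - x_target)          # d(loss)/dx*
grad_theta = backward_qp(grad_x)            # d(loss)/d(theta), through the layer
```

Because the Jacobian is available in closed form here, the backward pass is a single matrix multiply; general differentiable-optimization layers obtain the same quantity implicitly by differentiating the KKT conditions of the solver.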
“…4.2.1 Task-Oriented Objective: Estimated Clustering Loss. First, we borrow the idea of existing works, which mainly focus on using a surrogate loss function L_s to guide the learning process, where practitioners can choose either standard machine learning loss functions or other differentiable task-specific surrogate loss functions [3, 9, 10, 30, 40]. For the unsupervised clustering task, the training labels are not known in advance, whereas the supervised classification task has ground-truth labels.…”
Section: "Warm-up"-Based Task-Oriented Estimator
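The quoted passage concerns replacing a non-differentiable task objective with a differentiable surrogate loss. As an illustration only (a soft k-means objective of our own choosing, not the estimator used by the cited work), hard cluster assignments can be relaxed into a softmax over negative squared distances, making the clustering cost smooth in both the data and the centers:

```python
import numpy as np

def soft_kmeans_loss(X, centers, tau=0.1):
    """Differentiable surrogate for the k-means objective: the hard
    argmin assignment is replaced by a softmax over negative squared
    distances, so the loss is smooth in both X and centers."""
    # Pairwise squared distances, shape (n_points, n_centers).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    # Soft assignment weights; the temperature tau controls sharpness.
    w = np.exp(-d2 / tau)
    w /= w.sum(axis=1, keepdims=True)
    return (w * d2).sum()

# Two well-separated clusters; centers placed on them give a near-zero loss.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.1]])
loss = soft_kmeans_loss(X, np.array([[0.0, 0.0], [5.0, 5.0]]))
```

Lowering the temperature tau sharpens the soft assignments toward the hard k-means objective, at the cost of steeper gradients.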