Collaborative tasks performed by two agents often succeed only if the
agents learn a cooperative strategy. In this
work, we propose a constraint-based multi-agent reinforcement learning
approach called Constrained Multi-agent Soft Actor Critic (C-MSAC) to
train control policies for simulated agents performing collaborative
multi-phase tasks. Given a task with n phases, the first
n-1 phases are treated as constraints on the objective of the final
phase, which is addressed with centralized training and
decentralized execution. We demonstrate our framework on a tray
balancing task with two phases: tray lifting and cooperative tray
control for target following. We evaluate the proposed approach
against its unconstrained variant (MSAC). The
comparisons show that C-MSAC achieves higher success rates, more robust
control policies, and better generalization performance.
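
As an illustrative sketch only, treating the first n-1 phases as
constraints on the final phase can be written as a standard constrained
policy optimization problem with a Lagrangian relaxation. The symbols
J_k (expected return of phase k under the joint policy \pi), d_k (a
required performance threshold for phase k), and \lambda_k (a Lagrange
multiplier) are assumptions introduced here for illustration and are not
taken from the text above; the exact formulation used by C-MSAC may
differ.

```latex
% Hypothetical notation: J_k, d_k, and \lambda_k are illustrative assumptions.
\begin{align}
  \max_{\pi} \quad & J_n(\pi) \\
  \text{s.t.} \quad & J_k(\pi) \ge d_k, \qquad k = 1, \dots, n-1 .
\end{align}
% Lagrangian relaxation of the constrained problem:
\begin{equation}
  \min_{\lambda \ge 0} \; \max_{\pi} \;
  \mathcal{L}(\pi, \lambda)
  = J_n(\pi) + \sum_{k=1}^{n-1} \lambda_k \bigl( J_k(\pi) - d_k \bigr).
\end{equation}
```

Under this reading, the two-phase tray task corresponds to n = 2, with
tray lifting as the single constraint (k = 1) and target following as
the final-phase objective.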