This work focuses on distributed optimization for multi-task learning with matrix sparsity regularization. We propose a fast communication-efficient distributed optimization method for solving the problem. With the proposed method, training data of different tasks can be geo-distributed over different local machines, and the tasks can be learned jointly through the matrix sparsity regularization without a need to centralize the data. We theoretically prove that our proposed method enjoys a fast convergence rate for different types of loss functions in the distributed environment. To further reduce the communication cost during the distributed optimization procedure, we propose a data screening approach to safely filter inactive features or variables. Finally, we conduct extensive experiments on both synthetic and real-world datasets to demonstrate the effectiveness of our proposed method.