“…Meanwhile, it is inefficient (sometimes even infeasible) to transmit all data to a central node for analysis. For this reason, distributed machine learning (DML), which stores and processes all or part of the data across different nodes, has attracted significant research interest and found applications [1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]. There are different methods of implementing DML, including primal methods (e.g., distributed gradient descent [4, 7] and federated learning [5, 6]) and primal–dual methods (e.g., the alternating direction method of multipliers (ADMM)) [16].…”