“…There has been a plethora of works exploring promising solutions to federated learning on non-IID data. They can be roughly divided into four categories: 1) client drift mitigation [5,8,9,10], which modifies the local objectives of the clients so that the local models remain consistent with the global model to a certain degree; 2) aggregation schemes [11,12,13,14,15], which improve the model fusion mechanism at the server; 3) data sharing [6,16,17,18], which introduces public datasets or synthesized data to construct a more balanced data distribution on the clients or at the server; 4) personalized federated learning [19,20,21,22], which aims to train personalized models for individual clients rather than a single shared global model. However, as suggested by [7], existing algorithms still fail to achieve good performance on image datasets with deep learning models, and may perform no better than vanilla FedAvg [2].…”
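For context, the following is a minimal sketch of the weighted parameter averaging that vanilla FedAvg [2] performs at the server, the baseline that category 1) constrains on the client side and category 2) replaces on the server side. The function name and the dict-of-arrays model representation are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of vanilla FedAvg server-side aggregation (illustrative only;
# the NumPy model representation and names below are assumptions, not the
# paper's implementation).
import numpy as np

def fedavg_aggregate(client_states, client_sizes):
    """Weighted-average client parameters by local dataset size.

    client_states: list of dicts mapping parameter name -> np.ndarray
    client_sizes:  list of ints, number of local samples per client
    """
    total = float(sum(client_sizes))
    global_state = {}
    for name in client_states[0]:
        # Each parameter tensor is averaged with weight n_k / n.
        global_state[name] = sum(
            (n_k / total) * state[name]
            for state, n_k in zip(client_states, client_sizes)
        )
    return global_state

# Example: two clients with unequal data volumes, as in a non-IID setting.
clients = [{"w": np.array([1.0, 2.0])}, {"w": np.array([3.0, 4.0])}]
print(fedavg_aggregate(clients, [10, 30]))  # {'w': array([2.5, 3.5])}
```

Under heterogeneous client data, the locally trained states being averaged here can diverge from one another, which is the inconsistency that the four categories above attempt to mitigate in different ways.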