Internet traffic classification (TC) is a critical technique in network management and is widely used across applications. In traditional TC, edge devices must send raw traffic data to a server for centralized processing, which not only generates substantial communication overhead but also raises privacy-leakage and information-security issues. Federated learning (FL) is a distributed machine learning paradigm that allows multiple clients to collaboratively train a global model without sharing raw traffic data. Performing TC in an FL framework preserves user privacy and data security by keeping the raw traffic data local. However, differences in user behaviour and preferences make the traffic data heterogeneous across clients. Existing FL solutions introduce bias into model training by averaging the local model parameters from all heterogeneous clients, which degrades the classification accuracy of the learnt global model. To improve classification accuracy in heterogeneous data environments, this paper proposes a novel client selection algorithm for the federated paradigm, named WCL, based on a combination of model weight divergence and local model training loss. Extensive experiments on the public traffic datasets QUIC and ISCX show that, compared with CMFL, the WCL algorithm achieves superior model accuracy and convergence speed on low-heterogeneity and high-heterogeneity traffic data, respectively.