Deep neural networks (DNNs) have achieved impressive results on a variety of supervised learning tasks, owing to the availability of large-scale, well-labeled training data. However, as recent studies have pointed out, the generalization performance of DNNs is likely to deteriorate sharply when the training data contain label noise. To address this problem, a novel loss function is proposed that guides DNNs to pay more attention to clean samples by adaptively weighting the traditional cross-entropy loss. Under the guidance of this loss function, a cross-training strategy is designed that leverages two synergistic DNN models, each of which both updates its own parameters and generates curriculums for the other. In addition, this paper further proposes an online data filtration mechanism and integrates it into the final cross-training framework, which simultaneously optimizes the DNN models and filters out noisy samples. The proposed approach is evaluated through extensive experiments on several benchmark datasets with synthetic or real-world label noise, and the results demonstrate its robustness to different noise types and noise levels.

INDEX TERMS Deep neural networks, label noise, cross-training, loss function, data filtration.
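To make the idea of adaptively weighting the cross-entropy loss concrete, the sketch below shows one plausible instantiation in PyTorch. The function name `adaptive_weighted_ce`, the confidence-based weighting scheme, and the exponent `gamma` are illustrative assumptions, not the paper's exact formulation: each sample's cross-entropy term is scaled by the network's confidence in the given label, so likely-clean samples dominate the gradient while suspected noisy samples are down-weighted.

```python
import torch
import torch.nn.functional as F

def adaptive_weighted_ce(logits, targets, gamma=1.0):
    """Hypothetical sketch: per-sample cross-entropy scaled by a
    confidence-based weight. Samples on which the network assigns high
    probability to the given label (likely clean) receive larger weights;
    `gamma` controls how strongly low-confidence samples are suppressed."""
    ce = F.cross_entropy(logits, targets, reduction="none")            # per-sample loss
    with torch.no_grad():
        probs = F.softmax(logits, dim=1)
        p_label = probs.gather(1, targets.unsqueeze(1)).squeeze(1)     # prob. of the given label
        weights = p_label.pow(gamma)                                   # adaptive weight in [0, 1]
    return (weights * ce).mean()

# Usage in a training step (model, images, labels assumed defined):
# loss = adaptive_weighted_ce(model(images), labels, gamma=2.0)
# loss.backward()
```

Note that the weights are computed under `torch.no_grad()`, so in this sketch they act purely as a curriculum signal and do not themselves receive gradients.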
I. INTRODUCTION

Recently, deep neural networks (DNNs) have achieved remarkable success on supervised machine learning tasks such as image classification, object detection, and semantic analysis. The excellent performance of DNNs is mainly attributed to the availability of massive, well-labeled data samples. However, manually annotating large-scale datasets is prohibitively costly. Crowdsourcing [1] and search engines [2] are alternative ways of obtaining labeled data, but they are likely to introduce label noise, i.e., mislabeled samples. Although Rolnick et al. [3] have noted that DNNs are able to generalize well after training on noisy data, this requires a sufficiently large number of clean samples. Unfortunately, when only a limited number of correct samples are mixed with label-corrupted ones, the generalization performance of DNNs degrades dramatically [4]-[8]. Taking the popular deep learning model Wide-ResNet [9] as an example, Fig. 1 illustrates the negative effect on its test performance when different levels of label noise are introduced into the benchmark image datasets CIFAR-10 and