We formulate a framework for applying error-correcting codes (ECCs) to multilabel classification problems. The framework treats the base learners as noisy channels and uses an ECC to correct the prediction errors made by the learners. The framework immediately leads to a novel ECC-based explanation of the popular random k-label sets (RAKEL) algorithm in terms of a simple repetition ECC. Within the framework, we empirically compare a broad spectrum of off-the-shelf ECC designs for multilabel classification. The results not only demonstrate that RAKEL can be improved by applying stronger ECCs, but also show that the traditional binary relevance approach can be enhanced by learning additional parity-checking labels. Our study of different ECCs also helps clarify the trade-off between the strength of the ECC and the hardness of the base learning tasks. Furthermore, we extend the framework to ECCs with either hard (binary) or soft (real-valued) bits by designing a novel decoder, and we demonstrate that this decoder improves the performance of our framework.
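The noisy-channel view above can be made concrete with a toy repetition code, the same code that underlies the RAKEL interpretation. The sketch below is purely illustrative: the encoder/decoder names are ours, and the simulated bit flips merely stand in for the prediction errors of real base learners.

```python
# Minimal sketch of the ECC view of multilabel classification with a
# repetition code: encode the label vector, let "base learners" predict
# each codeword bit (simulated here as a noisy channel), then decode.
import numpy as np

def repetition_encode(y, r=3):
    """Repeat each label bit r times to form the codeword."""
    return np.repeat(y, r)

def repetition_decode(bits, r=3):
    """Majority-vote each block of r predicted bits back to one label."""
    return (bits.reshape(-1, r).mean(axis=1) > 0.5).astype(int)

rng = np.random.default_rng(0)
y = np.array([1, 0, 1, 1, 0])            # true label vector (5 labels)
codeword = repetition_encode(y, r=3)      # 15 bits for the base learners

# Simulate the base learners as a binary symmetric channel: 20% bit errors.
flips = (rng.random(codeword.shape) < 0.2).astype(int)
predicted_bits = codeword ^ flips

y_hat = repetition_decode(predicted_bits, r=3)
print("true:", y, "decoded:", y_hat)
```

Stronger codes replace the repetition encoder/decoder with designs that correct more bit errors per redundant bit, at the cost of harder per-bit learning tasks, which is exactly the trade-off studied in the paper.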
Many active learning methods belong to the family of retraining-based approaches, which select an unlabeled instance, add it to the training set with each of its possible labels, retrain the classification model, and evaluate the criterion on which the selection is based. However, since the true label of the selected instance is unknown, these methods resort to computing the average-case or worst-case performance with respect to the unknown label. In this paper, we propose a different method to address this problem. In particular, our method exploits uncertainty information to enhance the performance of retraining-based models. We apply our method to two state-of-the-art algorithms and carry out extensive experiments on a wide variety of real-world datasets. The results clearly demonstrate the effectiveness of the proposed method and indicate that it can reduce human labeling effort in many real-life applications.
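As a rough illustration of the retraining loop described above, the sketch below scores a candidate instance under an average-case criterion: the model is retrained once per possible label, and the resulting pool entropy is weighted by the current model's predicted label probabilities. The function names, the entropy criterion, and the choice of logistic regression are illustrative assumptions, not the exact procedure of the paper.

```python
# Sketch of one retraining-based active learning step (average-case criterion).
import numpy as np
from sklearn.linear_model import LogisticRegression

def expected_pool_entropy(X_lab, y_lab, X_pool, x):
    """Expected pool entropy after adding candidate x with each possible
    label, weighted by the current model's predicted probabilities."""
    current = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    p = current.predict_proba([x])[0]
    score = 0.0
    for label, weight in enumerate(p):
        # Retrain with the hypothesized label and measure pool uncertainty.
        retrained = LogisticRegression(max_iter=1000).fit(
            np.vstack([X_lab, x]), np.append(y_lab, label))
        proba = retrained.predict_proba(X_pool)
        entropy = -(proba * np.log(proba + 1e-12)).sum(axis=1).mean()
        score += weight * entropy
    return score

def select_instance(X_lab, y_lab, X_pool):
    """Query the pool instance whose addition minimizes expected entropy."""
    scores = [expected_pool_entropy(X_lab, y_lab, X_pool, x) for x in X_pool]
    return int(np.argmin(scores))
```

A worst-case variant would replace the probability-weighted sum with a maximum over the possible labels; the paper's contribution concerns how uncertainty information is used within such loops.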
Transformer models are permutation equivariant. To supply the order and type information of the input tokens, position and segment embeddings are usually added to the input. Recent works have proposed variations of positional encodings, with relative position encodings achieving better performance. Our analysis shows that the gain actually comes from moving positional information from the input to the attention layer. Motivated by this, we introduce Decoupled posItional attEntion for Transformers (DIET), a simple yet effective mechanism for encoding position and segment information in Transformer models. The proposed method has faster training and inference time, while achieving competitive performance on the GLUE, XTREME, and WMT benchmarks. We further generalize our method to long-range Transformers and show performance gains.
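To illustrate the general idea of moving positional information into the attention layer (this is not the exact DIET formulation, which is described in the paper), the sketch below adds a learned per-head relative-position bias directly to the attention logits, while the token embeddings themselves carry no positional signal.

```python
# Simplified illustration: position enters through an additive bias on the
# attention logits rather than through embeddings added to the input.
import torch
import torch.nn as nn

class PositionBiasedAttention(nn.Module):
    def __init__(self, dim, n_heads, max_rel_pos=128):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        # One learnable bias per head and per clipped relative distance.
        self.rel_bias = nn.Parameter(torch.zeros(n_heads, 2 * max_rel_pos + 1))
        self.max_rel_pos = max_rel_pos

    def forward(self, x):                      # x: (batch, seq_len, dim)
        seq_len = x.size(1)
        pos = torch.arange(seq_len, device=x.device)
        rel = (pos[None, :] - pos[:, None]).clamp(-self.max_rel_pos,
                                                  self.max_rel_pos)
        bias = self.rel_bias[:, rel + self.max_rel_pos]   # (heads, L, L)
        # attn_mask is added to the attention scores, injecting position
        # information at the attention layer instead of the input.
        mask = bias.repeat(x.size(0), 1, 1)                # (B*heads, L, L)
        out, _ = self.attn(x, x, x, attn_mask=mask)
        return out
```

Because the bias depends only on relative distance, it can be reused across sequence lengths, which is one reason attention-level positional encodings pair naturally with long-range Transformer variants.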
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.