The optimization of neural networks in terms of computation cost and memory footprint is crucial for their practical deployment on edge devices. In this work, we propose a novel quantization-aware training (QAT) scheme called noise injection pseudo quantization (NIPQ). NIPQ is built on pseudo quantization noise (PQN) and has several advantages. First, both activations and weights can be quantized within a unified framework. Second, the hyper-parameters of quantization (e.g., layer-wise bit-width and quantization interval) are tuned automatically. Third, after QAT, the network is robust to quantization, making it easier to deploy in practice. To validate the superiority of the proposed algorithm, we provide extensive analysis and conduct diverse experiments on various vision applications, which confirm its outstanding performance in several respects.
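To make the training scheme concrete, the following is a minimal PyTorch-style sketch of pseudo-quantization-noise training. It is illustrative only: the uniform noise model, the joint parameterization of the clipping interval and bit-width, and all names are assumptions rather than the authors' implementation.

import torch
import torch.nn as nn

class PQNQuantizer(nn.Module):
    # Sketch of a learnable uniform quantizer trained with pseudo quantization
    # noise (PQN) instead of the straight-through estimator.
    def __init__(self, init_interval=3.0, init_bits=8.0):
        super().__init__()
        self.interval = nn.Parameter(torch.tensor(init_interval))  # clipping range
        self.bits = nn.Parameter(torch.tensor(init_bits))          # soft bit-width

    def forward(self, x):
        step = self.interval / (2.0 ** self.bits - 1.0)             # quantization step size
        x = torch.minimum(torch.maximum(x, -self.interval), self.interval)
        if self.training:
            # Uniform noise of one step's width stands in for the
            # non-differentiable rounding error during QAT.
            noise = (torch.rand_like(x) - 0.5) * step
            return x + noise
        # At inference, apply the actual uniform quantizer.
        return torch.round(x / step) * step

Attaching the same module to both weights and activations is what allows a single, unified treatment of the two.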
Contrastive learning is a promising approach to representation learning: it pulls positive pairs close together while pushing negative pairs apart. Because it relies on the relative proximity between positive and negative pairs, contrastive learning can fail on pairs that are already easily distinguished, since their gradients vanish. To overcome this problem, we propose a dynamic mixed margin (DMM) loss that generates augmented hard positive-negative pairs that are not easily separated. DMM generates hard pairs by interpolating samples with Mixup and adopts a dynamic margin that incorporates the interpolation ratio; this dynamic adaptation improves representation learning. DMM pulls distant positive pairs closer, while slightly pushing apart positive pairs that are already very close, which alleviates overfitting. DMM is a plug-and-play module compatible with diverse contrastive learning and metric learning losses. We validate that DMM outperforms baselines on video-text retrieval and recommender system tasks in both unimodal and multimodal settings. Moreover, representations learned with DMM are more robust to missing modalities, which frequently occur in real-world datasets. An implementation of DMM on the downstream tasks is available at https://github.com/teang1995/DMM

Index Terms: Multimodal learning, contrastive learning, retrieval, video representation, recommender system, robustness.
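As a rough illustration of the loss, the sketch below mixes positive and negative embeddings with Mixup and scales a triplet-style margin by the interpolation ratio. The mixing space, margin schedule, and all names are assumptions, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def dmm_style_loss(anchor, positive, negative, base_margin=0.2, beta=1.0):
    # Sample a Mixup ratio and synthesize a hard pair in embedding space.
    lam = torch.distributions.Beta(beta, beta).sample().item()
    mixed = lam * positive + (1.0 - lam) * negative
    d_pos = 1.0 - F.cosine_similarity(anchor, positive)  # distance to true positive
    d_mix = 1.0 - F.cosine_similarity(anchor, mixed)     # distance to mixed sample
    # Dynamic margin: the closer the mixed sample is to the negative
    # (small lam), the larger the required separation.
    margin = base_margin * (1.0 - lam)
    return F.relu(d_pos - d_mix + margin).mean()

In this sketch the margin shrinks as the mixed sample approaches the positive, so pairs that are already well separated impose a softer constraint.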
Over the past several years, the need for on-device deep learning has been rapidly increasing, and efficient CNN inference on mobile platforms has been actively researched. Sparsity exploitation has been one of the most active research themes, but most studies focus on weight sparsity obtained by weight pruning. Activation sparsity, in contrast, requires compression at runtime for every input tensor; hence, research on activation sparsity mainly targets NPUs, which can process it efficiently with dedicated hardware logic. In this paper, we observe that it is difficult to accelerate CNN inference on mobile GPUs with natural activation sparsity, and that the widely used CSR-based sparse convolution is not sufficiently effective due to its compression overhead. We propose several novel sparsification methods that boost activation sparsity without harming accuracy. In particular, we selectively sparsify some layers to an extremely high sparsity and choose between sparse and dense convolution on a per-layer basis. Further, we present an efficient sparse convolution method that requires no compression and demonstrate that it can be faster than the CSR implementation. With ResNet-50, we achieve a 1.88× speedup over TFLite on a Mali-G76 GPU.
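The compression-free idea can be illustrated with a naive reference implementation that scatters contributions from nonzero activations directly, instead of first packing them into a CSR structure. This is a readability sketch in NumPy under assumed tensor layouts, not the paper's GPU kernel.

import numpy as np

def sparse_conv2d_direct(act, weight):
    # act: (C, H, W) input activations, already sparse after ReLU/sparsification
    # weight: (K, C, R, S) filters; valid (no-padding) convolution for brevity
    C, H, W = act.shape
    K, _, R, S = weight.shape
    OH, OW = H - R + 1, W - S + 1
    out = np.zeros((K, OH, OW), dtype=act.dtype)
    for c in range(C):
        ys, xs = np.nonzero(act[c])          # locate nonzeros on the fly
        for y, x in zip(ys, xs):
            v = act[c, y, x]
            # Scatter this activation into every output position it influences.
            for r in range(R):
                for s in range(S):
                    oy, ox = y - r, x - s
                    if 0 <= oy < OH and 0 <= ox < OW:
                        out[:, oy, ox] += v * weight[:, c, r, s]
    return out

The key property is that no per-tensor compression pass is needed: zero activations are simply skipped as they are encountered, which is the behavior an efficient GPU kernel would aim to reproduce.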
Learning fair representations is crucial for achieving fairness or debiasing sensitive information. Most existing works rely on adversarial representation learning to inject invariance into the representation. However, adversarial learning methods are known to suffer from relatively unstable training, which can harm the balance between the fairness and the predictiveness of the representation. We propose a new approach, learning FAir Representation via distributional CONtrastive Variational AutoEncoder (FarconVAE), which induces the latent space to be disentangled into sensitive and non-sensitive parts. We first construct pairs of observations that share a label but differ in the sensitive attribute. Then, FarconVAE enforces the non-sensitive latents of a pair to be close, while the sensitive latents are pushed away from each other and from the non-sensitive latents, by contrasting their distributions. We provide a new type of contrastive loss for this distributional contrastive learning, motivated by Gaussian and Student-t kernels, together with theoretical analysis. In addition, we adopt a new swap-reconstruction loss to further boost the disentanglement. FarconVAE shows superior performance on fairness, pretrained model debiasing, and domain generalization tasks across various modalities, including tabular, image, and text.

CCS Concepts: • Computing methodologies → Neural networks; Learning latent representations; • Social and professional topics → User characteristics.
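A rough sketch of such a distributional contrastive objective is shown below. The symmetric-KL distance between diagonal Gaussians, the particular Gaussian and heavy-tailed (Student-t-like) kernels, and the equal weighting are illustrative assumptions; the paper's exact loss is not restated here.

import torch

def sym_kl(mu1, logvar1, mu2, logvar2):
    # Symmetric KL divergence between two diagonal Gaussians.
    v1, v2 = logvar1.exp(), logvar2.exp()
    kl12 = 0.5 * (v1 / v2 + (mu2 - mu1) ** 2 / v2 - 1.0 + logvar2 - logvar1)
    kl21 = 0.5 * (v2 / v1 + (mu1 - mu2) ** 2 / v1 - 1.0 + logvar1 - logvar2)
    return (kl12 + kl21).sum(dim=-1)

def distributional_contrastive(z1, z2, s1, s2):
    # z1, z2: (mu, logvar) of the non-sensitive latents of a paired observation
    # s1, s2: (mu, logvar) of the sensitive latents of the same pair
    # (Pushing sensitive latents away from non-sensitive ones is omitted for brevity.)
    d_pos = sym_kl(*z1, *z2)
    d_neg = sym_kl(*s1, *s2)
    pull = 1.0 - torch.exp(-d_pos)   # Gaussian-kernel term: pull non-sensitive latents together
    push = 1.0 / (1.0 + d_neg)       # heavy-tailed-kernel term: push sensitive latents apart
    return (pull + push).mean()

Minimizing the first term drives the non-sensitive distributions of the pair together, while minimizing the second drives the sensitive distributions apart, mirroring the contrast described above.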