SUMMARY: This paper investigates the effect of noise added to the hidden units of autoencoders linked to multilayer perceptrons. It is shown that an internal representation of learned features emerges, and the sparsity of the hidden units increases, when independent Gaussian noise is added to the inputs of the hidden units during deep network training. It is also shown that the weights connecting the noise-contaminated hidden units to the next layer take smaller values, and that the outputs of the hidden units become more definite (close to 0 or 1). This automatic structuration induced by the added noise is expected to improve the generalization ability of the network. The structuration was confirmed by MNIST digit classification experiments with a deep neural network model.
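The training-time noise injection described above can be sketched minimally as follows. The layer sizes, noise level, activation choice, and function names here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def noisy_hidden_forward(x, W, b, sigma=0.5, rng=None):
    """Forward pass through one hidden layer with independent Gaussian
    noise added to the hidden units' inputs (pre-activations).
    This is applied during training only; at test time sigma would be 0."""
    rng = rng or np.random.default_rng(0)
    pre = x @ W + b                             # inputs to the hidden units
    noise = rng.normal(0.0, sigma, pre.shape)   # independent Gaussian noise
    return sigmoid(pre + noise)                 # noisy hidden activations

# Illustrative shapes: batch of 2 samples, 4 inputs -> 3 hidden units
rng = np.random.default_rng(1)
x = rng.normal(size=(2, 4))
W = rng.normal(size=(4, 3))
b = np.zeros(3)
h = noisy_hidden_forward(x, W, b)
```

Because the noise perturbs each hidden unit independently, training must rely on weights that are robust to those perturbations, which is consistent with the smaller next-layer weights and more saturated (near 0 or 1) hidden outputs reported above.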
This paper investigates feature localization when noise is injected into a convolutional neural network (CNN). The proposed model is intended to classify seven human emotional states from facial expressions, and it is shown to outperform an earlier convolutional neural network. An internal representation of learned features emerges, and more accurate localization of those features appears, when independent Gaussian noise is added to certain joints during deep network training. We observed that the weights following the noise-contaminated units lead to more definite outputs. This behavior improves the network's generalization through automatic structuration. We confirmed this with emotion classification experiments on the KDEF and Extended Cohn-Kanade (CK+) facial-expression datasets.
Recognizing facial expressions and estimating the intensities of their corresponding action units have achieved many milestones. However, such estimation remains challenging due to subtle variations of action units during emotional arousal. The latest approaches are confined to the characteristics of probabilistic models of action-unit relationships. Considering ordinal relationships across an emotional transition sequence, we propose two metric-learning approaches, with self-attention-based triplet and Siamese networks, to estimate emotional intensities. Our emotion-expert branches use a shifted-window (Swin) Transformer, which restricts self-attention computation to non-overlapping local windows while also allowing cross-window connections. This offers flexible modeling of action units at various scales with high performance. We evaluated our network's spatial and temporal feature localization on the CK+, KDEF-dyn, AFEW, SAMM, and CASME-II datasets. It outperforms state-of-the-art deep learning methods in micro-expression detection on the latter two datasets, by 2.4% and 2.6% UAR respectively. Ablation studies highlight the strength of our design with a thorough analysis.
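The triplet objective that metric-learning branches of this kind typically optimize can be sketched as follows. The embedding dimension, margin value, and function name are illustrative assumptions; the paper's actual loss and network are not reproduced here.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on embeddings: pulls the anchor toward the
    positive (same class) and pushes it away from the negative (different
    class) until their squared distances differ by at least `margin`."""
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)  # anchor-positive distance
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)  # anchor-negative distance
    return np.maximum(0.0, d_pos - d_neg + margin)

# Toy 2-D embeddings for illustration
a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # close to the anchor
n = np.array([1.0, 1.0])   # far from the anchor
easy = triplet_loss(a, p, n)   # margin already satisfied -> zero loss
hard = triplet_loss(a, n, p)   # roles swapped -> positive loss
```

A Siamese variant differs mainly in using pairs with a contrastive loss instead of triplets; both train an embedding space where distance reflects the ordinal intensity relationships mentioned above.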