Facial action unit (AU) detection and face alignment are two highly correlated tasks, since facial landmarks provide precise AU locations that facilitate the extraction of meaningful local features for AU detection. Most existing AU detection works treat face alignment as a preprocessing step and handle the two tasks independently. In this paper, we propose a novel end-to-end deep learning framework for joint AU detection and face alignment, which has not been explored before. In particular, multi-scale shared features are learned first, and high-level face alignment features are fed into AU detection. Moreover, to extract precise local features, we propose an adaptive attention learning module that refines the attention map of each AU adaptively. Finally, the assembled local features are integrated with face alignment features and global features for AU detection. Experiments on the BP4D and DISFA benchmarks demonstrate that our framework significantly outperforms state-of-the-art AU detection methods.
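To make the joint pipeline concrete, below is a minimal PyTorch sketch of this kind of multi-task design: a shared multi-scale trunk, an alignment branch whose high-level features are fed into AU detection, and per-AU attention maps that weight local features before fusion with global and alignment features. All module names, channel sizes, and the fusion strategy here are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of joint AU detection and face alignment.
# Names, channel widths, and the fusion scheme are assumptions for exposition.

class MultiScaleFeatures(nn.Module):
    """Shared trunk: concatenate convolutions with several receptive fields."""
    def __init__(self, in_ch=3, out_ch=32):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=k, padding=k // 2)
            for k in (3, 5, 7)  # three scales
        ])

    def forward(self, x):
        return torch.cat([torch.relu(b(x)) for b in self.branches], dim=1)

class JointAUAlignNet(nn.Module):
    def __init__(self, num_aus=12, num_landmarks=49):
        super().__init__()
        self.shared = MultiScaleFeatures()                  # 96-channel shared features
        self.align_feat = nn.Sequential(                    # high-level alignment features
            nn.Conv2d(96, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.align_out = nn.Linear(64, 2 * num_landmarks)   # (x, y) per landmark
        self.attention = nn.Conv2d(96, num_aus, 1)          # one attention map per AU
        self.au_out = nn.Linear(96 * num_aus + 96 + 64, num_aus)

    def forward(self, x):
        f = self.shared(x)                                  # B x 96 x H x W
        a_feat = self.align_feat(f)                         # alignment features, B x 64
        landmarks = self.align_out(a_feat)
        attn = torch.sigmoid(self.attention(f))             # B x A x H x W, learned end to end
        # attention-weighted local feature per AU: B x A x 96
        local = torch.einsum('bchw,bahw->bac', f, attn)
        local = local / (attn.sum(dim=(2, 3)).unsqueeze(-1) + 1e-6)
        g = f.mean(dim=(2, 3))                              # global features, B x 96
        fused = torch.cat([local.flatten(1), g, a_feat], dim=1)
        return torch.sigmoid(self.au_out(fused)), landmarks
```

A forward pass such as `JointAUAlignNet()(torch.randn(2, 3, 176, 176))` returns per-AU probabilities and flattened landmark coordinates; in a joint setting the two heads would be trained together, for example with a binary cross-entropy loss for AUs and a regression loss for landmarks.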
Attention mechanisms have recently attracted increasing interest in the area of facial action unit (AU) detection. By finding the region of interest (ROI) of each AU with an attention mechanism, AU-related local features can be captured. Most existing attention-based AU detection works use prior knowledge to generate fixed attention maps or refine predefined attention maps within a small range, which limits their capacity to model various AUs. In this paper, we propose a novel end-to-end weakly-supervised attention and relation learning framework for AU detection that requires only AU labels, which has not been explored before. In particular, multi-scale features shared by all AUs are learned first, and then both channel-wise and spatial attentions are learned to select and extract AU-related local features. Moreover, pixel-level relations among AUs are captured to further refine the spatial attentions so as to extract more relevant local features. Extensive experiments on the BP4D and DISFA benchmarks demonstrate that our framework (i) outperforms state-of-the-art AU detection methods, and (ii) can find the ROI of each AU and capture the relations among AUs adaptively.
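As a rough illustration of how channel-wise attention, spatial attention, and pixel-level relations can be combined under AU-label supervision alone, here is a hypothetical PyTorch branch for a single AU. The squeeze-and-excitation-style channel gate, the non-local-style relation step, and all names and dimensions are assumptions for exposition; the paper's exact design may differ.

```python
import torch
import torch.nn as nn

# Illustrative per-AU branch on top of shared features; supervised only by
# the AU's binary label, so the attention maps are learned weakly.

class AUAttentionBranch(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        # channel-wise attention (squeeze-and-excitation style gate)
        self.channel_fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels // 4), nn.ReLU(),
            nn.Linear(channels // 4, channels), nn.Sigmoid(),
        )
        self.spatial = nn.Conv2d(channels, 1, kernel_size=1)  # initial spatial attention
        # embeddings for pixel-level relations (non-local style)
        self.query = nn.Conv2d(channels, channels // 4, 1)
        self.key = nn.Conv2d(channels, channels // 4, 1)
        self.classifier = nn.Linear(channels, 1)  # AU prediction from pooled local feature

    def forward(self, f):
        b, c, h, w = f.shape
        f = f * self.channel_fc(f).view(b, c, 1, 1)           # select relevant channels
        s = torch.sigmoid(self.spatial(f)).view(b, 1, h * w)  # initial spatial attention
        q = self.query(f).flatten(2)                          # B x C' x HW
        k = self.key(f).flatten(2)
        rel = torch.softmax(q.transpose(1, 2) @ k, dim=-1)    # B x HW x HW pixel relations
        s = (s @ rel).view(b, 1, h, w)                        # relations refine the attention
        local = (f * s).flatten(2).mean(-1)                   # pooled local feature, B x C
        return torch.sigmoid(self.classifier(local)).squeeze(-1), s
```

Calling `AUAttentionBranch()(torch.randn(2, 64, 22, 22))` returns the AU probability and the refined spatial attention map; training one such branch per AU with only binary AU labels matches the weakly-supervised setting described above.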
In multi-label learning, each sample can be assigned multiple class labels simultaneously. In this work, we focus on the problem of multi-label learning with missing labels (MLML), where instead of assuming that a complete label assignment is provided for each sample, only a subset of the labels is assigned values, while the rest are missing or not provided. Positive (presence), negative (absence) and missing labels are explicitly distinguished in MLML. We formulate MLML as a transductive learning problem, where the goal is to recover the full label assignment for each sample by enforcing consistency with the available label assignments and smoothness of the label assignments. Along with an exact solution, we also provide an effective and efficient approximate solution. Our method shows much better performance than several state-of-the-art methods on benchmark data sets.
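The consistency-plus-smoothness formulation can be illustrated with a standard graph-based label-propagation solve. The NumPy sketch below is one common instantiation of this transductive idea, not necessarily the paper's exact objective or solver; the function name, the Gaussian similarity graph, and the per-column closed-form solve are all assumptions.

```python
import numpy as np

def recover_labels(X, Y, observed, lam=1.0, sigma=1.0):
    """X: n x d features; Y: n x m labels in {-1, 0, +1} (0 where missing);
    observed: n x m boolean mask of provided entries; returns n x m scores."""
    n = X.shape[0]
    # Gaussian similarity graph over samples and its Laplacian L = D - W
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W
    F = np.zeros_like(Y, dtype=float)
    for j in range(Y.shape[1]):  # each label column is solved independently
        C = np.diag(observed[:, j].astype(float))  # consistency weights on observed entries
        # minimize (f - y)^T C (f - y) + lam * f^T L f  =>  (C + lam * L) f = C y
        A = C + lam * L + 1e-8 * np.eye(n)         # tiny ridge for numerical stability
        F[:, j] = np.linalg.solve(A, C @ Y[:, j].astype(float))
    return F
```

Thresholding the recovered scores at zero then yields the completed positive/negative assignments, with the consistency term anchoring observed labels and the Laplacian term smoothing scores over similar samples.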