We ask whether neural network interpretation methods can be fooled via adversarial model manipulation, which we define as a model fine-tuning step that aims to radically alter the explanations without hurting the accuracy of the original models (e.g., VGG19, ResNet50, and DenseNet121). By incorporating the interpretation results directly into the penalty term of the fine-tuning objective, we show that state-of-the-art saliency map based interpreters, e.g., LRP, Grad-CAM, and SimpleGrad, can be easily fooled by our model manipulation. We propose two types of fooling, Passive and Active, and demonstrate that such fooling generalizes well to the entire validation set and transfers to other interpretation methods. We validate our results both by visually showing the fooled explanations and by reporting quantitative metrics that measure the deviation from the original explanations. We argue that the stability of a neural network interpretation method with respect to our adversarial model manipulation is an important criterion to check when developing robust and reliable interpretation methods.
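The idea of putting the interpretation itself into the fine-tuning penalty can be illustrated with a deliberately tiny sketch. Everything below is an assumption for illustration, not the paper's method or loss: the model is a linear softmax classifier, the interpreter is SimpleGrad (the input gradient of the class logit), and the hypothetical `fooling_penalty` is a location-style term that penalizes squared saliency outside a chosen mask. Fine-tuning minimizes an assumed objective of the form L = L_CE + λ·L_F by plain gradient descent; the toy data are built so that two different feature pairs both separate the classes, which lets the explanation move without costing accuracy.

```python
import math


def softmax(z):
    # Numerically stable softmax over one list of logits.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def logits(W, x):
    # Linear classifier: logit_c = W[c] . x
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in W]


def simplegrad(W, c):
    # For a linear model, the input gradient of logit c is the row W[c].
    return list(W[c])


def fooling_penalty(W, y, mask):
    # Hypothetical location-style penalty: squared saliency falling outside
    # the target mask, averaged over samples (each explained at its label).
    return sum(
        sum((1.0 - m) * s * s for m, s in zip(mask, simplegrad(W, c)))
        for c in y
    ) / len(y)


def gradient(W, X, y, mask, lam):
    # Gradient of mean cross-entropy + lam * fooling_penalty with respect to W.
    n, k, d = len(X), len(W), len(W[0])
    G = [[0.0] * d for _ in range(k)]
    for x, c in zip(X, y):
        p = softmax(logits(W, x))
        for j in range(k):
            coef = (p[j] - (1.0 if j == c else 0.0)) / n
            for i in range(d):
                G[j][i] += coef * x[i]
        for i in range(d):
            G[c][i] += lam * 2.0 * (1.0 - mask[i]) * W[c][i] / n
    return G


def accuracy(W, X, y):
    hits = 0
    for x, c in zip(X, y):
        z = logits(W, x)
        hits += int(max(range(len(z)), key=z.__getitem__) == c)
    return hits / len(y)


def fine_tune(W, X, y, mask, lam=1.0, lr=0.1, steps=200):
    # Plain gradient descent on CE + lam * penalty; returns a new weight matrix.
    W = [row[:] for row in W]
    for _ in range(steps):
        G = gradient(W, X, y, mask, lam)
        W = [[w - lr * g for w, g in zip(rw, rg)] for rw, rg in zip(W, G)]
    return W


# Toy data: feature pairs (0, 1) and (2, 3) each separate the two classes.
X = [[1.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 0.0, 1.0]]
y = [0, 1]
mask = [1.0, 1.0, 0.0, 0.0]   # region the explanation is steered toward
W0 = [[0.0, 0.0, 1.0, 0.0],   # original model relies on features 2 and 3
      [0.0, 0.0, 0.0, 1.0]]

W1 = fine_tune(W0, X, y, mask)
print(accuracy(W0, X, y), accuracy(W1, X, y))  # 1.0 1.0
print(fooling_penalty(W0, y, mask), fooling_penalty(W1, y, mask))
```

Accuracy is unchanged while the out-of-mask saliency shrinks toward zero. In this linear toy the saliency does not depend on the input, so the penalty reduces to a weight penalty; for a deep network the saliency map is itself a function of the input and the weights, and fine-tuning through it requires differentiating through the interpretation (e.g., double backpropagation for gradient-based maps).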
Although recent studies have shown the importance of control in creative problem solving, the neural mechanisms of the control processes engaged in retrieving weak representations, which is closely linked to creative problem solving, remain unclear. Using functional magnetic resonance imaging, the current study examined the neural mechanisms associated with retrieval of weak representations and their potential relationships with creativity task performance. To this end, participants performed an experimental task that allowed us to directly compare retrieval of previously unattended, weak representations with retrieval of attended, strong representations. Imaging results indicated that the right anterior dorsolateral prefrontal cortex (aDLPFC) was selectively engaged in retrieval of weak representations. Moreover, right aDLPFC activation was positively correlated with individuals’ creativity task performance but was independent of attention-demanding task performance. We therefore suggest that the right aDLPFC plays a key role in retrieval of weak representations and may support creative problem solving.