Previous studies have shown that the effect of visual attention can spread automatically to the task-irrelevant auditory modality through either a stimulus-driven binding process or a representation-driven priming process. Using an attentional blink paradigm, the present study investigated whether the long-latency stimulus-driven and representation-driven cross-modal spread of attention would be inhibited or facilitated when the attentional resources operating at the post-perceptual stage of processing are inadequate, while ensuring that all visual stimuli were spatially attended and that the representations of the visual target object categories were activated, which were previously thought to be the only endogenous prerequisites for triggering cross-modal spread of attention. The results demonstrated that both types of attentional spreading were completely suppressed during the attentional blink interval but were highly prominent outside it, with the stimulus-driven process being independent of, and the representation-driven process dependent on, audiovisual semantic congruency. These findings provide the first evidence that the occurrence of both stimulus-driven and representation-driven spread of attention is contingent on the amount of post-perceptual attentional resources available for the late consolidation of visual stimuli, and that the early detection of visual stimuli and the top-down activation of visual representations are not the sole endogenous prerequisites for triggering either type of cross-modal attentional spreading.
The present study recorded event-related potentials (ERPs) in a visual object-recognition task under the attentional blink paradigm to explore the temporal dynamics of the cross-modal boost on the attentional blink and whether this auditory benefit is modulated by the semantic congruency between T2 and the simultaneous sound. Behaviorally, not only a semantically congruent but also a semantically incongruent sound improved T2 discrimination during the attentional blink interval, although the enhancement was larger for the congruent sound. The ERP results revealed that the behavioral improvements induced by both the semantically congruent and incongruent sounds were closely associated with an early cross-modal interaction on the occipital N195 (192–228 ms). In contrast, the lower T2 accuracy for the incongruent than the congruent condition was accompanied by a larger late-occurring centro-parietal N440 (424–448 ms). These findings suggest that the cross-modal boost on the attentional blink is hierarchical: the task-irrelevant but simultaneous sound, irrespective of its semantic relevance, first enables T2 to escape the attentional blink by cross-modally strengthening the early stage of visual object-recognition processing, whereas the semantic conflict of the sound begins to interfere with visual awareness only at a later stage, when the representation of the visual object is extracted.
Depression is a severe psychological condition that affects millions of people worldwide. As depression has received more attention in recent years, it has become imperative to develop automatic methods for detecting it. Although numerous machine learning methods have been proposed for estimating depression levels via audio, visual, and audiovisual emotion sensing, several challenges remain. For example, it is difficult to extract long-term temporal context information from long sequences of audio and visual data, and it is also difficult to select and fuse useful multi-modal information or features effectively. In addition, incorporating auxiliary information or tasks to improve estimation accuracy remains an open problem. In this study, we propose a multi-modal adaptive fusion transformer network for estimating depression levels. Transformer-based models have achieved state-of-the-art performance in language understanding and sequence modeling, so the proposed network uses transformer encoders to extract long-term temporal context information from uni-modal audio and visual data. This is the first transformer-based approach for depression detection. We also propose an adaptive fusion method for adaptively fusing useful multi-modal features. Furthermore, inspired by recent multi-task learning work, we incorporate an auxiliary task (depression classification) to enhance the main task of depression level regression (estimation). The effectiveness of the proposed method has been validated on a public dataset (the AVEC 2019 Detecting Depression with AI Sub-challenge) in terms of PHQ-8 scores. Experimental results indicate that the proposed method outperforms current state-of-the-art methods, achieving a concordance correlation coefficient (CCC) of 0.733 on AVEC 2019, 6.2% higher than that of the previous state-of-the-art method (CCC = 0.696).
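A minimal, illustrative sketch of such a pipeline is given below; it is not the authors' implementation, and all module names, dimensions, and hyperparameters are assumptions. Transformer encoders summarize each modality's feature sequence, a learned gate adaptively weights the audio and visual embeddings before fusion, and two heads serve the main regression task and the auxiliary classification task; a concordance correlation coefficient function matching the AVEC evaluation metric is also included.

```python
# Illustrative sketch only (not the authors' implementation); all dimensions,
# hyperparameters, and module names are assumptions.
import torch
import torch.nn as nn

class ModalityEncoder(nn.Module):
    """Transformer encoder that summarizes a (batch, time, feat) sequence."""
    def __init__(self, feat_dim, d_model=128, nhead=4, num_layers=2):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, x):
        h = self.encoder(self.proj(x))   # (batch, time, d_model)
        return h.mean(dim=1)             # temporal average pooling

class AdaptiveFusionNet(nn.Module):
    """Gated audio-visual fusion with regression and auxiliary classification heads."""
    def __init__(self, audio_dim, visual_dim, d_model=128, num_classes=4):
        super().__init__()
        self.audio_enc = ModalityEncoder(audio_dim, d_model)
        self.visual_enc = ModalityEncoder(visual_dim, d_model)
        # The gate learns per-sample weights for the two modalities ("adaptive fusion").
        self.gate = nn.Sequential(nn.Linear(2 * d_model, 2), nn.Softmax(dim=-1))
        self.regressor = nn.Linear(d_model, 1)             # main task: PHQ-8 score
        self.classifier = nn.Linear(d_model, num_classes)  # auxiliary task: severity class

    def forward(self, audio, visual):
        a, v = self.audio_enc(audio), self.visual_enc(visual)
        w = self.gate(torch.cat([a, v], dim=-1))            # (batch, 2) modality weights
        fused = w[:, :1] * a + w[:, 1:] * v
        return self.regressor(fused).squeeze(-1), self.classifier(fused)

def ccc(pred, target):
    """Concordance correlation coefficient, the AVEC evaluation metric."""
    pm, tm = pred.mean(), target.mean()
    pv, tv = pred.var(unbiased=False), target.var(unbiased=False)
    cov = ((pred - pm) * (target - tm)).mean()
    return 2 * cov / (pv + tv + (pm - tm) ** 2)

# Toy usage: 8 clips, 300 frames, 40-d audio and 512-d visual features (hypothetical sizes).
model = AdaptiveFusionNet(audio_dim=40, visual_dim=512)
score, logits = model(torch.randn(8, 300, 40), torch.randn(8, 300, 512))
print(score.shape, logits.shape, ccc(score, torch.randn(8)))
```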
KANSEI is a Japanese term that refers to the psychological feeling or image evoked by a product. KANSEI engineering refers to the translation of consumers' psychological feelings about a product into perceptual design elements. Recently, KANSEI-based image indexing and retrieval have been performed using interactive genetic algorithms (IGAs). In this paper, we propose a new technique for clothing fabric image retrieval based on KANSEI (impressions). We first learn a mapping function between fabric image features and KANSEI impressions, and then project the images in the database into the KANSEI (psychological) space. Retrieval is performed in this psychological space by comparing the query impression with the projections of the database images.
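A rough sketch of this retrieval pipeline is shown below. It is not the authors' code: the linear ridge-regression mapping, the feature dimensionality, and the 7-point rating scale are all assumptions made for illustration.

```python
# Illustrative sketch only (not the authors' code): learn a linear mapping from
# fabric image features to KANSEI (impression) ratings, project the database
# images into that psychological space, and rank them by distance to a query.
import numpy as np

def learn_mapping(features, impressions, reg=1e-2):
    """Ridge-regularized least-squares mapping from image features to KANSEI scores.

    features:    (n_images, n_feat) array, e.g. color/texture descriptors
    impressions: (n_images, n_kansei) array of rated impression words
    """
    n_feat = features.shape[1]
    A = features.T @ features + reg * np.eye(n_feat)
    return np.linalg.solve(A, features.T @ impressions)   # (n_feat, n_kansei)

def retrieve(query_impression, features, W, top_k=5):
    """Rank database images by closeness to the query impression in KANSEI space."""
    projected = features @ W                               # images in psychological space
    dists = np.linalg.norm(projected - query_impression, axis=1)
    return np.argsort(dists)[:top_k]

# Toy usage: 200 fabric images, 64-d features, 5 impression words on 7-point scales.
rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 64))
ratings = rng.uniform(1, 7, size=(200, 5))
W = learn_mapping(feats, ratings)
print(retrieve(np.array([6.0, 2.0, 5.5, 3.0, 4.0]), feats, W))
```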