Fine-grained image recognition is central to many multimedia tasks such as search, retrieval and captioning. Unfortunately, these tasks are still challenging since the appearance of samples of the same class can be more different than those from different classes. This issue is mainly due to changes in deformation, pose, and the presence of clutter. In the literature, attention has been one of the most successful strategies to handle the aforementioned problems. Attention has been typically implemented in neural networks by selecting the most informative regions of the image that improve classification. In contrast, in this paper, attention is not applied at the image level but to the convolutional feature activations. In essence, with our approach, the neural model learns to attend to lower-level feature activations without requiring part annotations and uses those activations to update and rectify the output likelihood distribution. The proposed mechanism is modular, architecture-independent and efficient in terms of both parameters and computation required. Experiments demonstrate that well-known networks such as Wide Residual Networks and ResNeXt, when augmented with our approach, systematically improve their classification accuracy and become more robust to changes in deformation and pose and to the presence of clutter. As a result, our proposal reaches state-of-the-art classification accuracies in CIFAR-10, the Adience gender recognition task, Stanford Dogs, and UEC-Food100 while obtaining competitive performance in ImageNet, CIFAR-100, CUB200 Birds, and Stanford Cars. In addition, we analyze the different components of our model, showing that the proposed attention modules succeed in finding the most discriminative regions of the image.
Finally, as a proof of concept, we demonstrate that with only local predictions, an augmented neural network can successfully classify an image before reaching any fully connected layer, thus reducing the required computation by up to 10%.
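The idea of attending to convolutional feature activations and using them to rectify the output distribution can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the single attention head, the linear local classifier and the scalar `gate` that blends local and global predictions are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attended_logits(features, w_att, w_cls):
    """Pool per-position class scores with spatial attention.

    features: list of feature vectors, one per spatial position (hypothetical layout).
    w_att:    attention weights, one per channel (assumed single head).
    w_cls:    one weight vector per class (assumed linear local classifier).
    """
    # Attention distribution over spatial positions.
    weights = softmax([dot(w_att, f) for f in features])
    logits = [0.0] * len(w_cls)
    for a, f in zip(weights, features):
        for k, w_k in enumerate(w_cls):
            logits[k] += a * dot(w_k, f)  # local prediction, weighted by attention
    return logits, weights


def rectify(base_logits, local_logits, gate):
    """Blend the network's original logits with the attended local logits.

    gate in [0, 1] is an assumed scalar; gate=0 keeps the original prediction.
    """
    mixed = [(1.0 - gate) * b + gate * l for b, l in zip(base_logits, local_logits)]
    return softmax(mixed)


features = [[1.0, 0.0], [0.0, 1.0]]          # two positions, two channels
local, att = attended_logits(features, w_att=[0.0, 0.0], w_cls=[[1.0, 0.0], [0.0, 1.0]])
probs = rectify(base_logits=[2.0, 0.0], local_logits=local, gate=0.5)
```

With `gate=1.0` the prediction relies on the local scores alone, which loosely mirrors the proof-of-concept setting where an image is classified before any fully connected layer.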
BACKGROUND Eating disorders are psychological conditions characterized by unhealthy eating habits. Anorexia Nervosa (AN) is defined by the belief of being overweight despite being dangerously underweight. Psychological signs involve emotional and behavioral issues. There is evidence that signs and symptoms can be manifested on social media, where both harmful and beneficial content is shared daily. OBJECTIVE The aim of this work is to characterize Spanish-speaking users with signs of Anorexia on Twitter through the extraction and inference of behavioral, demographic, relational, and multi-modal data. This analysis focuses on characterizing and comparing users at different stages of overcoming the illness, including treatment and full recovery periods, according to the Transtheoretical Model of Health Behavior Change (TTM). METHODS We analyze tweets published by users going through different stages of Anorexia. Users are characterized through their writings, posting patterns, relations, and images. We analyze the differences among users going through each stage of the illness and control users (users not suffering from AN). We also analyze the topics of interest of their followees (users followed by them). We apply a clustering approach to distinguish users at an early phase of the illness (precontemplation) from users who recognize that their behavior is problematic (contemplation), and we generate models dedicated to the detection of tweets and images related to AN. We consider two types of control users: focused control users, who use terms related to anorexia, and random control users. RESULTS We found significant differences between users at each stage of the recovery process (P<.001) and control groups. Users with AN tend to tweet more at night, with a median sleep period tweeting ratio of 0.05, compared with 0.04 for random control users and 0.03 for focused control users. Pictures are relevant for the characterization of users.
Focused and random control users are characterized by the use of text in their profile pictures. We also found a strong polarization between focused control users and users at the first stages of the disorder. There was a strong correlation (Spearman's coefficient) between the interests of users with AN and those of their followees (0.96). Also, the interests of recovered users and users in treatment were more highly correlated with those of the focused control group (0.87 for both) than with those of users with AN (0.67), suggesting a shift in users' interests during the recovery process. CONCLUSIONS We have mapped signs of Anorexia Nervosa to the social media context. These results reinforce the findings of related work in other languages and involve a deep analysis of the topics of interest of users at each phase of the disorder. The features and patterns identified provide a basis for the development of detection tools and recommender systems.
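Two of the statistics reported above, the sleep period tweeting ratio and Spearman's coefficient, can be sketched in a few lines. This is an illustrative sketch only: the abstract does not define the sleep period, so the 00:00 to 06:00 window used here is an assumption, and the function names are hypothetical.

```python
import math
from datetime import datetime


def sleep_period_ratio(timestamps, start_hour=0, end_hour=6):
    """Fraction of tweets posted during an assumed sleep window [start_hour, end_hour)."""
    if not timestamps:
        return 0.0
    night = sum(1 for t in timestamps if start_hour <= t.hour < end_hour)
    return night / len(timestamps)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's coefficient: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Toy data, not taken from the study.
tweets = [datetime(2020, 1, 1, h) for h in (2, 14, 23, 5)]
ratio = sleep_period_ratio(tweets)
rho = spearman([0.1, 0.4, 0.2, 0.3], [10, 40, 20, 30])
```

In the study's setting, `x` and `y` would hold per-topic interest scores for two groups of users, so a coefficient near 1 means both groups rank the topics in nearly the same order.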
Social networks have attracted the attention of psychologists, as the behavior of users can be used to assess personality traits, and to detect sentiments and critical mental situations such as depression or suicidal tendencies. Recently, the increasing amount of image uploads to social networks has shifted the focus from text to image-based personality assessment. However, obtaining the ground-truth requires giving personality questionnaires to the users, making the process very costly and slow, and hindering research on large populations. In this paper, we demonstrate that it is possible to predict which images are most associated with each personality trait of the OCEAN personality model, without requiring ground-truth personality labels. Namely, we present a weakly supervised framework which shows that the personality scores obtained using specific images textually associated with particular personality traits are highly correlated with scores obtained using standard text-based personality questionnaires. We trained an OCEAN trait model based on Convolutional Neural Networks (CNNs), learned from 120K pictures posted with specific textual hashtags, to infer whether the personality scores from the images uploaded by users are consistent with those scores obtained from text. In order to validate our claims, we performed a personality test on a heterogeneous group of 280 human subjects, showing that our model successfully predicts which kind of image will match a person with a given level of a trait. Looking at the results, we obtained evidence that personality is not only correlated with text, but with image content too. Interestingly, different visual patterns emerged from those images most liked by persons with a particular personality trait: for instance, pictures most associated with high conscientiousness usually contained healthy food, while low conscientiousness pictures contained injuries, guns, and alcohol. 
These findings could pave the way to complement text-based personality questionnaires with image-based questions.
In recent years, leading object detection methods such as R-CNN have implemented this task as a combination of region proposal generation and supervised classification of the proposed bounding boxes. Although this pipeline has achieved state-of-the-art results on multiple datasets, it has inherent limitations that have made object detection a very complex and computationally inefficient task. Instead of following this standard strategy, in this paper we enhance Detection Transformers (DETR), which implement object detection as a set-prediction problem directly in an end-to-end, fully differentiable pipeline without requiring priors. In particular, we incorporate Feature Pyramids (FP) into the DETR architecture and demonstrate the effectiveness of the resulting DETR-FP approach in improving logo detection results, mainly thanks to the correct detection of small logos. Thus, without requiring any domain-specific prior to be fed to the model, DETR-FP obtains competitive results on the OpenLogo and MS-COCO datasets when compared to a Faster R-CNN baseline, which strongly depends on hand-designed priors.
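Treating detection as a set-prediction problem means matching each ground-truth object to exactly one prediction before computing the loss. The sketch below illustrates that matching step under simplifying assumptions: it searches all assignments by brute force rather than using the Hungarian algorithm, and its cost combines only the class probability and an L1 box distance. The function names and the `box_weight` parameter are hypothetical.

```python
from itertools import permutations


def l1(box_a, box_b):
    """L1 distance between two boxes given as equal-length coordinate lists."""
    return sum(abs(a - b) for a, b in zip(box_a, box_b))


def match_cost(pred, target, box_weight=1.0):
    """Cost of assigning one prediction to one ground-truth object.

    pred:   (class_probabilities, box)
    target: (class_index, box)
    """
    probs, box = pred
    label, gt_box = target
    return -probs[label] + box_weight * l1(box, gt_box)


def best_assignment(preds, targets):
    """Return (assignment, cost), where assignment[i] is the prediction index
    matched to targets[i]. Brute force; assumes len(preds) >= len(targets)
    and only a handful of objects.
    """
    best, best_cost = None, float("inf")
    for perm in permutations(range(len(preds)), len(targets)):
        cost = sum(match_cost(preds[p], t) for p, t in zip(perm, targets))
        if cost < best_cost:
            best, best_cost = perm, cost
    return best, best_cost


# Toy example: three predictions, two ground-truth logos.
preds = [
    ([0.1, 0.9], [0.5, 0.5, 0.2, 0.2]),
    ([0.8, 0.2], [0.1, 0.1, 0.1, 0.1]),
    ([0.5, 0.5], [0.9, 0.9, 0.1, 0.1]),
]
targets = [(0, [0.1, 0.1, 0.1, 0.1]), (1, [0.5, 0.5, 0.2, 0.2])]
assignment, cost = best_assignment(preds, targets)
```

Predictions left unmatched (index 2 in the toy example) would be trained to predict a "no object" class. Because each object is matched to a single prediction, no hand-designed anchors or non-maximum suppression are needed, which is the sense in which the pipeline works without priors.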