Sound event detection infers events from the surrounding environmental sounds. Because rare sound events are scarce, they are challenging for well-trained detectors, which have learned too much prior knowledge. Few-shot learning methods, in contrast, promise good generalization when facing a new task with limited data. Recent approaches have achieved promising results in this field. However, these approaches treat each support example independently, ignoring the information carried by the other examples in the task. As a result, most previous methods are constrained to generate the same feature embedding for all test-time tasks, which does not adapt to the input data of each task. In this work, we propose a novel task-adaptive module that is easy to plug into any metric-based few-shot learning framework. The module identifies the task-relevant feature dimensions. Incorporating our module improves performance considerably over baseline methods on two datasets, especially for the transductive propagation network: +6.8% for 5-way 1-shot accuracy on ESC-50 and +5.9% on noiseESC-50. We also investigate our approach in the domain-mismatch setting, where it achieves better results than previous methods.
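The abstract does not say how the task-adaptive module is built, so the following is only an illustrative sketch of the general idea of task-relevant feature dimensions in a metric-based few-shot classifier. It swaps in a simple Fisher-style variance ratio computed from the whole support set, which is an assumption and not the authors' module. All names (`class_prototypes`, `task_dimension_weights`, `classify`) and the toy embeddings are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def class_prototypes(support: Dict[str, List[Vector]]) -> Dict[str, Vector]:
    """Mean embedding of each class in the support set."""
    protos = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        protos[label] = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    return protos


def task_dimension_weights(support: Dict[str, List[Vector]], eps: float = 1e-6) -> Vector:
    """Per-dimension weights derived from the whole task (assumed heuristic).

    A dimension scores high when class prototypes are spread apart along it
    (between-class variance) and support examples stay close to their own
    prototype (within-class variance). Weights are normalised to sum to 1.
    """
    protos = class_prototypes(support)
    labels = list(protos)
    dim = len(protos[labels[0]])
    scores = []
    for d in range(dim):
        centre = sum(protos[c][d] for c in labels) / len(labels)
        between = sum((protos[c][d] - centre) ** 2 for c in labels) / len(labels)
        within_terms = [(v[d] - protos[c][d]) ** 2 for c in labels for v in support[c]]
        within = sum(within_terms) / len(within_terms)
        scores.append(between / (within + eps))
    total = sum(scores)
    if total == 0.0:
        # No dimension separates the classes: fall back to uniform weights.
        return [1.0 / dim] * dim
    return [s / total for s in scores]


def classify(query: Vector, support: Dict[str, List[Vector]]) -> str:
    """Nearest prototype under the task-weighted squared Euclidean distance."""
    protos = class_prototypes(support)
    weights = task_dimension_weights(support)

    def distance(proto: Vector) -> float:
        return sum(w * (q - p) ** 2 for w, q, p in zip(weights, query, proto))

    return min(protos, key=lambda label: distance(protos[label]))


# Toy 2-way 2-shot task: dimension 0 separates the classes, dimension 1 is noisy.
toy_support = {
    "dog": [[0.0, 5.0], [0.2, -3.0]],
    "siren": [[1.0, 3.0], [1.2, -5.0]],
}
print(task_dimension_weights(toy_support))
print(classify([0.2, -2.0], toy_support))
```

Because the weights are recomputed from each support set, the same embedding network yields a different effective metric for every task, which is the behaviour the abstract contrasts with a fixed embedding shared by all test-time tasks.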
Social relationship understanding infers the social relationships that exist among individuals in a given scenario, and has been shown to have a wide range of practical value. However, existing methods infer the social relationship of each person pair in isolation, without considering the context-aware information shared by person pairs in the same scenario. Such context-aware information exists extensively in reality: the social relationships of different person pairs in a simple scenario are always related to each other. For instance, if most of the person pairs in a simple scenario have the same social relationship, "friends", then the other pairs have a high probability of being "friends" or of having a similar coarse-level relationship, such as "intimate". This context-aware information should therefore be considered in social relationship understanding. To this end, this paper proposes a novel end-to-end trainable Person-Pair Relation Network (PPRN), a GRU-based graph inference network that first extracts visual and position information as the person-pair features, then propagates these features over a fully-connected social graph, and finally uses different aggregators to collect different kinds of person-pair information. Unlike existing methods, the proposed method, with its message passing mechanism in the graph model, can infer the social relationship of each person pair jointly (i.e., not in isolation). Extensive experiments on the People In Social Context (PISC) and People In Photo Album (PIPA) relation datasets show the superiority of our method over other methods.
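The joint inference described above can be illustrated with a minimal sketch of GRU-based message passing on a fully-connected graph whose nodes are person pairs. This is not the PPRN implementation: the weights are random and untrained, biases are omitted, a single mean aggregator stands in for the paper's several aggregators, and every name here (`GRUCell`, `propagate`, the toy feature vectors) is hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]


def matvec(matrix: List[Vector], vec: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class GRUCell:
    """Bias-free GRU cell with random (untrained) weights, for illustration only."""

    def __init__(self, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def mat() -> List[Vector]:
            return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]

        self.wz, self.uz = mat(), mat()  # update gate
        self.wr, self.ur = mat(), mat()  # reset gate
        self.wh, self.uh = mat(), mat()  # candidate state

    def step(self, x: Vector, h: Vector) -> Vector:
        z = [sigmoid(a + b) for a, b in zip(matvec(self.wz, x), matvec(self.uz, h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.wr, x), matvec(self.ur, h))]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b)
                for a, b in zip(matvec(self.wh, x), matvec(self.uh, reset_h))]
        # Interpolate between the old state and the candidate state.
        return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def propagate(states: List[Vector], cell: GRUCell, steps: int = 2) -> List[Vector]:
    """Update every person-pair node from the mean state of all other nodes.

    Requires at least two nodes, since each node needs at least one neighbour.
    """
    for _ in range(steps):
        new_states = []
        for i, h in enumerate(states):
            others = [s for j, s in enumerate(states) if j != i]
            message = [sum(col) / len(others) for col in zip(*others)]
            new_states.append(cell.step(message, h))
        states = new_states
    return states


# Three person pairs in one image, each with a 2-dimensional toy feature.
pair_features = [[0.1, 0.2], [0.3, -0.4], [-0.5, 0.6]]
refined = propagate(pair_features, GRUCell(dim=2), steps=2)
print(refined)
```

Since each node's message is built from all other nodes, changing the features of one person pair changes the refined state of every other pair, which is the sense in which relationships are inferred jointly rather than in isolation. A classifier over the refined states would follow in a full model.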