Scene Graph Generation (SGG) has achieved significant progress recently. However, most previous works rely heavily on fixed-size entity representations based on bounding box proposals, anchors, or learnable queries. As each representation's cardinality has different trade-offs between performance and computation overhead, extracting highly representative features efficiently and dynamically is both challenging and crucial for SGG. In this work, a novel architecture called RepSGG is proposed to address these challenges, formulating a subject as queries, an object as keys, and their relationship as the maximum attention weight between pairwise queries and keys. With more fine-grained and flexible representation power for entities and relationships, RepSGG learns to sample semantically discriminative and representative points for relationship inference. Moreover, the long-tailed distribution of relationship classes poses a significant challenge for the generalization of SGG. A run-time performance-guided logit adjustment (PGLA) strategy is proposed in which the relationship logits are modified via affine transformations based on run-time performance during training. This strategy encourages a more balanced performance between dominant and rare classes. Experimental results show that RepSGG achieves state-of-the-art or comparable performance on the Visual Genome and Open Images V6 datasets with fast inference speed, demonstrating the efficacy and efficiency of the proposed methods.
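The idea of scoring a relationship as the maximum attention weight between subject queries and object keys can be illustrated with a minimal sketch. This is not the RepSGG implementation: the function name `relationship_score`, the scaled dot product, and the sigmoid normalization are all assumptions made for illustration, since the abstract does not say how the attention weights are computed.

```python
import numpy as np

def relationship_score(subject_queries, object_keys):
    """Score one subject-object pair as the maximum pairwise attention weight.

    subject_queries: (num_q, d) features of points sampled on the subject (hypothetical input).
    object_keys:     (num_k, d) features of points sampled on the object (hypothetical input).
    Returns the maximum weight and the (query, key) index pair that attains it.
    """
    d = subject_queries.shape[-1]
    # Scaled dot-product logits between every query and every key.
    logits = subject_queries @ object_keys.T / np.sqrt(d)
    # Assumption: a sigmoid turns each logit into an independent weight in (0, 1).
    weights = 1.0 / (1.0 + np.exp(-logits))
    # The relationship score is the largest weight over all query-key pairs.
    flat_index = int(np.argmax(weights))
    q_idx, k_idx = np.unravel_index(flat_index, weights.shape)
    return float(weights[q_idx, k_idx]), (int(q_idx), int(k_idx))
```

Returning the index pair alongside the score shows which sampled subject point and object point are responsible for the predicted relationship.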
Semantically aligned (speech, image) datasets can be used to explore "visually grounded speech". In the majority of existing investigations, features of an image signal are extracted using neural networks "pre-trained" on other tasks (e.g., classification on ImageNet). In still others, pre-trained networks are used to extract audio features prior to semantic embedding. Without "transfer learning" through pre-trained initialization or pre-trained feature extraction, previous results have tended to show low rates of recall in speech → image and image → speech queries. By choosing appropriate neural architectures for the encoders in the speech and image branches and using large datasets, one can obtain competitive recall rates without relying on any pre-trained initialization or feature extraction. (Speech, image) semantic alignment and speech → image and image → speech retrieval are canonical tasks worthy of independent investigation, and they allow one to explore other questions: for example, the size of the audio embedder can be reduced significantly with little loss of recall in speech → image and image → speech queries.
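The recall rates mentioned above are commonly reported as recall@K. The sketch below shows one way to compute it, assuming a square similarity matrix in which row i is a speech clip, column j is an image, and the matching pair lies on the diagonal. The function name `recall_at_k` and this data layout are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def recall_at_k(similarity, k):
    """Fraction of queries whose true match is ranked within the top k.

    similarity[i, j] is the score between query i and candidate j;
    the ground-truth candidate for query i is assumed to be candidate i.
    """
    true_scores = np.diag(similarity)[:, None]
    # Rank of the true match = number of candidates scoring strictly higher.
    ranks = (similarity > true_scores).sum(axis=1)
    return float((ranks < k).mean())

# Toy scores for three (speech, image) pairs.
S = np.array([[0.9, 0.1, 0.0],
              [0.8, 0.2, 0.1],
              [0.1, 0.3, 0.7]])
speech_to_image = recall_at_k(S, k=1)
image_to_speech = recall_at_k(S.T, k=1)  # transpose to query in the other direction
```

Transposing the matrix swaps the roles of query and candidate, so the same function covers both retrieval directions.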
We present a partial panoramic view of possible contexts and applications of fractional calculus. In particular, we show applications of fractional calculus to models in ordinary differential equation (ODE) and partial differential equation (PDE) formulations, ranging from the basic equations of mechanics to diffusion and Dirac equations.
Two-mode charge (pair) coherent states have been introduced previously by using the ⟨η| representation. In the present paper we reobtain these states by a rather different method. Then, using the nonlinear coherent states approach and based on the simple manner in which the representation of two-mode charge coherent states is introduced, we generalize the bosonic creation and annihilation operators to the f-deformed ladder operators and construct a new class of f-deformed charge coherent states. Unlike the (linear) pair coherent states, the presented structure has the potential to generate a large class of pair coherent states with various signs of nonclassicality and physical properties of interest. To this end, we use a few well-known nonlinearity functions associated with particular quantum systems as physical realizations of the presented formalism. After introducing the explicit form of the above correlated states in two-mode Fock space, several nonclassicality features of the corresponding states (as well as the two-mode linear charge coherent states) are numerically investigated by calculating quadrature squeezing, the Mandel parameter, the second-order correlation function, the second-order cross-correlation function between the two modes, and the Cauchy-Schwarz inequality. The oscillatory behaviour of the photon count and the quasi-probability (Husimi) function of the associated states are also discussed.
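Two of the nonclassicality measures listed above have simple textbook definitions for a single mode: the Mandel parameter Q = (⟨n²⟩ − ⟨n⟩²)/⟨n⟩ − 1 and the second-order correlation function g⁽²⁾(0) = ⟨n(n − 1)⟩/⟨n⟩². The sketch below evaluates both from a photon-number distribution. The function names and the use of a truncated probability vector are assumptions for illustration; the sketch does not construct the f-deformed states of the paper.

```python
import math
import numpy as np

def mandel_q(p):
    """Mandel parameter of a photon-number distribution p[n], n = 0, 1, 2, ..."""
    n = np.arange(len(p))
    mean = float((n * p).sum())
    variance = float((n**2 * p).sum()) - mean**2
    # Q < 0: sub-Poissonian (nonclassical); Q = 0: Poissonian; Q > 0: super-Poissonian.
    return variance / mean - 1.0

def g2_zero(p):
    """Second-order correlation g2(0) = <n(n-1)> / <n>^2 for the same distribution."""
    n = np.arange(len(p))
    mean = float((n * p).sum())
    return float((n * (n - 1) * p).sum()) / mean**2

# Reference cases: a two-photon Fock state and a (truncated) Poisson distribution.
fock_two = np.array([0.0, 0.0, 1.0])
poisson = np.array([math.exp(-2.0) * 2.0**k / math.factorial(k) for k in range(40)])
```

For the Fock state the variance vanishes, so Q = −1 and g⁽²⁾(0) = 1/2. The Poisson distribution, which is the photon statistics of a linear coherent state, gives Q ≈ 0 and g⁽²⁾(0) ≈ 1 and serves as the classical reference point.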