The task of video grounding, which temporally localizes a natural language description in a video, plays an important role in video understanding. Existing studies adopt strategies that either slide a window over the entire video or exhaustively rank all possible clip-sentence pairs in a pre-segmented video, and therefore inevitably suffer from exhaustively enumerated candidates. To alleviate this problem, we formulate the task as a sequential decision-making problem by learning an agent that progressively regulates the temporal grounding boundaries according to its policy. Specifically, we propose a reinforcement learning based framework improved by multi-task learning, which shows steady performance gains when additional supervised boundary information is considered during training. Our proposed framework achieves state-of-the-art performance on the ActivityNet'18 DenseCaption dataset (Krishna et al. 2017) and the Charades-STA dataset (Sigurdsson et al. 2016; Gao et al. 2017) while observing only 10 or fewer clips per video.
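A minimal sketch of the sequential decision loop such an agent could follow, not the authors' implementation: the policy, the state encoding, and the action set below are hypothetical stand-ins, and the boundaries are adjusted for at most ten steps to reflect the limited number of observed clips.

```python
# Illustrative boundary-regulation loop (assumed action set and placeholder policy).
import random

ACTIONS = ["shift_left", "shift_right", "shrink", "extend", "stop"]

def observe(video_feats, sentence_feat, start, end):
    # A real agent would encode the current clip and the query sentence;
    # here we simply expose the boundary state for illustration.
    return (start, end)

def policy(state):
    # Placeholder policy; a learned agent would score actions from the state.
    return random.choice(ACTIONS)

def ground(video_feats, sentence_feat, num_frames, max_steps=10):
    start, end = 0.0, float(num_frames)          # start from the whole video
    for _ in range(max_steps):                   # observe at most ~10 clips
        action = policy(observe(video_feats, sentence_feat, start, end))
        step = 0.1 * (end - start)
        if action == "shift_left":
            start, end = max(0.0, start - step), max(step, end - step)
        elif action == "shift_right":
            start, end = min(num_frames - step, start + step), min(float(num_frames), end + step)
        elif action == "shrink":
            start, end = start + step / 2, end - step / 2
        elif action == "extend":
            start, end = max(0.0, start - step / 2), min(float(num_frames), end + step / 2)
        else:                                    # "stop": emit current boundaries
            break
    return start, end
```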
Graphs are widely used to model complicated data semantics in many applications in bioinformatics, chemistry, social networks, pattern recognition, etc. A recent trend is to tolerate noise arising from various sources, such as erroneous data entry, and to find similarity matches. In this paper, we study the graph similarity join problem, which returns pairs of graphs whose edit distances are no larger than a threshold. Inspired by the q-gram idea for the string similarity problem, our solution extracts paths from graphs as features for indexing. We establish a lower bound on the number of common features required to generate candidates. An efficient algorithm is proposed that exploits both matching and mismatching features to improve the filtering and verification of candidates. Extensive experiments on publicly available datasets demonstrate that the proposed algorithm significantly outperforms existing approaches.
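A minimal sketch of the path-as-feature idea and a count-style filter, assuming networkx graphs whose nodes carry a "label" attribute; the exact lower bound used in the paper may differ, so `delta` below is a stand-in for the maximum number of path features a single edit operation can affect.

```python
# Extract label sequences of length-q paths as features, then prune pairs that
# cannot be within edit distance tau by a simple common-feature count filter.
from collections import Counter
from itertools import permutations

import networkx as nx

def path_features(g, q=2):
    """Multiset of label sequences of all simple paths with q edges."""
    feats = Counter()
    for u, v in permutations(g.nodes, 2):
        for path in nx.all_simple_paths(g, u, v, cutoff=q):
            if len(path) == q + 1:
                feats[tuple(g.nodes[n]["label"] for n in path)] += 1
    return feats

def count_filter(f1, f2, tau, delta):
    """Keep the pair as a candidate only if enough features are shared."""
    common = sum((f1 & f2).values())
    return common >= max(sum(f1.values()), sum(f2.values())) - tau * delta
```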
Learning an ideal metric is crucial to many tasks in computer vision. Diverse feature representations can help: visual data objects described by multiple features can be decomposed into multiple views, which often provide complementary information. In this paper, we propose a cross-view fusion algorithm that produces a similarity metric for multiview data by systematically fusing multiple similarity measures. Unlike existing paradigms, we focus on learning a distance measure by exploiting the graph structure of data samples, where an input similarity matrix is improved through propagation by graph random walk. In particular, we construct multiple graphs, each corresponding to an individual view, and present a cross-view fusion approach based on graph random walk that derives an optimal distance measure by fusing multiple metrics. Our method scales to large amounts of data by enforcing sparsity through an anchor graph representation. To adaptively control the effects of different views, we dynamically learn view-specific coefficients, which are incorporated into the graph random walk to balance the views. However, such a strategy may lead to an over-smooth similarity metric in which affinities between dissimilar samples are enlarged by excessive cross-view fusion. We therefore devise a heuristic for choosing the number of iterations in the fusion process so as to avoid over-smoothing. Extensive experiments conducted on real-world data sets validate the effectiveness and efficiency of our approach.
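A small numerical sketch of cross-view fusion by graph random walk, assuming at least two views. The update rule below (each view diffused through the weighted average of the other views) is an illustrative variant rather than the paper's exact formulation, and the view weights `alpha` and the iteration count are assumptions; the small, fixed number of iterations mirrors the idea of stopping early to avoid over-smoothing.

```python
import numpy as np

def row_normalize(w):
    # Turn a similarity matrix into a row-stochastic transition matrix.
    return w / w.sum(axis=1, keepdims=True)

def cross_view_fusion(sim_matrices, alpha=None, n_iter=5):
    views = [row_normalize(np.asarray(w, dtype=float)) for w in sim_matrices]
    status = [v.copy() for v in views]
    m = len(views)
    alpha = np.ones(m) / m if alpha is None else np.asarray(alpha, dtype=float)
    for _ in range(n_iter):                      # few iterations to limit smoothing
        new_status = []
        for i, p in enumerate(views):
            # Weighted average of the complementary views.
            others = sum(alpha[j] * status[j] for j in range(m) if j != i)
            others /= alpha.sum() - alpha[i]
            new_status.append(p @ others @ p.T)  # one cross-view random-walk step
        status = new_status
    # Fuse the per-view similarities with the view-specific coefficients.
    return sum(a * s for a, s in zip(alpha, status)) / alpha.sum()
```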
Entity alignment (EA) identifies entities that refer to the same real-world object but are located in different knowledge graphs (KGs), and has been harnessed for KG construction and integration. When generating EA results, current embedding-based solutions treat entities independently and fail to take into account the interdependence between entities. In addition, most embedding-based EA methods either fuse different features at the representation level and generate a unified entity embedding for alignment, which potentially causes information loss, or aggregate features at the outcome level with hand-tuned weights, which is impractical as the number of features grows. To tackle these deficiencies, we propose a collective embedding-based EA framework with an adaptive feature fusion mechanism. We first employ three representative features, i.e., structural, semantic and string signals, to capture different aspects of the similarity between entities in heterogeneous KGs. These features are then integrated at the outcome level, with weights dynamically assigned by our carefully devised adaptive feature fusion strategy. Eventually, to make collective EA decisions, we formulate EA as the classical stable matching problem between the entities to be aligned, with preference lists constructed from the fused feature matrix; it is then solved effectively by the deferred acceptance algorithm. Our proposal is evaluated on both cross-lingual and mono-lingual EA benchmarks against state-of-the-art solutions, and the empirical results verify its effectiveness and superiority. We also perform an ablation study to gain insights into the framework modules.
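A sketch of the collective-matching step under simple assumptions: entities of one KG "propose" to entities of the other in order of fused similarity, and proposals are resolved with the deferred acceptance (Gale-Shapley) algorithm. The toy similarity matrix is made up for illustration and stands in for the fused feature matrix.

```python
import numpy as np

def deferred_acceptance(sim):
    n_src, n_tgt = sim.shape                      # assumes n_src <= n_tgt
    prefs = np.argsort(-sim, axis=1).tolist()     # each source's preference list
    matched_to = {}                               # target -> source currently held
    free = list(range(n_src))
    next_choice = [0] * n_src
    while free:
        s = free.pop()
        t = prefs[s][next_choice[s]]              # best target not yet proposed to
        next_choice[s] += 1
        if t not in matched_to:
            matched_to[t] = s                     # target tentatively accepts
        elif sim[s, t] > sim[matched_to[t], t]:
            free.append(matched_to[t])            # target prefers the new proposer
            matched_to[t] = s
        else:
            free.append(s)                        # rejected; will propose again
    return {s: t for t, s in matched_to.items()}

fused = np.array([[0.9, 0.2, 0.1],
                  [0.3, 0.8, 0.4],
                  [0.2, 0.5, 0.7]])
print(deferred_acceptance(fused))                 # aligns 0->0, 1->1, 2->2
```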
A systematic investigation of nanoparticle-enhanced light trapping in thin-film silicon solar cells is reported. The nanoparticles are fabricated by annealing a thin Ag film on the cell surface. An optimisation roadmap for the plasmon-enhanced light-trapping scheme based on self-assembled Ag nanoparticles is presented, including a comparison of rear-located and front-located nanoparticles, an optimisation of the precursor Ag film thickness, an investigation of different conditions of the nanoparticle dielectric environment, and a combination of nanoparticles with other supplementary back-surface reflectors. Significant photocurrent enhancements have been achieved owing to the high scattering and coupling efficiency of the Ag nanoparticles into the silicon device. For the optimum light-trapping scheme, a short-circuit current enhancement of 27% due to Ag nanoparticles is achieved, increasing to 44% for a "nanoparticle/magnesium fluoride/diffuse paint" back-surface reflector structure. This is 6% higher than our previously reported plasmonic short-circuit current enhancement of 38%.