In e-commerce, a growing number of user-generated videos are used for product promotion. Generating video descriptions that narrate the user-preferred product characteristics depicted in a video is vital for successful promotion. Traditional video captioning methods, which focus on routinely describing what exists and happens in a video, are not well suited to product-oriented video captioning. To address this problem, we propose a product-oriented video captioner framework, abbreviated as Poet. Poet first represents the videos as product-oriented spatial-temporal graphs. Then, based on the aspects of the video-associated product, we perform knowledge-enhanced spatial-temporal inference on those graphs to capture the dynamic change of fine-grained product-part characteristics. The knowledge-leveraging module in Poet differs from traditional designs by performing knowledge filtering and dynamic memory modeling. We show that Poet achieves consistent performance improvements over previous methods in generation quality, product-aspect capturing, and lexical diversity. Experiments are performed on two product-oriented video captioning datasets, the buyer-generated fashion video dataset (BFVD) and the fan-generated fashion video dataset (FFVD), both collected from Mobile Taobao. We will release the desensitized datasets to promote further investigation of both video captioning and general video analysis problems.
CCS CONCEPTS: • Computing methodologies → Computer vision; Natural language generation.
Non-Abelian anyons are exotic quasiparticle excitations hosted by certain topological phases of matter. They break the fermion-boson dichotomy and obey non-Abelian braiding statistics: their interchanges yield unitary operations, rather than merely a phase factor, in a space spanned by topologically degenerate wavefunctions. They are the building blocks of topological quantum computing. However, experimental observation of non-Abelian anyons and their characteristic braiding statistics is notoriously challenging and, despite various theoretical proposals, has remained elusive. Here, we report an experimental quantum digital simulation of projective non-Abelian anyons and their braiding statistics with up to 68 programmable superconducting qubits arranged on a two-dimensional lattice. By implementing the ground states of the toric-code model with twists through quantum circuits, we demonstrate that twists exchange electric and magnetic charges and behave as a particular type of non-Abelian anyon: the Ising anyon. In particular, we show experimentally that these twists follow the fusion rules and non-Abelian braiding statistics of the Ising type, and can be exploited to encode topological logical qubits. Furthermore, we demonstrate how to implement both single- and two-qubit logic gates by applying a sequence of elementary Pauli gates on the underlying physical qubits. Our results demonstrate a versatile quantum digital approach for simulating non-Abelian anyons, offering a new lens into the study of these peculiar quasiparticles.
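The Ising fusion rules cited above (1 × a = a, ψ × ψ = 1, σ × ψ = σ, σ × σ = 1 + ψ) can be checked with a small symbolic sketch. The function names below are illustrative only and are unrelated to the experiment's implementation; the sketch merely enumerates fusion outcomes:

```python
# A toy symbolic check of the Ising fusion rules.
# Labels: "1" = vacuum, "psi" = fermion, "sigma" = Ising (non-Abelian) anyon.
FUSION = {
    ("1", "1"): {"1"},
    ("1", "psi"): {"psi"},
    ("1", "sigma"): {"sigma"},
    ("psi", "psi"): {"1"},            # two fermions fuse to the vacuum
    ("psi", "sigma"): {"sigma"},      # a sigma absorbs a fermion
    ("sigma", "sigma"): {"1", "psi"}, # two outcomes: the non-Abelian signature
}

def fuse(a: str, b: str) -> set:
    """Fusion is commutative; look up either ordering."""
    return FUSION.get((a, b)) or FUSION[(b, a)]

def vacuum_fusion_paths(n_sigma: int) -> int:
    """Number of fusion paths taking n_sigma sigma anyons to total charge 1.
    The count doubles with each extra pair of sigmas, giving the topological
    degeneracy used to encode logical qubits."""
    charges = {"1": 1}  # total charge -> number of fusion paths so far
    for _ in range(n_sigma):
        nxt = {}
        for charge, count in charges.items():
            for outcome in fuse(charge, "sigma"):
                nxt[outcome] = nxt.get(outcome, 0) + count
        charges = nxt
    return charges.get("1", 0)

print(vacuum_fusion_paths(4))  # 2: four sigmas span a two-dimensional space
```

The two-dimensional fusion space of four σ anyons is exactly what allows a topological qubit to be stored nonlocally, as the abstract describes.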
In e-commerce, consumer-generated videos, which generally convey consumers' individual preferences for different aspects of certain products, are massive in volume. To recommend these videos to potential consumers more effectively, diverse and catchy video titles are critical. However, consumer-generated videos seldom come with appropriate titles. To bridge this gap, we integrate comprehensive sources of information, including the content of consumer-generated videos, the narrative comment sentences supplied by consumers, and the product attributes, in an end-to-end modeling framework. Although automatic video titling is highly useful and in demand, it is much less studied than video captioning. The latter focuses on generating sentences that describe videos as a whole, while our task requires product-aware, multi-grained video analysis. To tackle this issue, the proposed method consists of two processes, i.e., granular-level interaction modeling and abstraction-level story-line summarization. Specifically, granular-level interaction modeling first utilizes temporal-spatial landmark cues, descriptive words, and abstractive attributes to build three individual graphs, and recognizes the intra-actions within each graph through Graph Neural Networks (GNN). Then a global-local aggregation module is proposed to model inter-actions across graphs and aggregate the heterogeneous graphs into a holistic graph representation. The abstraction-level story-line summarization further considers both frame-level video features and the holistic graph to exploit the interactions between products and backgrounds, and generates the story-line topic of the video. We collect a large-scale dataset accordingly from real-world data in Taobao, a world-leading e-commerce platform, and will make the desensitized version publicly available to nourish further development of the research community.
Extensive experiments on various datasets demonstrate the efficacy of the proposed method.