We present a new large-scale multilingual video description dataset, VATEX (Video And TEXt, where X also represents various languages), which contains over 41,250 videos and 825,000 captions in both English and Chinese. Among the captions, there are over 206,000 English-Chinese parallel translation pairs. Compared to the widely-used MSR-VTT dataset [66], VATEX is multilingual, larger, more linguistically complex, and more diverse in terms of both video and natural language descriptions. We also introduce two tasks for video-and-language research based on VATEX: (1) Multilingual Video Captioning, which aims to describe a video in various languages with a compact unified captioning model, and (2) Video-guided Machine Translation, which translates a source language description into the target language using the video information as additional spatiotemporal context. Extensive experiments on the VATEX dataset show that, first, the unified multilingual model not only produces both English and Chinese descriptions for a video more efficiently, but also offers improved performance over the monolingual models. Furthermore, we demonstrate that the spatiotemporal video context can be effectively utilized to align source and target languages and thus assist machine translation. Finally, we discuss the potential of using VATEX for other video-and-language research.
Video captioning is the task of automatically generating a textual description of the actions in a video. Although previous work (e.g., sequence-to-sequence models) has shown promising results in abstracting a coarse description of a short video, it is still very challenging to caption a video containing multiple fine-grained actions with a detailed description. This paper aims to address the challenge by proposing a novel hierarchical reinforcement learning framework for video captioning, where a high-level Manager module learns to design sub-goals and a low-level Worker module recognizes the primitive actions to fulfill each sub-goal. With this compositional framework to reinforce video captioning at different levels, our approach significantly outperforms all the baseline methods on a newly introduced large-scale dataset for fine-grained video captioning. Furthermore, our non-ensemble model achieves state-of-the-art results on the widely-used MSR-VTT dataset. Caption: A person sits on a bed and puts a laptop into a bag. The person stands up, puts the bag on one shoulder, and walks out of the room. Caption #1: A woman offers her dog some food. Caption #2: A woman is eating and sharing food with her dog. Caption #3: A woman is sharing a snack with a dog.
Existing entity typing systems usually exploit the type hierarchy provided by a knowledge base (KB) schema to model label correlations and thus improve overall performance. Such techniques, however, are not directly applicable to more open and practical scenarios where the type set is not restricted by a KB schema and includes a vast number of free-form types. To model the underlying label correlations without access to manually annotated label structures, we introduce a novel label-relational inductive bias, represented by a graph propagation layer that effectively encodes both global label co-occurrence statistics and word-level similarities. On a large dataset with over 10,000 free-form types, the graph-enhanced model equipped with an attention-based matching module achieves a much higher recall score while maintaining a high level of precision. Specifically, it achieves a 15.3% relative F1 improvement and also reduces inconsistency in the outputs. We further show that a simple modification of our proposed graph layer can also improve performance on a conventional and widely-tested dataset that only includes KB-schema types.
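As a rough illustration of how co-occurrence statistics can be propagated over a label graph, the sketch below builds a row-normalized co-occurrence matrix from training label sets and applies one propagation step to label embeddings. It is a generic graph-propagation sketch, not the layer from the paper: the function names `cooccurrence_adjacency` and `propagate`, the self-loop weighting, and the row normalization are all assumptions, and the word-level similarity component mentioned in the abstract is left out.

```python
from typing import List

Matrix = List[List[float]]


def cooccurrence_adjacency(label_sets: List[List[int]], num_labels: int) -> Matrix:
    """Row-normalized label co-occurrence matrix with self-loops (hypothetical helper).

    Each entry of label_sets is assumed to hold distinct label ids for one example.
    """
    counts = [[0.0] * num_labels for _ in range(num_labels)]
    for labels in label_sets:
        for i in labels:
            for j in labels:
                if i != j:
                    counts[i][j] += 1.0
    for i in range(num_labels):
        counts[i][i] += 1.0  # self-loop so a label keeps part of its own embedding
    return [[c / sum(row) for c in row] for row in counts]


def propagate(adj: Matrix, embeddings: Matrix) -> Matrix:
    """One propagation step: each label embedding becomes a weighted
    average of the embeddings of labels it co-occurs with."""
    dim = len(embeddings[0])
    return [
        [sum(adj[i][k] * embeddings[k][d] for k in range(len(embeddings)))
         for d in range(dim)]
        for i in range(len(adj))
    ]


# Toy data: labels 0 and 1 co-occur twice, label 2 always appears alone.
adj = cooccurrence_adjacency([[0, 1], [0, 1], [2]], num_labels=3)
smoothed = propagate(adj, [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
```

In this toy case the embeddings of labels 0 and 1 are pulled toward each other, while label 2, which never co-occurs with another label, is left unchanged.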
The sequential order of utterances is often meaningful in coherent dialogues, and changes in utterance order can lead to low-quality and incoherent conversations. We consider the order information a crucial supervised signal for dialogue learning, which, however, has been neglected by many previous dialogue systems. Therefore, in this paper, we introduce a self-supervised learning task, inconsistent order detection, to explicitly capture the flow of conversation in dialogues. Given a sampled utterance pair triple, the task is to predict whether it is ordered or misordered. We then propose a sampling-based self-supervised network, SSN, to perform the prediction with sampled triple references from the previous dialogue history. Furthermore, we design a joint learning framework where SSN can guide dialogue systems towards more coherent and relevant dialogue learning through adversarial training. We demonstrate that the proposed methods can be applied to both open-domain and task-oriented dialogue scenarios, and achieve new state-of-the-art performance on the OpenSubtitles and Movie-Ticket Booking datasets.
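The inconsistent order detection task can be pictured with a small data-sampling sketch. The abstract does not specify how triples are drawn, so the following assumes that utterance pairs are consecutive (utterance, reply) pairs and that a misordered example is any non-identity permutation of an ordered triple; `utterance_pairs` and `sample_triple` are hypothetical names introduced for illustration.

```python
import random
from typing import List, Tuple

Pair = Tuple[str, str]


def utterance_pairs(dialogue: List[str]) -> List[Pair]:
    """Consecutive (utterance, reply) pairs from a dialogue (assumed pairing)."""
    return list(zip(dialogue[:-1], dialogue[1:]))


def sample_triple(
    dialogue: List[str], ordered: bool, rng: random.Random
) -> Tuple[List[Pair], int]:
    """Sample three utterance pairs and a label: 1 = ordered, 0 = misordered."""
    pairs = utterance_pairs(dialogue)
    if len(pairs) < 3:
        raise ValueError("need at least 4 utterances to form 3 pairs")
    # Three distinct positions in dialogue order.
    idx = sorted(rng.sample(range(len(pairs)), 3))
    if not ordered:
        shuffled = idx[:]
        while shuffled == idx:  # reshuffle until the order really changes
            rng.shuffle(shuffled)
        idx = shuffled
    return [pairs[i] for i in idx], int(ordered)


rng = random.Random(0)
dialogue = ["u0", "u1", "u2", "u3", "u4", "u5"]
positive, pos_label = sample_triple(dialogue, ordered=True, rng=rng)
negative, neg_label = sample_triple(dialogue, ordered=False, rng=rng)
```

Because the labels come from the dialogue itself, no manual annotation is needed, which is what makes the task self-supervised.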
Puerarin has been shown to exert anti-oxidative and anti-ferroptosis effects in multiple diseases. The goal of this study was to explore the neuroprotective effect of puerarin on early brain injury (EBI) after subarachnoid hemorrhage (SAH) in rats. A total of 177 adult male Sprague Dawley rats were used. SAH was induced via endovascular perforation. Intranasal puerarin or intracerebroventricular dorsomorphin (AMPK inhibitor) and SR18292 (PGC1α inhibitor) were administered. The protein levels of pAMPK, PGC1α, Nrf2, 4HNE, HO1, MDA, ACSL4, and GSSG, as well as the iron concentration, in the ipsilateral hemisphere were significantly increased, whereas SOD, GPX4, and GSH were decreased at 24 h after SAH. Moreover, puerarin treatment significantly increased the protein levels of pAMPK, PGC1α, Nrf2, HO1, SOD, GPX4, and GSH, but decreased the levels of 4HNE, MDA, ACSL4, and GSSG, as well as the iron concentration, in the ipsilateral hemisphere at 24 h after SAH. Dorsomorphin or SR18292 partially abolished the beneficial effects of puerarin on neurological dysfunction, oxidative stress injury, and ferroptosis. In conclusion, puerarin improved neurobehavioral impairments and attenuated oxidative-stress-induced brain ferroptosis after SAH in rats. This neuroprotection was mediated by activation of the AMPK/PGC1α/Nrf2 signaling pathway. Thus, puerarin may serve as a new therapeutic agent against EBI in patients with SAH.