We propose the Scene Graph Auto-Encoder (SGAE), which incorporates the language inductive bias into the encoder-decoder image captioning framework to produce more human-like captions. Intuitively, we humans use this inductive bias to compose collocations and make contextual inferences in discourse.
Most current image captioning models rely heavily on paired image-caption datasets. However, collecting large-scale paired image-caption data is labor-intensive and time-consuming. In this paper, we present a scene graph-based approach for unpaired image captioning. Our framework comprises an image scene graph generator, a sentence scene graph generator, a scene graph encoder, and a sentence decoder. Specifically, we first train the scene graph encoder and the sentence decoder on the text modality. To align the scene graphs of images and sentences, we propose an unsupervised feature alignment method that maps scene graph features from the image modality to the sentence modality. Experimental results show that our model generates promising captions without using any image-caption training pairs, outperforming existing methods by a wide margin.
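The abstract does not say how the unsupervised alignment is carried out. As a minimal, hypothetical illustration of aligning two feature sets without any paired examples, the Python sketch below swaps in simple per-dimension moment matching: image-side scene graph features are standardized with their own mean and standard deviation, then rescaled to the sentence-side statistics. This is not the paper's method; `fit_stats`, `align`, and the toy vectors are assumptions for illustration only.

```python
from statistics import mean, pstdev


def fit_stats(features):
    """Return per-dimension means and population standard deviations."""
    dims = list(zip(*features))
    return [mean(d) for d in dims], [pstdev(d) for d in dims]


def align(image_feats, sentence_feats):
    """Map image-side feature vectors onto sentence-side statistics.

    Only the two unpaired feature sets are used; no image-sentence pairs.
    """
    src_mu, src_sd = fit_stats(image_feats)
    tgt_mu, tgt_sd = fit_stats(sentence_feats)
    aligned = []
    for vec in image_feats:
        aligned.append([
            # standardize with the source stats (guarding against zero spread),
            # then rescale and shift to the target stats
            (x - ms) / (ss or 1.0) * st + mt
            for x, ms, ss, mt, st in zip(vec, src_mu, src_sd, tgt_mu, tgt_sd)
        ])
    return aligned


if __name__ == "__main__":
    # Toy 2-D "scene graph features" (hypothetical values).
    image_feats = [[0.0, 10.0], [2.0, 30.0]]
    sentence_feats = [[5.0, -1.0], [9.0, 1.0]]
    print(align(image_feats, sentence_feats))
```

Run as a script, this prints `[[5.0, -1.0], [9.0, 1.0]]`: the aligned image features now share the sentence features' per-dimension mean and spread. Because only marginal statistics are matched, such a mapping cannot capture correspondences that depend on interactions between dimensions; a learned mapping would be needed for that.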
In China, the family is still the major welfare provider for old people in rural areas. Although the way this role is carried out has varied significantly across historical periods, owing to social and economic changes in the rural environment, the core functions of the family have remained the same, notably the provision of welfare for dependants, particularly the aged. In more traditional China, caring for the aged was indeed assumed to be a paramount function of the family. Following the founding of the PRC in 1949, however, the welfare function of the family was reduced as a result of the collectivization of the rural economy, which meant that part of the family's responsibilities was shared by collective organizations. After more than twenty years of agricultural collectivization, China embarked on further rural economic reform in the early 1980s, replacing the commune system with one of private production based on the family unit. As a result, rural welfare responsibilities shifted back from the commune to the family, which became solely responsible for supporting its dependent members. This paper attempts to set out the real situation with regard to family support for rural old people in China. The first section briefly introduces the declining family status of rural old people as a consequence of socio-economic change. The second section reviews the implications of rural economic reform for the (declining) status of old people with regard to family support, focusing on patterns of rural old-age dependency and the changing roles of family caregivers. Lastly, cases of family support disputes and community responses are presented, drawing on findings from fieldwork conducted by the author between and in three rural localities in China.
In what was reputed to be a Confucian social welfare order (Leung and Nann ), the traditional family of China appeared to be the perfect model welfare institution for the aged. It was an institution supported by a political culture based on Confucian ideologies and reinforced by imperial laws that

Social Policy & Administration, ISSN 0144-5596, Vol. 35, No. 3, 2001, pp. 307-320
Despite great progress in human motion prediction, it remains a challenging task due to the complicated structural dynamics of human behavior. In this paper, we address this problem in three ways. First, to capture long-range spatial correlations and temporal dependencies, we apply a transformer-based architecture with a global attention mechanism. Specifically, we feed the network with sequential joints encoded with temporal information for both spatial and temporal exploration. Second, to further exploit the inherent kinematic chains for better 3D structure, we apply a progressive decoding strategy that extends the prediction from central to peripheral joints according to the structural connectivity. Finally, to incorporate a general motion space for high-quality prediction, we build a memory-based dictionary that preserves the global motion patterns in the training data to guide predictions. We evaluate the proposed method on two challenging benchmark datasets (Human3.6M and CMU-Mocap). Experimental results show that our method outperforms state-of-the-art approaches.
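The abstract describes progressive decoding only as a central-to-peripheral extension along the skeleton's connectivity. A minimal Python sketch of one way to realize such an ordering, assuming the skeleton is given as a parent list with a single root: joints are grouped by breadth-first hop distance from the root, and stages are decoded in order so that every joint can condition on its already-decoded parent. The skeleton, `decoding_stages`, `progressive_decode`, and `predict_joint` are hypothetical illustrations, not the authors' implementation.

```python
from collections import deque


def decoding_stages(parents):
    """Group joint indices into stages by hop distance from the root.

    `parents[j]` is the parent of joint j; exactly one joint has parent -1.
    """
    children = {j: [] for j in range(len(parents))}
    root = None
    for joint, parent in enumerate(parents):
        if parent == -1:
            root = joint
        else:
            children[parent].append(joint)

    # Breadth-first search from the root assigns each joint its depth.
    depth = {root: 0}
    queue = deque([root])
    while queue:
        joint = queue.popleft()
        for child in children[joint]:
            depth[child] = depth[joint] + 1
            queue.append(child)

    stages = [[] for _ in range(max(depth.values()) + 1)]
    for joint in sorted(depth):
        stages[depth[joint]].append(joint)
    return stages


def progressive_decode(parents, predict_joint):
    """Decode stage by stage; each joint is conditioned on its decoded parent.

    `predict_joint(joint, parent_output)` is a hypothetical stand-in for a
    learned per-joint decoder; the root receives None as its parent output.
    """
    decoded = {}
    for stage in decoding_stages(parents):
        for joint in stage:
            decoded[joint] = predict_joint(joint, decoded.get(parents[joint]))
    return decoded


# Hypothetical 10-joint skeleton (not the Human3.6M or CMU-Mocap layout).
JOINT_NAMES = ["pelvis", "r_hip", "r_knee", "l_hip", "l_knee",
               "spine", "neck", "head", "r_shoulder", "r_elbow"]
PARENTS = [-1, 0, 1, 0, 3, 0, 5, 6, 6, 8]

if __name__ == "__main__":
    for d, stage in enumerate(decoding_stages(PARENTS)):
        print(d, [JOINT_NAMES[j] for j in stage])
```

Run as a script, it prints five stages, from `0 ['pelvis']` through `1 ['r_hip', 'l_hip', 'spine']`, `2 ['r_knee', 'l_knee', 'neck']`, and `3 ['head', 'r_shoulder']` to `4 ['r_elbow']`. Grouping by depth, rather than decoding one joint at a time, lets joints at the same distance from the body center be predicted together while still seeing their parents' outputs.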