The human skeleton contains significant information about actions; it is therefore intuitive to incorporate skeletons into human action recognition. A human skeleton resembles a graph in which body joints and bones correspond to graph nodes and edges. This resemblance of the human skeleton to a graph structure is the main motivation for applying graph convolutional neural networks to human action recognition. Results show that different joints do not contribute equally to discriminating different actions. Therefore, we propose to use attention-joints, i.e., the joints that contribute significantly to a specific action. Features are computed only for these attention-joints and assigned as node features of the graph. In our method, the node features (also termed attention-joint features) include i) the distances of attention-joints from the center of gravity of the human body, ii) the distances between adjacent attention-joints, and iii) joint-flow features. The proposed method gives a simple yet more efficient representation of skeleton sequences by concatenating relative distances and relative coordinates with respect to other joints. The proposed methodology has been evaluated on the single-image Stanford 40 Actions dataset, as well as on the temporal skeleton-based PKU-MMD and NTU RGB+D action recognition datasets. Results show that this framework outperforms existing state-of-the-art methods. INDEX TERMS Human action recognition, attention-joints, graph convolutional neural network.
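The abstract names three node-feature types. Below is a minimal sketch, not the authors' implementation, of how such attention-joint features could be computed from a skeleton sequence; the function name, array shapes, and the choice of the mean joint position as the center of gravity are assumptions for illustration.

```python
import numpy as np

def attention_joint_features(seq, attention_idx, adjacency_pairs):
    """Sketch of the three node-feature types described in the abstract.

    seq:             (T, J, 3) array of 3D joint coordinates over T frames
    attention_idx:   indices of the attention-joints selected for the action
    adjacency_pairs: list of (i, j) index pairs of adjacent attention-joints
    """
    # i) distance of each attention-joint from the body's center of gravity
    #    (here approximated as the mean of all joint coordinates per frame)
    cog = seq.mean(axis=1, keepdims=True)                          # (T, 1, 3)
    d_cog = np.linalg.norm(seq[:, attention_idx] - cog, axis=-1)   # (T, A)

    # ii) distances between adjacent attention-joints
    d_adj = np.stack(
        [np.linalg.norm(seq[:, i] - seq[:, j], axis=-1) for i, j in adjacency_pairs],
        axis=-1)                                                   # (T, P)

    # iii) joint-flow features: frame-to-frame displacement of the attention-joints
    flow = np.diff(seq[:, attention_idx], axis=0)                  # (T-1, A, 3)
    flow = np.concatenate([np.zeros_like(flow[:1]), flow], axis=0) # pad first frame

    return d_cog, d_adj, flow
```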
Visual Information Extraction (VIE) has attracted considerable attention recently owing to its advanced applications such as document understanding, automatic marking, and intelligent education. Most existing works decouple this problem into several independent sub-tasks of text spotting (text detection and recognition) and information extraction, completely ignoring the high correlation among them during optimization. In this paper, we propose a robust Visual Information Extraction System (VIES) towards real-world scenarios, a unified end-to-end trainable framework for simultaneous text detection, recognition, and information extraction that takes a single document image as input and outputs the structured information. Specifically, the information extraction branch collects abundant visual and semantic representations from text spotting for multimodal feature fusion and, conversely, provides higher-level semantic clues that contribute to the optimization of text spotting. Moreover, to address the shortage of public benchmarks, we construct a fully annotated dataset called EPHOIE (https://github.com/HCIILAB/EPHOIE), the first Chinese benchmark for both text spotting and visual information extraction. EPHOIE consists of 1,494 images of examination paper heads with complex layouts and backgrounds, including a total of 15,771 Chinese handwritten or printed text instances. Compared with state-of-the-art methods, our VIES shows significantly superior performance on the EPHOIE dataset and achieves a 9.01% F-score gain on the widely used SROIE dataset under the end-to-end scenario.
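As a rough illustration of the multimodal fusion idea described above, the sketch below fuses per-instance visual features from a spotting branch with semantic features of the recognized text before an entity classifier. This is a generic assumption of how such fusion might look, not the VIES architecture; all dimensions, class counts, and module names are hypothetical.

```python
import torch
import torch.nn as nn

class MultimodalFusion(nn.Module):
    """Generic sketch: fuse visual RoI features of detected text instances with
    embeddings of their recognized text strings, then predict an entity category
    per instance. Not the authors' exact design."""

    def __init__(self, vis_dim=256, sem_dim=256, fused_dim=256, num_classes=10):
        super().__init__()
        self.fuse = nn.Linear(vis_dim + sem_dim, fused_dim)
        self.classifier = nn.Linear(fused_dim, num_classes)

    def forward(self, vis_feats, sem_feats):
        # vis_feats: (N, vis_dim) pooled visual features of N detected text instances
        # sem_feats: (N, sem_dim) embeddings of the corresponding recognized strings
        fused = torch.relu(self.fuse(torch.cat([vis_feats, sem_feats], dim=-1)))
        return self.classifier(fused)
```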
Recently, deep learning has greatly improved the performance of license plate recognition (LPR) by learning robust features from large amounts of labeled data. However, the large variation of wild license plates across complicated environments and perspectives remains a huge challenge for robust LPR. To solve this problem, we propose an effective and efficient shared adversarial training network (SATN), which learns environment-independent and perspective-free semantic features from wild license plates using the prior knowledge of standard stencil-rendered license plates, since such plates are independent of complicated environments and varied perspectives. Furthermore, to better correct the features of heavily perspective-distorted license plates, we propose a novel dual attention transformation (DAT) module within the shared adversarial training network. Comprehensive experiments on the AOLP-RP and CCPD benchmarks show that the proposed method outperforms state-of-the-art methods by a large margin on the LPR task. INDEX TERMS Deep learning, license plate recognition (LPR), dual attention transformation (DAT), shared adversarial training network (SATN).
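The paper's dual attention transformation is specified in the original work; the block below is only a generic channel-plus-spatial attention sketch to convey the idea of re-weighting license-plate feature maps along two complementary dimensions. Module names, kernel sizes, and the reduction ratio are assumptions, not the DAT design.

```python
import torch
import torch.nn as nn

class DualAttention(nn.Module):
    """Illustrative channel + spatial attention block (not the paper's DAT module)."""

    def __init__(self, channels, reduction=8):
        super().__init__()
        # Channel attention: squeeze spatial dimensions, re-weight feature channels.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid())
        # Spatial attention: a single-channel gate over the H x W positions.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid())

    def forward(self, x):             # x: (N, C, H, W) license-plate feature map
        x = x * self.channel_gate(x)  # emphasize informative channels
        x = x * self.spatial_gate(x)  # emphasize informative spatial positions
        return x
```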