Image-text retrieval is a fundamental cross-modal task whose main idea is to learn image-text matching. Generally, according to whether interactions occur during the retrieval process, existing image-text retrieval methods can be classified into independent representation matching methods and cross-interaction matching methods. Independent representation matching methods generate the embeddings of images and sentences independently and are therefore convenient for retrieval with hand-crafted matching measures (e.g., cosine or Euclidean distance). Cross-interaction matching methods achieve improvements by introducing interaction-based networks for inter-relation reasoning, yet suffer from low retrieval efficiency. This article aims to develop a method that combines the cross-modal inter-relation reasoning of cross-interaction methods with the efficiency of the independent methods. To this end, we propose a graph-based Cross-modal Graph Matching Network (CGMN), which explores both intra- and inter-relations without introducing network interaction. In CGMN, graphs are used for both visual and textual representation to achieve intra-relation reasoning across regions and words, respectively. Furthermore, we propose a novel graph node matching loss to learn fine-grained cross-modal correspondence and to achieve inter-relation reasoning. Experiments on the benchmark datasets MS-COCO, Flickr8K, and Flickr30K show that CGMN outperforms state-of-the-art methods in image retrieval. Moreover, CGMN is much more efficient than state-of-the-art methods that use interactive matching. The code is available at https://github.com/cyh-sj/CGMN.
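The efficiency argument for independent representation matching comes down to the fact that, once image and sentence embeddings are computed separately, retrieval is just a similarity lookup. Below is a minimal sketch of that paradigm using cosine similarity (not the CGMN model itself); the array names, dimensionality, and top-k interface are illustrative assumptions.

```python
import numpy as np

def cosine_retrieval(image_embs: np.ndarray, text_emb: np.ndarray, top_k: int = 5):
    """Return indices of the top_k images most similar to one sentence embedding."""
    # Normalize so the dot product equals cosine similarity.
    img = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb)
    scores = img @ txt                     # one similarity score per image
    return np.argsort(-scores)[:top_k]

# Example: 1,000 precomputed 512-d image embeddings, one query sentence embedding.
images = np.random.randn(1000, 512)
query = np.random.randn(512)
print(cosine_retrieval(images, query))
```

Because no cross-modal network is run at query time, the per-query cost is a single matrix-vector product over cached embeddings, which is why such methods scale well compared with interactive matching.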
Semantic role labeling (SRL) is the task of recognizing all predicate-argument pairs in a sentence; its performance has plateaued after a series of recent works. This paper proposes a novel syntax-agnostic SRL model enhanced by the proposed associated memory network (AMN), which uses inter-sentence attention over label-known associated sentences as a form of memory to further enhance dependency-based SRL. In detail, we use sentences and their labels from the training set as an associated memory cue to help label the target sentence. Furthermore, we compare several strategies for selecting associated sentences and several label merging methods in the AMN to find and exploit the labels of associated sentences while attending to them. By leveraging this attentive memory built from known training data, our full model reaches the state of the art on the CoNLL-2009 benchmark datasets in the syntax-agnostic setting, showing a new and effective line of research for enhancing SRL other than exploiting external resources such as well-pretrained language models.
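To make the associated-memory idea concrete, the sketch below attends from target-sentence token representations to tokens of label-known memory sentences and pools their label information into a soft label cue per target token. The dimensions, dot-product softmax attention, and one-hot label encoding are assumptions for illustration, not the paper's exact AMN formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def memory_label_hint(target_reps, memory_reps, memory_labels, n_labels):
    """target_reps: (T, d); memory_reps: (M, d); memory_labels: (M,) integer ids."""
    attn = softmax(target_reps @ memory_reps.T)        # (T, M) attention over memory tokens
    label_onehot = np.eye(n_labels)[memory_labels]     # (M, n_labels) known labels
    return attn @ label_onehot                         # (T, n_labels) soft label cue

# Example: 8 target tokens, 20 memory tokens from associated sentences, 5 role labels.
target = np.random.randn(8, 64)
memory = np.random.randn(20, 64)
labels = np.random.randint(0, 5, size=20)
print(memory_label_hint(target, memory, labels, n_labels=5).shape)
```

In a full model, such a cue would be concatenated with or added to the target token representations before the SRL classifier, which is the sense in which the memory "helps label the target sentence."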
Indoor pedestrian localization is a hot research topic and is widely used in indoor navigation and unmanned devices. Pedestrian Dead Reckoning (PDR) is a low-cost, self-contained indoor localization method that estimates the position of a pedestrian independently and continuously. PDR fuses accelerometer, gyroscope, and magnetometer measurements to calculate the relative displacement from the starting point, and is mainly composed of three modules: step detection, stride length estimation, and heading calculation. However, PDR suffers from cumulative error and can only work in two-dimensional planes, which limits its practical applications. In this paper, a novel localization method, V-PDR, is presented, which combines Visual Place Recognition (VPR) and PDR in a loosely coupled way. When there is a discrepancy between the localization results of PDR and VPR, the algorithm corrects the PDR estimate, which significantly reduces the cumulative error. In addition, VPR recognizes scenes on different floors to correct the floor estimate after vertical movement, which extends the application of PDR from two-dimensional planes to three-dimensional space. Extensive experiments were conducted in our laboratory building to verify the performance of the proposed method. The results demonstrate that the proposed method outperforms the general PDR method in accuracy and can work in three-dimensional space.
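The PDR position update implied by the three modules above is straightforward: each detected step advances the position by the estimated stride length along the estimated heading. The sketch below assumes step detection, stride estimation, and heading are already available as inputs (in practice they come from the accelerometer, gyroscope, and magnetometer fusion); the function name and east/north convention are illustrative.

```python
import math

def pdr_track(start_xy, strides, headings_rad):
    """Accumulate a 2-D track: strides[i] and headings_rad[i] describe the i-th step."""
    x, y = start_xy
    track = [(x, y)]
    for length, heading in zip(strides, headings_rad):
        x += length * math.sin(heading)   # east displacement of this step
        y += length * math.cos(heading)   # north displacement of this step
        track.append((x, y))
    return track

# Example: four ~0.7 m steps while gradually turning away from north.
print(pdr_track((0.0, 0.0), [0.7, 0.7, 0.7, 0.7], [0.0, 0.2, 0.4, 0.6]))
```

Because every step adds its own stride and heading error to the running sum, drift grows with distance walked, which is exactly the cumulative error that the VPR correction is meant to bound.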
Light-weight convolutional neural networks (CNNs) suffer from limited feature representation capabilities due to low computational budgets, resulting in degraded performance. To make CNNs more efficient, dynamic neural networks (DyNet) have been proposed to increase model capacity by using the Squeeze-and-Excitation (SE) module to adaptively obtain the importance of each convolution kernel through the attention mechanism. However, the attention mechanism in the SE network (SENet) uses all channel information in its calculations, which brings essential challenges: (a) interference caused by internal redundant information; and (b) an increased number of network calculations. To address these problems, this work proposes a dynamic convolutional network (termed EAM-DyNet) that reduces the number of channels in feature maps by extracting only the useful spatial information. EAM-DyNet first uses random channel reduction and channel grouping reduction methods to remove redundant information. As downsampling can cause the loss of useful information, it then applies an adaptive average pooling method to maintain information integrity. Extensive experimental results on the baselines demonstrate that EAM-DyNet outperforms existing approaches, achieving higher test accuracy with fewer network parameters.
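For reference, the standard Squeeze-and-Excitation block that the abstract builds on squeezes spatial information into one statistic per channel and then produces per-channel attention weights over all channels, which is what EAM-DyNet seeks to make cheaper. The PyTorch sketch below shows that baseline block, not EAM-DyNet itself; the reduction ratio of 16 is a common default and an assumption here.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Baseline Squeeze-and-Excitation channel attention block."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze: (N, C, H, W) -> (N, C, 1, 1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                             # excitation: per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(n, c)).view(n, c, 1, 1)
        return x * w                                  # reweight every channel

# Example: reweight a batch of 32-channel feature maps.
print(SEBlock(32)(torch.randn(2, 32, 16, 16)).shape)
```

Since the two fully connected layers operate over all C channels, both of the issues the abstract cites (redundant channel information and extra computation) scale with the channel count, which motivates reducing channels before attention.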