Continuous positioning and tracking of multiple pedestrian targets is a common concern in large indoor spaces for security, emergency evacuation, location-based services, and other applications. Among the sensors used for positioning, ultra-wideband (UWB) is a key technology for high-precision indoor positioning. However, because of indoor non-line-of-sight (NLOS) errors, a single positioning system can no longer meet accuracy requirements. This research aimed to design a high-precision, stable fusion positioning system based on UWB and vision. The method uses the Hungarian algorithm to match the identities of the UWB and vision localization results; after a successful match, fusion localization is performed with a federated Kalman filtering algorithm. In addition, because colored noise is present in indoor positioning data, this paper also proposes a Kalman filtering algorithm based on principal component analysis (PCA). The advantage of this new filtering algorithm is that it does not require a dynamics model built on distributional assumptions and demands less computation. PCA is first applied to minimize the correlation among the observables; the energy estimate and the denoised data then yield a more reasonable Kalman gain and are substituted into the Kalman prediction equations. Experimental results show that the average accuracy of the UWB-and-vision fusion method is 25.3% higher than that of UWB alone. Owing to the high stability and continuity of visual positioning, the proposed method effectively suppresses the influence of NLOS errors on positioning accuracy. Furthermore, compared with traditional Kalman filtering, the mean square error of the new filtering algorithm is reduced by 31.8%. After PCA-Kalman filtering, the colored noise is reduced and the Kalman gain becomes more reasonable, enabling the filter to estimate the state accurately.
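The identity-matching step described above can be sketched with a standard Hungarian-algorithm solver. This is a minimal illustration, not the paper's implementation: the cost metric (pairwise Euclidean distance between UWB and vision position estimates) and the function name `match_tracks` are assumptions for the sketch; SciPy's `linear_sum_assignment` provides the Hungarian-style optimal assignment.

```python
# Hedged sketch: matching UWB tags to visually tracked pedestrians by
# minimizing total Euclidean distance (Hungarian algorithm).
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_tracks(uwb_positions, vision_positions):
    """Return (uwb_index, vision_index) pairs with minimal total distance."""
    uwb = np.asarray(uwb_positions, dtype=float)
    vis = np.asarray(vision_positions, dtype=float)
    # Cost matrix: pairwise Euclidean distances, shape (n_uwb, n_vision).
    cost = np.linalg.norm(uwb[:, None, :] - vis[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows.tolist(), cols.tolist()))

uwb = [(0.0, 0.0), (5.0, 5.0)]
vis = [(5.1, 4.9), (0.2, -0.1)]
print(match_tracks(uwb, vis))  # [(0, 1), (1, 0)]
```

Once a UWB track and a vision track are paired this way, their position estimates can be fed to the fusion filter as two local measurements of the same pedestrian.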
Aligning information between the image and the question is of great significance in the visual question answering (VQA) task. Self-attention is commonly used to generate attention weights between the image and the question; these weights align the two modalities, allowing the model to select the image regions relevant to the question. However, with the self-attention mechanism, the attention weight between two objects is determined only by the representations of those two objects, ignoring the influence of the other objects around them. This contribution proposes a novel multi-hop attention alignment method that enriches surrounding information when using self-attention to align the two modalities. To exploit position information during alignment, we also propose a position embedding mechanism, which extracts the position of each object so that each question word can be aligned with the correct location in the image. On the VQA2.0 dataset, our model achieves a validation accuracy of 65.77%, outperforming several state-of-the-art methods and demonstrating the effectiveness of the proposed approach.
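The limitation the abstract points out, and one plain reading of "multi-hop", can be illustrated with a minimal NumPy sketch. This is not the paper's model: the dot-product scoring, the `hops` parameter, and composing the attention matrix with itself are illustrative assumptions; in standard self-attention the weight between objects i and j depends only on their two representations, while composing the row-stochastic weight matrix lets weight also flow through intermediate objects.

```python
# Hedged sketch: scaled dot-product self-attention weights, plus a
# simple "multi-hop" composition; names are illustrative only.
import numpy as np

def self_attention(X):
    """X: (n_objects, d) features. Weight w[i, j] depends only on
    rows i and j of X (the single-hop limitation)."""
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)  # softmax stability
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)            # rows sum to 1
    return w

def multi_hop(w, hops=2):
    """Compose attention with itself so weight between i and j also
    accumulates along paths through surrounding objects."""
    out = w
    for _ in range(hops - 1):
        out = out @ w                            # still row-stochastic
    return out
```

Because the product of row-stochastic matrices is row-stochastic, each row of the multi-hop matrix remains a probability distribution over objects and can be used for alignment exactly like the original weights.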