To reduce the performance degradation that severe occlusion in dense scenes causes for trackers, and considering that the head is the highest and least-occluded part of a pedestrian's body, we propose a new multi-object tracking method for pedestrians in dense crowds that incorporates head tracking. For each video frame, a head tracker first generates head-motion tracklets while the pedestrians' whole-body bounding boxes are detected in parallel. Next, the degree of association between the head bounding boxes and the whole-body bounding boxes is calculated, and the Hungarian algorithm matches them on the basis of these scores. Finally, according to the matching results, the head bounding boxes in the head tracklets are replaced with the corresponding whole-body bounding boxes, yielding whole-body motion tracklets for pedestrians in the dense scene. Our method runs online, and experiments suggest that it effectively reduces the negative effects of false negatives and false positives caused by severe occlusion in dense scenes.
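The head-to-body association step can be sketched as follows. This is a minimal illustration, not the paper's implementation: the association score here is assumed to be the fraction of the head box contained in the body box, and the optimal one-to-one assignment is found by brute force over permutations (which for small numbers of boxes returns the same result a Hungarian solver would). The function names `iou_contain` and `match_heads_to_bodies` are illustrative, not from the paper.

```python
from itertools import permutations

def iou_contain(head, body):
    """Fraction of the head box (x1, y1, x2, y2) contained in the body box."""
    hx1, hy1, hx2, hy2 = head
    bx1, by1, bx2, by2 = body
    iw = max(0, min(hx2, bx2) - max(hx1, bx1))
    ih = max(0, min(hy2, by2) - max(hy1, by1))
    head_area = (hx2 - hx1) * (hy2 - hy1)
    return (iw * ih) / head_area if head_area > 0 else 0.0

def match_heads_to_bodies(heads, bodies):
    """Optimal one-to-one matching maximizing total containment score.

    Brute force over assignments; stands in for the Hungarian algorithm
    on small problem sizes. Returns a list of (head_idx, body_idx) pairs.
    """
    best, best_score = [], -1.0
    for perm in permutations(range(len(bodies)), len(heads)):
        score = sum(iou_contain(h, bodies[j]) for h, j in zip(heads, perm))
        if score > best_score:
            best_score, best = score, list(enumerate(perm))
    return best
```

Once matched, each head box in a tracklet would simply be swapped for the body box assigned to it, so the track identity established by head tracking carries over to the whole-body trajectory.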
In recent years, with the rapid development of computer vision technology, the popularization of intelligent hardware, and people's increasing demand for intelligent human-computer interaction products, visual grounding technology has emerged as a way to help machines and humans identify and locate objects, thereby promoting human-computer interaction and intelligent manufacturing. At the same time, human-computer interaction is continually evolving and improving, becoming increasingly intelligent, humane, and efficient. This paper proposes a new visual grounding (VG) model and designs a language verification module that uses language information as the primary signal to increase the model's interactivity. Additionally, we propose combining visual grounding with human-computer interaction, aiming to explore the research status and development trends of both technologies, their application in practical scenarios, and directions for optimization, so as to provide reference and guidance for relevant researchers and to promote the development and application of visual grounding and human-computer interaction technology.
Face recognition is a widely used application of artificial intelligence technology. However, facial occlusions can prevent faces from being effectively detected in certain environments. Although many algorithms have been proposed to address this problem, they essentially require large amounts of face image data containing occlusion elements for training in order to improve detection ability. In recent years, this problem has been effectively mitigated by exploiting the image-generation ability of generative adversarial networks. This paper proposes an improved Generative Adversarial Network (GAN) that improves the generation of occluded face images by adding an encoding module. By expanding the dataset, the detection accuracy of several classic face detection models on occluded faces is improved by more than 3%. With the epidemic not yet over, occluded-face data is of great significance for improving the performance of face detection systems in public places such as customs security inspection points and medical centers.
In recent years, with the rapid development of computer vision technology, the popularity of intelligent hardware, and the increasing demand for human–machine interaction in intelligent products, visual localization technology has emerged as a way to help machines and humans recognize and locate objects, thereby promoting human–machine interaction and intelligent manufacturing. At the same time, human–machine interaction is constantly evolving and improving, becoming increasingly intelligent, humanized, and efficient. In this article, a new visual localization model is proposed, and a language validation module is designed that uses language information as the primary signal to increase the model's interactivity. In addition, we list future possibilities for visual localization and provide two examples that explore the application and optimization of visual localization and human–machine interaction technology in practical scenarios, offering reference and guidance for relevant researchers and promoting the development and application of these technologies.