Situational awareness provided by Unmanned Aerial Vehicles (UAVs) is important for many applications such as surveillance, search and rescue, and disaster response. In these applications, detecting and locating people and recognizing their actions in near real-time can play a crucial role in preparing an effective response. However, three main limitations currently hinder this task. First, it is often not possible to access the live video feed from a UAV's camera due to limited bandwidth. Second, even when the video feed is available, monitoring and analyzing video over prolonged periods is a tedious task for humans. Third, it is typically not possible to locate arbitrary people via their cellphones. We therefore developed the Person-Action-Locator (PAL), a novel UAV-based situational awareness system. The PAL system addresses the first issue by analyzing the video feed onboard the UAV, powered by a supercomputer-on-a-module. To address the second issue, the PAL system supports human operators with Deep Learning models that automatically detect people and recognize their actions in near real-time. To address the third issue, we developed a Pixel2GPS converter that estimates the location of people from the video feed. The result, icons representing detected people labeled with their actions, is visualized on the map interface of the PAL system. The Deep Learning models were first tested in the lab and showed promising results. The fully integrated PAL system was then successfully tested in the field. We also collected additional surveillance data to complement the lab results.
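The abstract does not detail the geometry behind the Pixel2GPS converter, but a common approach for UAV imagery is to project the pixel ray onto a flat ground plane using the aircraft's pose. The sketch below illustrates that idea only; all names and parameters (pixel_to_gps, cam_pitch_rad, heading_rad, etc.) are illustrative assumptions, not the PAL implementation.

```python
import math

# Minimal sketch of a pixel-to-GPS conversion, assuming a forward/downward
# tilted camera, flat terrain, and a known UAV pose. The PAL system's actual
# Pixel2GPS converter may differ; this is a hypothetical illustration.

EARTH_RADIUS_M = 6_371_000.0

def pixel_to_gps(u, v, img_w, img_h, fx, fy,
                 uav_lat, uav_lon, altitude_m,
                 cam_pitch_rad, heading_rad):
    """Project pixel (u, v) onto the ground plane and return (lat, lon).

    cam_pitch_rad: camera tilt below the horizon (pi/2 = straight down).
    heading_rad:   UAV yaw, measured clockwise from north.
    """
    # Normalized ray direction in the camera frame (x right, y down).
    x = (u - img_w / 2.0) / fx
    y = (v - img_h / 2.0) / fy

    # Angle of the pixel ray below the horizon after applying the tilt.
    depression = cam_pitch_rad + math.atan(y)
    if depression <= 0:
        raise ValueError("Ray does not intersect the ground plane")

    # Ground distance ahead of the UAV, and an approximate lateral offset
    # scaled by the slant range to the intersection point.
    forward = altitude_m / math.tan(depression)
    lateral = x * math.hypot(forward, altitude_m)

    # Rotate the ground offset from the camera frame into north/east.
    north = forward * math.cos(heading_rad) - lateral * math.sin(heading_rad)
    east = forward * math.sin(heading_rad) + lateral * math.cos(heading_rad)

    # Convert the metric offset to degrees of latitude/longitude
    # (small-offset approximation on a spherical Earth).
    dlat = math.degrees(north / EARTH_RADIUS_M)
    dlon = math.degrees(east / (EARTH_RADIUS_M * math.cos(math.radians(uav_lat))))
    return uav_lat + dlat, uav_lon + dlon
```

The flat-ground assumption is the main simplification here: any error in the UAV's altitude or attitude estimate translates directly into position error on the map, which is why such converters are typically validated in field tests like those described above.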
We present an end-to-end trainable Neural Network architecture for stereo imaging that jointly locates humans and estimates their body poses in 3D. Our method estimates a 2D pose for each human in a stereo pair of images and uses a correlation layer with a composite field to associate each left-right pair of joints. In the absence of a stereo pose dataset, we show that we can train our method with synthetic data only and test it on real-world images (i.e., our training stage is domain invariant). Our method is particularly suitable for autonomous vehicles. We achieve state-of-the-art results for the 3D localization task on the challenging real-world KITTI dataset while running four times faster.
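Once a left-right pair of joints has been associated, the standard way to recover its 3D position from a rectified stereo pair is triangulation from disparity. The sketch below shows that textbook step under assumed camera intrinsics (fx, fy, cx, cy) and baseline; it is not the paper's network, which learns the association itself via the correlation layer.

```python
import numpy as np

# Hypothetical sketch: recover a 3D joint position from an associated
# left-right keypoint pair via classical stereo triangulation. Assumes
# rectified images, so matched joints lie on the same image row v.

def triangulate_joint(u_left, u_right, v, fx, fy, cx, cy, baseline_m):
    """Return the (X, Y, Z) joint position in the left-camera frame (meters)."""
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("Non-positive disparity: point at infinity or mismatch")
    z = fx * baseline_m / disparity   # depth from disparity
    x = (u_left - cx) * z / fx        # back-project horizontal offset
    y = (v - cy) * z / fy             # back-project vertical offset
    return np.array([x, y, z])
```

Because depth error grows quadratically with distance for a fixed disparity error, reliably associating the correct left-right joint pairs, which is what the composite field is designed to do, is the critical step for accurate far-range 3D localization.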