A first step towards an understanding of the semantic content in a video is the reliable detection and recognition of actions performed by objects. This is a difficult problem due to the enormous variability in an action's appearance when seen from different viewpoints and/or at different times. In this paper we address the recognition of actions by taking a novel approach that models actions as special types of 3D objects. Specifically, we observe that any action can be represented as a generalized cylinder, called the action cylinder. Reliable recognition is achieved by recovering the viewpoint transformation between reference (model) and given action cylinders. A set of 8 corresponding points from time-wise corresponding cross-sections is shown to be sufficient to align the two cylinders under perspective projection. A surprising conclusion from visualizing actions as objects is that rigid, articulated, and nonrigid actions can all be modeled in a uniform framework.
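The abstract does not say how the viewpoint transformation is estimated from the 8 correspondences. As an illustration only, the sketch below uses the classical normalized eight-point algorithm for a fundamental matrix, which is one well-known way that 8 point correspondences can relate two perspective views. It is a swapped-in technique, not necessarily the paper's method, and the function names `normalize_points` and `eight_point` are hypothetical.

```python
import numpy as np


def normalize_points(pts):
    """Hartley normalization: centroid to origin, mean distance to sqrt(2).

    pts is an (N, 2) array; returns (N, 3) homogeneous points and the 3x3 transform.
    """
    centroid = pts.mean(axis=0)
    mean_dist = np.linalg.norm(pts - centroid, axis=1).mean()
    s = np.sqrt(2.0) / mean_dist
    T = np.array([[s, 0.0, -s * centroid[0]],
                  [0.0, s, -s * centroid[1]],
                  [0.0, 0.0, 1.0]])
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    return (T @ pts_h.T).T, T


def eight_point(pts1, pts2):
    """Estimate F with x2^T F x1 = 0 from at least 8 correspondences.

    Assumption: this is the textbook eight-point algorithm, used here as a
    stand-in for the (unspecified) alignment step in the abstract.
    """
    x1, T1 = normalize_points(np.asarray(pts1, dtype=float))
    x2, T2 = normalize_points(np.asarray(pts2, dtype=float))

    # Each row holds the 9 coefficients of the row-major entries of F.
    A = np.column_stack([x2[:, 0:1] * x1, x2[:, 1:2] * x1, x2[:, 2:3] * x1])

    # The null vector of A is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)

    # Enforce the rank-2 constraint by zeroing the smallest singular value.
    U, S, Vt_f = np.linalg.svd(F)
    S[2] = 0.0
    F = U @ np.diag(S) @ Vt_f

    # Undo the normalization and fix the arbitrary scale.
    F = T2.T @ F @ T1
    return F / np.linalg.norm(F)
```

Given two (N, 2) arrays of matched image points with N at least 8, `eight_point(pts1, pts2)` returns a 3x3 matrix defined up to scale. With noisy correspondences, more than 8 points and a robust estimator would normally be preferred.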
Effective strategies to control the COVID-19 pandemic deserve close attention in order to mitigate its negative impact on public health and the global economy, with the full extent of that impact yet to unfold. In the absence of effective antivirals and with limited medical resources, the WHO recommends many measures to control the infection rate and avoid exhausting those resources. Wearing a mask is among the non-pharmaceutical interventions that can act as a barrier to the primary transmission route of SARS-CoV-2: droplets expelled by presymptomatic or asymptomatic individuals. Regardless of the debate over medical resources and the diversity of masks, all countries are mandating coverings over the nose and mouth in public areas. As a contribution to public health, this paper aims to devise a real-time technique that can efficiently detect unmasked faces in public and thus help enforce mask wearing. The proposed technique is an ensemble of one-stage and two-stage detectors, designed to achieve low inference time and high accuracy. We take ResNet50 as a baseline model and apply transfer learning to fuse high-level semantic information across multiple feature maps. In addition, we propose a bounding box transformation to improve localization performance during mask detection. The experiments are conducted with three popular baseline models, namely ResNet50, AlexNet and MobileNet. We explore whether these models can be plugged into the proposed model so that highly accurate results can be achieved with less inference time. The proposed technique achieves high accuracy (98.2%) when implemented with ResNet50. Moreover, the proposed model yields 11.07% higher precision and 6.44% higher recall in mask detection than the RetinaFaceMask detector.
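The abstract reports precision and recall for mask detection but does not describe the evaluation protocol. The sketch below shows one common way such metrics are computed for a detector: predictions are greedily matched to ground-truth boxes by intersection over union (IoU). The 0.5 threshold, the `(x1, y1, x2, y2)` box format, and the function names are assumptions for illustration, not details taken from the paper.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def precision_recall(predictions, ground_truth, iou_threshold=0.5):
    """Greedy one-to-one matching of predicted boxes to ground-truth boxes.

    Assumption: predictions are already sorted by descending confidence,
    and a match requires IoU >= iou_threshold (0.5 is a common default).
    """
    unmatched = list(ground_truth)
    true_pos = 0
    for pred in predictions:
        # Pick the still-unmatched ground-truth box that overlaps most.
        best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) >= iou_threshold:
            unmatched.remove(best)
            true_pos += 1
    false_pos = len(predictions) - true_pos
    false_neg = len(unmatched)
    precision = true_pos / (true_pos + false_pos) if predictions else 0.0
    recall = true_pos / (true_pos + false_neg) if ground_truth else 0.0
    return precision, recall
```

For a toy case with ground truth `[(0, 0, 10, 10), (20, 20, 30, 30)]` and predictions `[(1, 1, 10, 10), (50, 50, 60, 60), (100, 100, 110, 110)]`, only the first prediction matches, giving precision 1/3 and recall 1/2. These boxes are made up for the example and are not data from the paper.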