Deep learning concepts and algorithms play a pivotal role in solving complicated problems such as playing games, forecasting future economic values, and detecting objects in images. They can break through the bottlenecks of conventional neural-network and artificial-intelligence methods. This paper compares two influential deep learning algorithms for image processing and object detection: Mask R-CNN and YOLO. Detection tasks become more complex when they involve the many variations in humans' perceived appearance, formation, attire, and reasoning, and the dynamic nature of their behaviour. It is also challenging to capture subtle details of their surroundings, for instance lighting conditions, background clutter, and partial or full occlusion. When a machine interacts with a human or takes pictures, it is hard for it to resolve the details of the human's surroundings. This study focuses on detecting humans effectively. The main objective is to compare the performance of YOLO and Mask R-CNN; the comparison reveals that Mask R-CNN failed to detect tiny human figures among other, more prominent human figures, whereas YOLO detected most of the human figures in an image with higher accuracy. The paper therefore evaluates YOLO against Mask R-CNN on two points: (1) detection ability and (2) computation time. Since machine learning algorithms are mostly data specific, the authors believe that the presented results may vary with the nature of the data under observation. Alternatively, the presented data may be seen as a counterexample that exposes the detection inaccuracy of Mask R-CNN.
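The two comparison points named above, detection ability and computation time, could be measured along the following lines. This is only a minimal sketch under stated assumptions: the abstract does not say how either quantity was computed, so the IoU threshold of 0.5, the greedy matching rule, and the names `iou`, `count_detected`, and `evaluate` are all hypothetical. The `detector` argument stands in for any callable (a YOLO or Mask R-CNN wrapper, for example) that returns a list of `(x1, y1, x2, y2)` person boxes.

```python
import time


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_detected(ground_truth, predictions, threshold=0.5):
    """Greedily match each ground-truth box to at most one unused prediction.

    The 0.5 threshold is an assumed convention, not a value from the paper.
    """
    used = set()
    hits = 0
    for gt in ground_truth:
        best_j, best_iou = None, threshold
        for j, pred in enumerate(predictions):
            if j in used:
                continue
            overlap = iou(gt, pred)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None:
            used.add(best_j)
            hits += 1
    return hits


def evaluate(detector, image, ground_truth):
    """Return the fraction of annotated humans found and the wall-clock time."""
    start = time.perf_counter()
    predictions = detector(image)
    seconds = time.perf_counter() - start
    hits = count_detected(ground_truth, predictions)
    recall = hits / len(ground_truth) if ground_truth else 0.0
    return {"recall": recall, "seconds": seconds}
```

Running `evaluate` with each detector on the same images and annotations would give one recall figure and one timing figure per model. A detector that misses small figures would show up as lower recall on images where tiny ground-truth boxes are present.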
Human detection is a special application of object recognition and is considered one of the greatest challenges in computer vision. It is the starting point of a number of applications, including public safety and security surveillance around the world. Human detection technologies have advanced significantly in recent years due to the rapid development of deep learning techniques. Despite recent advances, we still need to adopt the best network-design practices that enable compact sizes, deep designs, and fast training times while maintaining high accuracy. In this article, we propose ReSTiNet, a novel compressed convolutional neural network that addresses the issues of size, detection speed, and accuracy. Following SqueezeNet, ReSTiNet adopts fire modules, examining the number of fire modules and their placement within the model, to reduce the number of parameters and thus the model size. The residual connections within the fire modules in ReSTiNet are interpolated and finely constructed to improve feature propagation and ensure the largest possible information flow in the model, with the goal of further improving the proposed ReSTiNet in terms of detection speed and accuracy. The proposed algorithm downsizes the previously popular Tiny-YOLO model and improves on it in the following respects: (1) faster detection speed; (2) more compact model size; (3) reduced overfitting; and (4) better performance than other lightweight models such as MobileNet and SqueezeNet in terms of mAP. The proposed model was trained and tested using the MS COCO and PASCAL VOC datasets. The resulting ReSTiNet model is 10.7 MB in size (almost five times smaller than Tiny-YOLO), but it achieves an mAP of 63.74% on PASCAL VOC and 27.3% on MS COCO using a Tesla K80 GPU.
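To illustrate why fire modules reduce the parameter count, the sketch below compares a SqueezeNet-style fire module with a plain 3x3 convolution producing the same number of output channels. The channel sizes are illustrative assumptions rather than ReSTiNet's actual configuration, which the abstract does not give, and the function names are hypothetical. The last helper reflects the usual condition for an identity residual connection, namely that input and output channel counts match; such a connection adds no parameters.

```python
def conv_params(in_ch, out_ch, kernel):
    """Weights plus biases of a standard 2D convolution layer."""
    return in_ch * out_ch * kernel * kernel + out_ch


def fire_params(in_ch, squeeze, expand1x1, expand3x3):
    """Parameters of a SqueezeNet-style fire module.

    A 1x1 squeeze layer feeds two parallel expand layers (1x1 and 3x3)
    whose outputs are concatenated along the channel axis.
    """
    squeeze_layer = conv_params(in_ch, squeeze, 1)
    expand_small = conv_params(squeeze, expand1x1, 1)
    expand_large = conv_params(squeeze, expand3x3, 3)
    return squeeze_layer + expand_small + expand_large


def identity_shortcut_fits(in_ch, expand1x1, expand3x3):
    """An element-wise residual add needs matching channel counts."""
    return in_ch == expand1x1 + expand3x3


# Assumed example sizes: 128 input channels, 16 squeeze, 64 + 64 expand.
fire = fire_params(128, 16, 64, 64)
plain = conv_params(128, 128, 3)
print(f"fire module: {fire} parameters")
print(f"plain 3x3 convolution: {plain} parameters")
print(f"reduction factor: {plain / fire:.1f}x")
print("identity shortcut fits:", identity_shortcut_fits(128, 64, 64))
```

With these assumed sizes the fire module needs roughly a twelfth of the parameters of the plain convolution, which is the kind of saving that makes a model of a few megabytes feasible.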