Object detection in streaming images is a major step in many detection-based applications, such as object tracking, action recognition, robot navigation, and visual surveillance. In most cases, the images are noisy and biased, so the data distributions are disturbed and imbalanced. Most object detection approaches, such as the faster region-based convolutional neural network (Faster R-CNN), the Single Shot Multibox Detector with 300×300 inputs (SSD300), and You Only Look Once version 2 (YOLOv2), rely on simple sampling and do not account for the distortions and noise of real-world changing environments or for poor object labeling. In this paper, we propose an incremental active semi-supervised learning (IASSL) method for unseen object detection. It combines batch-based active learning (AL) and bin-based semi-supervised learning (SSL) to leverage the strong points of AL's exploration and SSL's exploitation capabilities. A collaborative sampling method is adopted to measure the uncertainty and diversity of AL and the confidence of SSL. Batch-based AL selects more informative, confident, and representative samples at low cost. Bin-based SSL divides the streaming image samples into several bins, and each bin repeatedly transfers the discriminative knowledge of a convolutional neural network (CNN) to the next bin until a performance criterion is reached. IASSL can overcome noisy and biased labels in unknown, cluttered data distributions. We obtain superior performance compared to state-of-the-art detectors such as Faster R-CNN, SSD300, and YOLOv2.
FIGURE 8. Data model for IASSL: the root node consists of PASCAL VOC 2007 and 2012 and the local dataset.
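The abstract describes the collaborative sampling only at a high level. As a rough, illustrative sketch (not the paper's actual formulation), the snippet below shows one way batch-based uncertainty-plus-diversity selection and bin-based confidence-thresholded pseudo-labeling could be implemented; the function names, the entropy-plus-distance score, and the 0.95 threshold are assumptions.

```python
import numpy as np

def select_batch(probs, features, batch_size):
    """Batch-based active learning: pick samples with high uncertainty
    (entropy of predicted class probabilities) and high diversity
    (distance from samples already chosen in this batch)."""
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    selected = []
    for _ in range(batch_size):
        if not selected:
            selected.append(int(np.argmax(entropy)))
            continue
        # Diversity: minimum distance to samples already chosen in this batch.
        dists = np.min(
            np.linalg.norm(features[:, None, :] - features[selected][None, :, :], axis=2),
            axis=1,
        )
        score = entropy + dists          # collaborative uncertainty + diversity score
        score[selected] = -np.inf        # do not pick the same sample twice
        selected.append(int(np.argmax(score)))
    return selected

def pseudo_label_bin(probs, threshold=0.95):
    """Bin-based semi-supervised step: keep only high-confidence predictions
    from the current bin as pseudo-labels for the next training round."""
    conf = probs.max(axis=1)
    keep = conf >= threshold
    return np.where(keep)[0], probs.argmax(axis=1)[keep]

# Toy usage on random data (stand-ins for CNN outputs and embeddings).
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(20), size=100)             # softmax outputs for one bin
features = rng.normal(size=(100, 128))                   # CNN feature embeddings
queried = select_batch(probs, features, batch_size=8)    # sent to an oracle for labels
pseudo_idx, pseudo_lbl = pseudo_label_bin(probs)         # self-labeled samples
```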
Recent advancements in computer vision, driven by deep neural networks, have allowed us to explore and address many challenges that were once unattended due to the lack of necessary technologies. Hand sign/gesture recognition is one of the significant areas where deep neural networks are making a substantial impact. In the last few years, a large body of research has been conducted on recognizing hand signs and hand gestures, which we aim to extend to our mother tongue, Bangla (also known as Bengali). The primary goal of our work is to build an automated tool to aid people who are unable to speak. We developed a system that automatically detects hand-sign-based digits and speaks out the result in the Bangla language. According to the World Health Organization (WHO), 15% of people in the world live with some kind of disability. Among them, individuals with communication impairments such as speech disabilities experience substantial barriers to social interaction. The proposed system can be invaluable in mitigating such barriers. The core of the system is a deep learning model based on convolutional neural networks (CNNs). The model classifies hand-sign-based digits with 92% accuracy on validation data, which makes it a highly reliable system. Upon classification of a digit, the resulting output is fed to the text-to-speech engine and the translator unit, which eventually generate audio output in the Bangla language. A web application demonstrating our tool is available at http://bit.ly/signdigits2banglaspeech.
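As a minimal sketch of the post-classification stages (predicted digit, Bangla word, audio), the snippet below uses gTTS as a stand-in text-to-speech engine, since the abstract does not name the engine used; the digit_to_bangla_speech helper and the digit-to-word table are illustrative assumptions.

```python
from gtts import gTTS  # stand-in TTS engine; the paper's engine is not named in the abstract

# Bangla words for the digits 0-9 (the translator step).
BANGLA_DIGITS = ["শূন্য", "এক", "দুই", "তিন", "চার",
                 "পাঁচ", "ছয়", "সাত", "আট", "নয়"]

def digit_to_bangla_speech(predicted_class: int, out_path: str = "digit_bn.mp3") -> str:
    """Translate a predicted hand-sign digit (0-9) to its Bangla word and
    synthesize audio, mirroring the translator + text-to-speech stages."""
    word = BANGLA_DIGITS[predicted_class]
    gTTS(text=word, lang="bn").save(out_path)   # 'bn' = Bengali
    return word

# Example: suppose the CNN predicted class 7 for an input image.
print(digit_to_bangla_speech(7))  # সাত
```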
Action recognition is an essential task in computer vision due to the variety of prospective applications, such as security surveillance, machine learning, and human-computer interaction. The availability of more video data than ever before and the high performance of deep convolutional neural networks also make it essential for action recognition in video. Unfortunately, limited hand-crafted video features and the scarcity of benchmark datasets make the multi-person action recognition task in video data challenging. In this work, we propose a deep convolutional neural network-based Effective Hybrid Learning (EHL) framework for two-person interaction classification in video data. Our approach exploits a pre-trained network model (VGG16 from the University of Oxford Visual Geometry Group) and extends Faster R-CNN (the region-based convolutional neural network, a state-of-the-art detector). We extend a semi-supervised learning method, combined with an active learning method, to improve overall performance. Numerous types of two-person interactions exist in the real world, which makes this a challenging task. In our experiments, we consider a limited number of actions, such as hugging, fighting, linking arms, talking, and kidnapping, in two environments: simple and complex. We show that our trained model with the active semi-supervised learning architecture gradually improves performance. In a simple environment, using an Intelligent Technology Laboratory (ITLab) dataset from Inha University, accuracy increased to 95.6%, and in a complex environment, accuracy reached 81%. Our method reduces data-labeling time on the ITLab dataset compared to supervised learning methods. We also conduct extensive experiments on human action recognition benchmarks such as the UT-Interaction and HMDB51 datasets and obtain better performance than state-of-the-art approaches.
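The abstract combines a pre-trained VGG16 backbone, Faster R-CNN, and active semi-supervised learning. As an illustrative sketch only, the snippet below shows the VGG16-as-feature-extractor part with a small classification head for the five interaction classes; the pooling scheme, head architecture, and frame count are assumptions, and the detection and active/semi-supervised stages are omitted.

```python
import numpy as np
import tensorflow as tf

# Pre-trained VGG16 (ImageNet weights) as a frame-level feature extractor,
# a stand-in for the backbone described in the abstract.
backbone = tf.keras.applications.VGG16(
    weights="imagenet", include_top=False, pooling="avg",
    input_shape=(224, 224, 3),
)

def clip_features(frames: np.ndarray) -> np.ndarray:
    """Encode a video clip (N frames, 224x224x3, RGB in [0, 255]) into one
    descriptor by averaging per-frame VGG16 features."""
    x = tf.keras.applications.vgg16.preprocess_input(frames.astype("float32"))
    return backbone.predict(x, verbose=0).mean(axis=0)    # (512,) clip descriptor

# Small classification head for the five interaction classes
# (hug, fight, link arms, talk, kidnap).
head = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(512,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(5, activation="softmax"),
])
head.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Toy usage with random frames standing in for a real clip.
fake_clip = np.random.randint(0, 256, size=(16, 224, 224, 3))
feat = clip_features(fake_clip)
probs = head.predict(feat[None, :], verbose=0)             # interaction probabilities
```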