RGB-D Images for Object Segmentation, Localization and Recognition in Indoor Scenes using Feature Descriptor and Hough Voting

Ahmed, Abrar; Jalal, Ahmad; Kim, Ki-Bum

doi:10.1109/ibcast47879.2020.9044545

Cited by 48 publications

(20 citation statements)

References 36 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…For robust identification of HIR, actual human interaction areas need to be extracted and to distinguish target images from clutter [ 64 ]. To extract efficient silhouette representation, we depend mainly on connected components, skin tone, region growing and color spacing [ 65 ]. Various algorithms are used for both RGB and RGB-D silhouette segmentation to improve the performance of the proposed system.…”

Section: Proposed System Methodologymentioning

confidence: 99%

Automatic Recognition of Human Interaction via Hybrid Descriptors and Maximum Entropy Markov Model Using Depth Sensors

Jalal

Khalid

Kim

2020

Entropy

Self Cite

108

View full text Add to dashboard Cite

Automatic identification of human interaction is a challenging task especially in dynamic environments with cluttered backgrounds from video sequences. Advancements in computer vision sensor technologies provide powerful effects in human interaction recognition (HIR) during routine daily life. In this paper, we propose a novel features extraction method which incorporates robust entropy optimization and an efficient Maximum Entropy Markov Model (MEMM) for HIR via multiple vision sensors. The main objectives of proposed methodology are: (1) to propose a hybrid of four novel features—i.e., spatio-temporal features, energy-based features, shape based angular and geometric features—and a motion-orthogonal histogram of oriented gradient (MO-HOG); (2) to encode hybrid feature descriptors using a codebook, a Gaussian mixture model (GMM) and fisher encoding; (3) to optimize the encoded feature using a cross entropy optimization function; (4) to apply a MEMM classification algorithm to examine empirical expectations and highest entropy, which measure pattern variances to achieve outperformed HIR accuracy results. Our system is tested over three well-known datasets: SBU Kinect interaction; UoL 3D social activity; UT-interaction datasets. Through wide experimentations, the proposed features extraction algorithm, along with cross entropy optimization, has achieved the average accuracy rate of 91.25% with SBU, 90.4% with UoL and 87.4% with UT-Interaction datasets. The proposed HIR system will be applicable to a wide variety of man–machine interfaces, such as public-place surveillance, future medical applications, virtual reality, fitness exercises and 3D interactive gaming.

show abstract

Section: Proposed System Methodologymentioning

confidence: 99%

Automatic Recognition of Human Interaction via Hybrid Descriptors and Maximum Entropy Markov Model Using Depth Sensors

Jalal

Khalid

Kim

2020

Entropy

Self Cite

108

View full text Add to dashboard Cite

show abstract

“…After that, in order to remove noise from the image, a median filter is applied in which pixels are replaced by a median of neighboring pixels [66,67]. The most important step in any HAR system is to define and mine Regions of Interest (ROI) [68]. In our work, an ROI consists of two persons involved in an interaction in RGB-D images.…”

Section: Foreground Extractionmentioning

confidence: 99%

Modeling Two-Person Segmentation and Locomotion for Stereoscopic Action Identification: A Sustainable Video Surveillance System

et al. 2021

Self Cite

View full text Add to dashboard Cite

Due to the constantly increasing demand for automatic tracking and recognition systems, there is a need for more proficient, intelligent and sustainable human activity tracking. The main purpose of this study is to develop an accurate and sustainable human action tracking system that is capable of error-free identification of human movements irrespective of the environment in which those actions are performed. Therefore, in this paper we propose a stereoscopic Human Action Recognition (HAR) system based on the fusion of RGB (red, green, blue) and depth sensors. These sensors give an extra depth of information which enables the three-dimensional (3D) tracking of each and every movement performed by humans. Human actions are tracked according to four features, namely, (1) geodesic distance; (2) 3D Cartesian-plane features; (3) joints Motion Capture (MOCAP) features and (4) way-points trajectory generation. In order to represent these features in an optimized form, Particle Swarm Optimization (PSO) is applied. After optimization, a neuro-fuzzy classifier is used for classification and recognition. Extensive experimentation is performed on three challenging datasets: A Nanyang Technological University (NTU) RGB+D dataset; a UoL (University of Lincoln) 3D social activity dataset and a Collective Activity Dataset (CAD). Evaluation experiments on the proposed system proved that a fusion of vision sensors along with our unique features is an efficient approach towards developing a robust HAR system, having achieved a mean accuracy of 93.5% with the NTU RGB+D dataset, 92.2% with the UoL dataset and 89.6% with the Collective Activity dataset. The developed system can play a significant role in many computer vision-based applications, such as intelligent homes, offices and hospitals, and surveillance systems.

show abstract

“…To date, most research on this field has focused on single modality approaches, which may consist of either RGB [ 12 ] or RGB-D videos [ 13 ], wearables such as inertial sensors (Inertial Measurement Units—IMUs) [ 14 ], or ambient sensors [ 15 ]. The scenarios in which each of these modalities have been employed for activity recognition vary according to the availability of data, which may be constrained by technical or ethical limitations.…”

Section: Introductionmentioning

confidence: 99%

Activity Recognition for Ambient Assisted Living with Videos, Inertial Units and Ambient Sensors

Ranieri

MacLeod

Dragone

et al. 2021

Sensors

View full text Add to dashboard Cite

Worldwide demographic projections point to a progressively older population. This fact has fostered research on Ambient Assisted Living, which includes developments on smart homes and social robots. To endow such environments with truly autonomous behaviours, algorithms must extract semantically meaningful information from whichever sensor data is available. Human activity recognition is one of the most active fields of research within this context. Proposed approaches vary according to the input modality and the environments considered. Different from others, this paper addresses the problem of recognising heterogeneous activities of daily living centred in home environments considering simultaneously data from videos, wearable IMUs and ambient sensors. For this, two contributions are presented. The first is the creation of the Heriot-Watt University/University of Sao Paulo (HWU-USP) activities dataset, which was recorded at the Robotic Assisted Living Testbed at Heriot-Watt University. This dataset differs from other multimodal datasets due to the fact that it consists of daily living activities with either periodical patterns or long-term dependencies, which are captured in a very rich and heterogeneous sensing environment. In particular, this dataset combines data from a humanoid robot’s RGBD (RGB + depth) camera, with inertial sensors from wearable devices, and ambient sensors from a smart home. The second contribution is the proposal of a Deep Learning (DL) framework, which provides multimodal activity recognition based on videos, inertial sensors and ambient sensors from the smart home, on their own or fused to each other. The classification DL framework has also validated on our dataset and on the University of Texas at Dallas Multimodal Human Activities Dataset (UTD-MHAD), a widely used benchmark for activity recognition based on videos and inertial sensors, providing a comparative analysis between the results on the two datasets considered. Results demonstrate that the introduction of data from ambient sensors expressively improved the accuracy results.

show abstract

RGB-D Images for Object Segmentation, Localization and Recognition in Indoor Scenes using Feature Descriptor and Hough Voting

Cited by 48 publications

References 36 publications

Automatic Recognition of Human Interaction via Hybrid Descriptors and Maximum Entropy Markov Model Using Depth Sensors

Automatic Recognition of Human Interaction via Hybrid Descriptors and Maximum Entropy Markov Model Using Depth Sensors

Modeling Two-Person Segmentation and Locomotion for Stereoscopic Action Identification: A Sustainable Video Surveillance System

Activity Recognition for Ambient Assisted Living with Videos, Inertial Units and Ambient Sensors

Contact Info

Product

Resources

About