Exploiting Concepts of Instance Segmentation to Boost Detection in Challenging Environments

Hashmi, Khurram Azeem; Pagani, Alain; Liwicki, Marcus; Stricker, Didier; Afzal, Muhammad Zeshan

doi:10.3390/s22103703

Cited by 4 publications

(3 citation statements)

References 77 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Few other works deal with applying feature attention regional feature information and feature maps for dealing with the night images or low-level feature image. [14] Segmentation is the challenging task in the night vision images for that instance segmentation is used for boosting the object detection the challenging weather condition. [15] For working with night surveillance camera yolov4, yolov5, single shot detector (SSD), retina net are used as comparative study and their results are compared.…”

Section: Related Workmentioning

confidence: 99%

GHE-Ensemble: Enhanced Hybrid Image Enhancement Model for Night Vision Multi-Object detection in Autonomous Vehicle

Ranjitha P

2024

jes

View full text Add to dashboard Cite

Autonomous vehicle is the exponential growing topic in the research filed due its huge demand in the current industry. Where object detection plays the major role to take any major decision. Object detection in the normal vision yields the excellent performance due to deep learning models. But object detection in the night vision does not yield the good performance due to various challenges in it. In the proposed work created a night vision database called BDD-Darko where it contains night vision images with multi objects. Applied novel GHE-Ensemble model for image enhancement technique and trained using deep learning model Yolov5 for the multi-object detection in the night vision. Which resulted in 65.3 % accuracy. Proposed model is yielding better results for image enhancement technique for night vision than the previous existing model and detecting the multi object in the single frame with better performance. This paper proposes (1)A new database called as BDD-Darko which contains night vision image with multiple objects. (2) A Novel GHE-Ensemble model for image enhancement technique for night vision images

show abstract

Section: Related Workmentioning

confidence: 99%

GHE-Ensemble: Enhanced Hybrid Image Enhancement Model for Night Vision Multi-Object detection in Autonomous Vehicle

Ranjitha P

2024

jes

View full text Add to dashboard Cite

show abstract

“…𝑄𝐾 𝑇 √𝐷𝑘 ……… (9) Where: Dk: Dimension of keys Then, we apply softmax to obtain attention weights A = Softmax(S) Finally, using equation (10) we compute the attentionweighted values and output of the self-attention mechanism Zatt = AV ……… (10)…”

Section: S =mentioning

confidence: 99%

A Hybrid Model Combining Depthwise Separable Convolutions and Vision Transformers for Traffic Sign Classification Under Challenging Weather Conditions.

Parse,

Pramod,

Kumar

2024

Preprint

View full text Add to dashboard Cite

This research presents a novel deep-learning framework designed for traffic sign image classification under adverse conditions, including rain, shadows, haze, codec errors, and dirty lenses. To effectively balance accuracy and training parameters, the approach combines depthwise and pointwise convolutions, often referred to as depthwise separable convolutions, with a Vision Transformer (ViT) for subsequent feature extraction. The framework's initial block comprises two pairs of depthwise and pointwise convolutional layers followed by a normalization layer. Depthwise convolution is responsible for processing each input channel independently and applying separate filters to each channel, thereby reducing computational cost and parameters while maintaining spatial structure. Pointwise convolutional layers combine information from different channels, fostering complex feature interactions and non-linearities. Batch normalization is used for training stability. At the end of the initial block, the max pooling layer is used to enhance and downsample spatial dimensions. The architecture repeats four times, preserving crucial information through skip connections. To extract global context information, inter-block skip connections and global average pooling (GAP) are employed for dimensionality reduction while retaining vital information. Integration of the ViT model in the final layers captures far-reaching dependencies and relations in the feature maps. The framework concludes with two fully connected layers, a bottleneck layer with 1024 neurons and a second layer using softmax activation to generate a probability distribution over 14 classes. The proposed framework, combining convolution blocks and skip connections with precisely tuned ViT hyperparameters, enhances model performance and achieves an exceptional validation accuracy of 99.3%.

show abstract

“…The simple application of still-image object detectors is sub-optimal in challenging environments [ 12 , 13 ]. Furthermore, applying image object detection algorithms to video data would process it as a sequence of unrelated individual images, and this approach would result in losing the temporal information present across the frames.…”

Section: Introductionmentioning

confidence: 99%

Attention-Guided Disentangled Feature Aggregation for Video Object Detection

Muralidhara

Hashmi

Pagani

et al. 2022

Sensors

View full text Add to dashboard Cite

Object detection is a computer vision task that involves localisation and classification of objects in an image. Video data implicitly introduces several challenges, such as blur, occlusion and defocus, making video object detection more challenging in comparison to still image object detection, which is performed on individual and independent images. This paper tackles these challenges by proposing an attention-heavy framework for video object detection that aggregates the disentangled features extracted from individual frames. The proposed framework is a two-stage object detector based on the Faster R-CNN architecture. The disentanglement head integrates scale, spatial and task-aware attention and applies it to the features extracted by the backbone network across all the frames. Subsequently, the aggregation head incorporates temporal attention and improves detection in the target frame by aggregating the features of the support frames. These include the features extracted from the disentanglement network along with the temporal features. We evaluate the proposed framework using the ImageNet VID dataset and achieve a mean Average Precision (mAP) of 49.8 and 52.5 using the backbones of ResNet-50 and ResNet-101, respectively. The improvement in performance over the individual baseline methods validates the efficacy of the proposed approach.

show abstract

Exploiting Concepts of Instance Segmentation to Boost Detection in Challenging Environments

Cited by 4 publications

References 77 publications

GHE-Ensemble: Enhanced Hybrid Image Enhancement Model for Night Vision Multi-Object detection in Autonomous Vehicle

GHE-Ensemble: Enhanced Hybrid Image Enhancement Model for Night Vision Multi-Object detection in Autonomous Vehicle

A Hybrid Model Combining Depthwise Separable Convolutions and Vision Transformers for Traffic Sign Classification Under Challenging Weather Conditions.

Attention-Guided Disentangled Feature Aggregation for Video Object Detection

Contact Info

Product

Resources

About