CNN Architectures for Geometric Transformation-Invariant Feature Representation in Computer Vision: A Review

Mumuni, Alhassan; Mumuni, Fuseini

doi:10.1007/s42979-021-00735-0

Cited by 43 publications

(13 citation statements)

References 142 publications

(205 reference statements)

“…As a result, the SPP technique can generate an output with a fixed length without taking the input's size into account. Moreover, the SPP's testing and training phases allow for adaptation to the input image scales, which strengthens the scale-invariance property and eliminates the overfitting problem in the network [50]. However, instead of going into various pooling functions or incorporating learning, spatial pyramid pooling is primarily designed to deal with images of variable sizes and can result in a more complicated learning procedure, resulting in less efficient output, such as a 16.89% error rate on unaugmented CIFAR-10 [51].…”

Section: Spatial Pyramid Pooling Methods (mentioning)

confidence: 99%
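
The fixed-length property described in the statement above can be illustrated with a minimal sketch. It assumes a single-channel feature map stored as a nested list, max pooling inside each bin, and pyramid levels of 1, 2, and 4 bins per side; the function name `spatial_pyramid_pool` and these level choices are illustrative assumptions, not the configuration used in any of the cited papers.

```python
import math


def spatial_pyramid_pool(fmap, levels=(1, 2, 4)):
    """Max-pool a 2D feature map over n x n grids for each n in `levels`.

    The output length is sum(n * n for n in levels), whatever the
    height and width of `fmap` are (hypothetical helper for illustration).
    """
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Bin boundaries use floor for the start and ceil for the end,
            # so every bin contains at least one row and one column.
            r0, r1 = (i * h) // n, math.ceil((i + 1) * h / n)
            for j in range(n):
                c0, c1 = (j * w) // n, math.ceil((j + 1) * w / n)
                pooled.append(
                    max(fmap[r][c] for r in range(r0, r1) for c in range(c0, c1))
                )
    return pooled


# Two inputs of different sizes produce vectors of the same length.
small = [[float(r * 5 + c) for c in range(5)] for r in range(4)]
large = [[float(r * 13 + c) for c in range(13)] for r in range(9)]
print(len(spatial_pyramid_pool(small)), len(spatial_pyramid_pool(large)))
```

Because the number of bins depends only on the pyramid levels, a fully connected layer placed after this step would see the same input length for both feature maps.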

A Comparison of Pooling Methods for Convolutional Neural Networks

et al. 2022

Deep neural networks (DNNs) are among the most promising techniques used across the sciences. A special type of DNN, the convolutional neural network (CNN), consists of several convolutional layers, each followed by an activation function and a pooling layer. The pooling layer, an important component, samples the feature map of the previous layer to create a new feature map with condensed resolution, which significantly reduces the spatial dimension of the input. It serves two main goals: first, it reduces the number of parameters or weights, which lowers computational cost; second, it helps prevent the network from overfitting. In addition, pooling techniques can significantly reduce model training time. This paper provides a critical understanding of traditional and modern pooling techniques and highlights their strengths and weaknesses for readers. Moreover, the performance of pooling techniques on different datasets is qualitatively evaluated and reviewed. This study is expected to contribute to a comprehensive understanding of the importance of CNNs and pooling techniques in computer vision challenges.
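
As a rough illustration of how a pooling layer condenses a feature map, the sketch below applies max pooling to a small single-channel map. The function name `max_pool2d`, the 2x2 window, and the stride of 2 are assumptions chosen for the example; the abstract itself does not prescribe a particular pooling function.

```python
def max_pool2d(fmap, size=2, stride=2):
    """Slide a size x size window over `fmap` and keep the maximum of each window."""
    h, w = len(fmap), len(fmap[0])
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    return [
        [
            max(
                fmap[r * stride + dr][c * stride + dc]
                for dr in range(size)
                for dc in range(size)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# A 4x4 feature map is reduced to 2x2, a fourfold drop in spatial size.
fmap = [
    [1, 3, 2, 0],
    [4, 6, 5, 1],
    [7, 2, 9, 8],
    [0, 1, 3, 4],
]
pooled = max_pool2d(fmap)
print(pooled)  # [[6, 5], [7, 9]]
```

The pooling step itself has no weights; any saving in parameters comes from the later layers, which receive a smaller input.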

“…Therefore, if there is simply smoke present during the early fire stage, our model waits until it notices a fire. To improve our model and address the aforementioned problem, we are using large datasets, such as JFT-300M [ 68 , 69 , 70 , 71 , 72 ], which comprises 300 million labeled images.…”

Section: Limitations (mentioning)

confidence: 99%

A YOLOv6-Based Improved Fire Detection Approach for Smart City Environments

Saydirasulovich, Abdusalomov, Jamil, et al. 2023

Sensors

Authorities and policymakers in Korea have recently prioritized improving fire prevention and emergency response. Governments seek to enhance community safety for residents by constructing automated fire detection and identification systems. This study examined the efficacy of YOLOv6, a system for object identification running on an NVIDIA GPU platform, to identify fire-related items. Using metrics such as object identification speed, accuracy research, and time-sensitive real-world applications, we analyzed the influence of YOLOv6 on fire detection and identification efforts in Korea. We conducted trials using a fire dataset comprising 4000 photos collected through Google, YouTube, and other resources to evaluate the viability of YOLOv6 in fire recognition and detection tasks. According to the findings, YOLOv6’s object identification performance was 0.98, with a typical recall of 0.96 and a precision of 0.83. The system achieved an MAE of 0.302%. These findings suggest that YOLOv6 is an effective technique for detecting and identifying fire-related items in photos in Korea. Multi-class object recognition using random forests, k-nearest neighbors, support vector, logistic regression, naive Bayes, and XGBoost was performed on the SFSC data to evaluate the system’s capacity to identify fire-related objects. The results demonstrate that for fire-related objects, XGBoost achieved the highest object identification accuracy, with values of 0.717 and 0.767. This was followed by random forest, with values of 0.468 and 0.510. Finally, we tested YOLOv6 in a simulated fire evacuation scenario to gauge its practicality in emergencies. The results show that YOLOv6 can accurately identify fire-related items in real time within a response time of 0.66 s. Therefore, YOLOv6 is a viable option for fire detection and recognition in Korea. The XGBoost classifier provides the highest accuracy when attempting to identify objects, achieving remarkable results. 
Furthermore, the system accurately identifies fire-related objects while they are being detected in real-time. This makes YOLOv6 an effective tool to use in fire detection and identification initiatives.

“…Figure 13 shows the typical framework of CNN and RNN. CNN can capture spatial features from the image, which help us accurately identify the object and its relationship with other objects in the image [150]. The characteristic of RNN is that it can process an image or numerical data.…”

Section: Neural Network With VSLAM (mentioning)

confidence: 99%

An Overview on Visual SLAM: From Tradition to Semantic

et al. 2022

Visual SLAM (VSLAM) has been developing rapidly due to its advantages of low-cost sensors, easy fusion with other sensors, and richer environmental information. Traditional vision-based SLAM research has made many achievements, but it may fail to achieve the desired results in challenging environments. Deep learning has promoted the development of computer vision, and the combination of deep learning and SLAM has attracted more and more attention. Semantic information, as high-level environmental information, can enable robots to better understand the surrounding environment. This paper introduces the development of VSLAM technology from two aspects: traditional VSLAM and semantic VSLAM combined with deep learning. For traditional VSLAM, we summarize the advantages and disadvantages of indirect and direct methods in detail and give some classical VSLAM open-source algorithms. In addition, we focus on the development of semantic VSLAM based on deep learning. Starting with the typical neural networks CNN and RNN, we summarize the improvements that neural networks bring to the VSLAM system in detail. Later, we focus on how target detection and semantic segmentation help introduce semantic information into VSLAM. We believe that the future intelligent era cannot develop without the help of semantic technology. Introducing deep learning into the VSLAM system to provide semantic information can help robots better perceive the surrounding environment and provide people with higher-level help.
