Feature space enrichment is integral to the development of attention mechanisms for Convolutional Neural Networks (CNNs). The ability to efficiently extract channel and spatial information across a variety of scales is crucial, as is balancing parameter efficiency against accuracy. To build a compelling and robust attention mechanism, channel and spatial attention must be carefully incorporated into the CNN architecture. This work addresses these challenges and presents Spatial and Channel aware Multi-scale kernel Attention (SCMA), an attention mechanism for CNNs. Our approach applies two separate attention modules, one for channel-wise attention and one for spatial attention, in sequence to refine intermediate feature representations. The SCMA module is compact and universal: it can be seamlessly integrated into any baseline CNN architecture with minimal parameter overhead and trained end-to-end. Empirical results with SCMA on various CNN architectures for image classification across multiple benchmark datasets, including Imagenette, Imagewoof, CIFAR-10, CIFAR-100, and CINIC-10, support the intuition that multi-scale kernels are pivotal for effectively capturing dependencies across both spatial and channel dimensions. In many cases, SCMA achieves higher accuracy than its state-of-the-art counterparts while keeping the parameter overhead to a minimum.
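
As a rough illustration of the sequential channel-then-spatial design described above, the sketch below applies a channel-attention module followed by a spatial-attention module, each built from multiple kernel sizes. The abstract does not specify SCMA's internals, so every name and design choice here (ChannelAttention, SpatialAttention, the particular kernel sizes, the averaging of multi-scale responses) is a hypothetical reconstruction for illustration, not the authors' implementation.

```python
# Minimal sketch of sequential multi-scale channel + spatial attention.
# Hypothetical reconstruction: module names, kernel sizes, and fusion
# strategy are assumptions, not the paper's specified SCMA design.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Gates channels using multi-scale 1D convolutions over a globally
    pooled channel descriptor (assumed design)."""

    def __init__(self, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(1, 1, k, padding=k // 2, bias=False) for k in kernel_sizes
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) -> global-average-pooled descriptor (B, 1, C)
        y = x.mean(dim=(2, 3)).unsqueeze(1)
        # Average the multi-scale responses, then gate each channel.
        y = torch.stack([conv(y) for conv in self.convs]).mean(dim=0)
        return x * torch.sigmoid(y).squeeze(1)[:, :, None, None]


class SpatialAttention(nn.Module):
    """Builds a spatial mask from channel-pooled maps, fused across
    multiple 2D kernel sizes (assumed design)."""

    def __init__(self, kernel_sizes=(3, 7)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(2, 1, k, padding=k // 2, bias=False) for k in kernel_sizes
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Concatenate average- and max-pooled maps along channels: (B, 2, H, W)
        pooled = torch.cat(
            [x.mean(1, keepdim=True), x.max(1, keepdim=True).values], dim=1
        )
        mask = torch.stack([conv(pooled) for conv in self.convs]).mean(dim=0)
        return x * torch.sigmoid(mask)


class SCMA(nn.Module):
    """Sequential channel-then-spatial refinement, per the abstract."""

    def __init__(self):
        super().__init__()
        self.channel = ChannelAttention()
        self.spatial = SpatialAttention()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.spatial(self.channel(x))


# Drop-in usage after any convolutional stage; output shape matches input.
feats = torch.randn(2, 64, 32, 32)
refined = SCMA()(feats)
assert refined.shape == feats.shape
```

Because both sub-modules only rescale the input tensor, the block preserves feature-map shape and can be inserted after any convolutional stage of a baseline network, consistent with the plug-and-play, end-to-end-trainable property claimed in the abstract.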