2022
DOI: 10.3390/s22010337
DTS-Net: Depth-to-Space Networks for Fast and Accurate Semantic Object Segmentation

Abstract: We propose Depth-to-Space Net (DTS-Net), an effective technique for semantic segmentation using the efficient sub-pixel convolutional neural network. This technique is inspired by depth-to-space (DTS) image reconstruction, which was originally used for image and video super-resolution tasks, combined with a mask enhancement filtration technique based on multi-label classification, namely, Nearest Label Filtration. In the proposed technique, we employ depth-wise separable convolution-based architectures. We pro…

Cited by 9 publications (8 citation statements)
References 44 publications

Citation statements:
“…DW-Conv is much faster than standard convolution because it learns fewer parameters, which is key to the fast processing in our proposed method. Xception has also proved to be a good feature extractor in recent research on multiple computer vision tasks, and it is light enough for real-time applications thanks to its relatively low FLOP and parameter counts [29, 30]; it is also compatible with the pixel-shuffle [11] operation (also employed in our proposed method and introduced in Section 3.2), as Xception with pixel-shuffle achieved high accuracy on the semantic segmentation task in DTS-Net [25]. As our method performs semantic segmentation as a secondary task to predict the encoded line, we adopted a modified version of Xception for its robustness and high accuracy.…”
Section: Proposed Methods (mentioning)
confidence: 99%
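As a point of reference for the depth-wise separable convolution (DW-Conv) mentioned in the statement above, the following is a minimal PyTorch sketch of the operation as popularized by Xception: a per-channel depthwise convolution followed by a 1x1 pointwise convolution. The layer and tensor sizes are illustrative assumptions, not values taken from DTS-Net or the citing paper.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution: a 3x3 conv applied per channel
    (groups=in_channels) followed by a 1x1 pointwise conv that mixes channels."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                   padding=1, groups=in_channels, bias=False)
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1,
                                   bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# Illustrative sizes: a standard 3x3 conv from 32 to 64 channels learns
# 32*64*9 = 18432 weights, while the separable version learns only
# 32*9 + 32*64 = 2336, which is where the speed/parameter saving comes from.
x = torch.randn(1, 32, 128, 128)
print(DepthwiseSeparableConv(32, 64)(x).shape)  # torch.Size([1, 64, 128, 128])
```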
“…This algorithm can up-scale many (r²) low-resolution images of shape (H × W), where r is the scaling factor, into a high-resolution image of shape (rH × rW) through pixel shuffling from the depth channel. This algorithm is fast and efficient in constructing higher-resolution images, and especially segmentation masks, as explored in detail in our previous research [25, 26]. The progressive probabilistic Hough transform (PPHT) [12] is a popular method for straight-line detection that uses a small set of edge points instead of all edge points used in the standard Hough transform (SHT) [27]; thus, PPHT is much faster than SHT.…”
Section: Related Work (mentioning)
confidence: 99%
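To make the depth-to-space (pixel-shuffle) rearrangement described in this statement concrete, here is a minimal PyTorch sketch; the scaling factor, channel count, and spatial size are illustrative assumptions rather than settings reported in the paper.

```python
import torch
import torch.nn as nn

r = 4            # assumed scaling factor
channels = 21    # assumed number of output maps (e.g. one per class)

# A low-resolution tensor carrying r*r sub-pixel images per output map
# in its depth dimension: shape (N, channels * r*r, H, W).
low_res = torch.randn(1, channels * r * r, 60, 80)

# Depth-to-space rearranges those depth channels into spatial positions,
# producing shape (N, channels, r*H, r*W) with no learned parameters.
high_res = nn.PixelShuffle(r)(low_res)
print(high_res.shape)  # torch.Size([1, 21, 240, 320])
```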
“…Lee et al. [13] proposed a CNN-based method, From Big to Small (BTS), which utilizes local planar guidance layers at different scales in the decoder stage to guide the feature maps toward accurate depth predictions. We also provided competitive depth estimation results in previous research [14, 15], in which we eliminated the complexity of the decoder in the encoder-decoder CNN architecture using depth-to-space (pixel-shuffle) image reconstruction. Although the previously stated methods attained relatively good results, the estimated depth in most of them is blurry, especially at object borders in the scene, owing to inefficient encoding and decoding stages caused by the local learning scheme inherent in the convolution operation.…”
Section: Related Work (mentioning)
confidence: 99%
“…Depth estimation is a critical task in a variety of computer vision applications, including 3D scene reconstruction from 2D images, medical 3D imaging, augmented reality, self-driving cars and robots, and 3D computer graphics and animation. Recent advances in depth estimation research have shown the effectiveness of convolutional neural networks (CNNs) for this task [1–15]. Encoder-decoder CNN architectures are the most widely used for dense prediction tasks [2–12], i.e., image-like predictions such as semantic segmentation and depth estimation.…”
Section: Introduction (mentioning)
confidence: 99%