Fully Combined Convolutional Network with Soft Cost Function for Traffic Scene Parsing

Wu, Yan; Yang, Tao; Zhao, Junqiao; Guan, Linting; Li, Jiqian

doi:10.1007/978-3-319-63309-1_64

Cited by 13 publications

(11 citation statements)

References 13 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Guided by dilated convolution [ 21 , 35 ] and the VH-stage, we designed the DVH block to improve the semantic segmentation network for linear feature extraction. HFCN [ 36 ] comes up with a further structured layer based on FCCN [ 37 ], and each unpooling layer follows a combination layer. This method can fuse upsampling features of different receptive fields in high-fusion layers.…”

Section: Related Workmentioning

confidence: 99%

UnetDVH-Linear: Linear Feature Segmentation by Dilated Convolution with Vertical and Horizontal Kernels

Liao

Cao

et al. 2020

Sensors

View full text Add to dashboard Cite

Linear feature extraction is crucial for special objects in semantic segmentation networks, such as slot marking and lanes. The objects with linear characteristics have global contextual information dependency. It is very difficult to capture the complete information of these objects in semantic segmentation tasks. To improve the linear feature extraction ability of the semantic segmentation network, we propose introducing the dilated convolution with vertical and horizontal kernels (DVH) into the task of feature extraction in semantic segmentation networks. Meanwhile, we figure out the outcome if we put the different vertical and horizontal kernels on different places in the semantic segmentation networks. Our networks are trained on the basis of the SS dataset, the TuSimple lane dataset and the Massachusetts Roads dataset. These datasets consist of slot marking, lanes, and road images. The research results show that our method improves the accuracy of the slot marking segmentation of the SS dataset by 2%. Compared with other state-of-the-art methods, our UnetDVH-Linear (v1) obtains better accuracy on the TuSimple Benchmark Lane Detection Challenge with a value of 97.53%. To prove the generalization of our models, road segmentation experiments were performed on aerial images. Without data argumentation, the segmentation accuracy of our model on the Massachusetts roads dataset is 95.3%. Moreover, our models perform better than other models when training with the same loss function and experimental settings. The experiment result shows that the dilated convolution with vertical and horizontal kernels will enhance the neural network on linear feature extraction.

show abstract

Section: Related Workmentioning

confidence: 99%

UnetDVH-Linear: Linear Feature Segmentation by Dilated Convolution with Vertical and Horizontal Kernels

Liao

Cao

et al. 2020

Sensors

View full text Add to dashboard Cite

show abstract

“…Ronneberger et al added 2x2 up-convolution layer, with a concatenation with corresponding pooling layer in U-Net [25]. FCCN [16] could also be regarded as an alternative decoder structure.…”

Section: Related Workmentioning

confidence: 99%

“…We have achieved considerable improvements by transforming FCN into fully combined network, FCCN, in [16]. FCCN adopted a structure of five unpooling layers, each unpooling layer upsampled the feature maps to a doubled resolution.…”

Section: Highly Fused Convolutional Networkmentioning

confidence: 99%

“…In order to reduce the noise in output maps, FCN introduces skip connections between pooling layers and unpooling layers. Since the proposal of FCN, modern works on segmentation are mostly based on it [14,15].In our previous work, a fully combined convolutional network (FCCN) is explored to improve the segmentation performance [16]. We adopt a layer-by-layer upsampling method.…”

mentioning

confidence: 99%

See 1 more Smart Citation

Semantic segmentation via highly fused convolutional network with multiple soft cost functions

Yang

Zhao

et al. 2019

Cognitive Systems Research

Self Cite

View full text Add to dashboard Cite

Semantic image segmentation is one of the most challenged tasks in computer vision. In this paper, we propose a highly fused convolutional network, which consists of three parts: feature downsampling, combined feature upsampling and multiple predictions. We adopt a strategy of multiple steps of upsampling and combined feature maps in pooling layers with its corresponding unpooling layers. Then we bring out multiple pre-outputs, each pre-output is generated from an unpooling layer by one-step upsampling. Finally, we concatenate these pre-outputs to get the final output. As a result, our proposed network makes highly use of the feature information by fusing and reusing feature maps. In addition, when training our model, we add multiple soft cost functions on pre-outputs and final outputs. In this way, we can reduce the loss reduction when the loss is back propagated. We evaluate our model on three major segmentation datasets: CamVid, PASCAL VOC and ADE20K. We achieve a state-of-the-art performance on CamVid dataset, as well as considerable improvements on PASCAL VOC dataset and ADE20K dataset.Key Words: semantic segmentation, multiple soft cost functions, highly fused convolutional network series of CNN-based networks and some useful independent modules have been brought forward too, such as dropout [1] and batch normalization [2]. Convolutional networks are now leading many computer vision tasks, including image classification [3,4], object detection [5,6,7,8] and semantic image segmentation [9,10,11]. Image semantic segmentation is also known as scene parsing, which aims to classify every pixel in the image. It is one of the most challenged and primary tasks in computer vision. Network models for scene parsing task are always based on reliable models for image classification, since segmentation datasets have fewer images than the large available classification datasets. The landmark fully convolutional network (FCN) [9] for semantic segmentation is based on VGG-net [12], which is trained on the famous ImageNet dataset [13]. A novel end-to-end segmentation learning method is introduced in FCNs. In detail, convolution layers with a kernel size of 1x1 take the place of fully connected layers, followed by unpooling layers to recover the spatial resolution of the feature maps. As a consequence, output maps can achieve the same resolution as the input image of the models. In order to reduce the noise in output maps, FCN introduces skip connections between pooling layers and unpooling layers. Since the proposal of FCN, modern works on segmentation are mostly based on it [14,15].In our previous work, a fully combined convolutional network (FCCN) is explored to improve the segmentation performance [16]. We adopt a layer-by-layer upsampling method. After each upsampling operation, we acquire an output with the double size of the input feature maps. We also combine the corresponding pooling and unpooling layers. Another important work in FCCN is the soft cost function used for training the model. Evaluated on CamVid dataset...

show abstract

“…In prior work, based on FCN, we put forward efficient segmentation networks FCCN [22] and HFCN [23]. We proposed the cost function method to train our network.…”

Section: Related Workmentioning

confidence: 99%

VH-HFCN based Parking Slot and Lane Markings Segmentation on Panoramic Surround View

Yang

Zhao

et al. 2018

2018 IEEE Intelligent Vehicles Symposium (IV)

Self Cite

View full text Add to dashboard Cite

The automatic parking is being massively developed by car manufacturers and providers. Until now, there are two problems with the automatic parking. First, there is no openly-available segmentation labels of parking slot on panoramic surround view (PSV) dataset. Second, how to detect parking slot and road structure robustly. Therefore, in this paper, we build up a public PSV dataset. At the same time, we proposed a highly fused convolutional network (HFCN) based segmentation method for parking slot and lane markings based on the PSV dataset. A surround-view image is made of four calibrated images captured from four fisheye cameras. We collect and label more than 4,200 surround view images for this task, which contain various illuminated scenes of different types of parking slots. A VH-HFCN network is proposed, which adopts an HFCN as the base, with an extra efficient VH-stage for better segmenting various markings. The VH-stage consists of two independent linear convolution paths with vertical and horizontal convolution kernels respectively. This modification enables the network to robustly and precisely extract linear features. We evaluated our model on the PSV dataset and the results showed outstanding performance in ground markings segmentation. Based on the segmented markings, parking slots and lanes are acquired by skeletonization, hough line transform and line arrangement.

show abstract

Fully Combined Convolutional Network with Soft Cost Function for Traffic Scene Parsing

Cited by 13 publications

References 13 publications

UnetDVH-Linear: Linear Feature Segmentation by Dilated Convolution with Vertical and Horizontal Kernels

UnetDVH-Linear: Linear Feature Segmentation by Dilated Convolution with Vertical and Horizontal Kernels

Semantic segmentation via highly fused convolutional network with multiple soft cost functions

VH-HFCN based Parking Slot and Lane Markings Segmentation on Panoramic Surround View

Contact Info

Product

Resources

About