Automatic parking is being developed intensively by car manufacturers and suppliers. Until now, two problems have hindered it: first, there is no openly available panoramic surround view (PSV) dataset with segmentation labels for parking slots; second, parking slots and road structures must be detected robustly. In this paper, we therefore build a public PSV dataset, and we propose a highly fused convolutional network (HFCN) based segmentation method for parking slots and lane markings on this dataset. A surround-view image is composed of four calibrated images captured by four fisheye cameras. We collect and label more than 4,200 surround-view images for this task, covering variously illuminated scenes with different types of parking slots. We propose the VH-HFCN network, which adopts an HFCN as its base and adds an efficient VH-stage for better segmenting the various markings. The VH-stage consists of two independent linear convolution paths with vertical and horizontal convolution kernels, respectively. This design enables the network to extract linear features robustly and precisely. We evaluated our model on the PSV dataset, and the results show outstanding performance in ground-marking segmentation. From the segmented markings, parking slots and lanes are then extracted by skeletonization, Hough line transform, and line arrangement.
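As a rough illustration of the VH-stage idea, the following PyTorch sketch shows two independent linear convolution paths with vertical and horizontal kernels; the module name, kernel size, path depth, and the summation used to merge the two paths are our assumptions, not details given in the abstract.

```python
# Hypothetical sketch of a VH-stage: two independent linear convolution
# paths with vertical (k x 1) and horizontal (1 x k) kernels. Kernel size,
# depth, and the additive fusion of the paths are assumptions.
import torch
import torch.nn as nn

class VHStage(nn.Module):
    def __init__(self, channels, kernel_size=9, depth=2):
        super().__init__()
        pad = kernel_size // 2
        # Vertical path: (kernel_size x 1) kernels respond to vertical lines.
        self.vertical = nn.Sequential(*[
            nn.Conv2d(channels, channels, (kernel_size, 1), padding=(pad, 0))
            for _ in range(depth)
        ])
        # Horizontal path: (1 x kernel_size) kernels respond to horizontal lines.
        self.horizontal = nn.Sequential(*[
            nn.Conv2d(channels, channels, (1, kernel_size), padding=(0, pad))
            for _ in range(depth)
        ])

    def forward(self, x):
        # The two paths run independently; summing their responses is one
        # plausible way to merge the vertical and horizontal line features.
        return self.vertical(x) + self.horizontal(x)
```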
Recent deep convolutional neural network-based object detectors have shown promising performance when detecting large objects, but they are still limited in detecting small or partially occluded ones, in part because such objects convey limited information due to the small areas they occupy in images. Consequently, it is difficult for deep neural networks to extract sufficient distinguishing fine-grained features for the high-level feature maps that are crucial for precisely locating small or partially occluded objects. There are two ways to alleviate this problem: the first is to use lower-level but larger feature maps to improve localization accuracy, and the second is to use context information to increase classification accuracy. In this paper, we combine both methods by first constructing larger and more meaningful feature maps in top-down order and concatenating them, and subsequently fusing multilevel contextual information through pyramid pooling to construct context-aware features. We propose a unified framework called the Semantic Context Aware Network (SCAN) to enhance object detection accuracy. SCAN is simple to implement and can be trained end to end. We evaluate the proposed network on the KITTI challenge benchmark and demonstrate improved precision.
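To make the context-fusion step concrete, here is a minimal pyramid pooling sketch in PyTorch, in the spirit of the multilevel context fusion described above; the bin sizes, channel reduction, and module name are assumptions rather than the authors' exact configuration.

```python
# A minimal pyramid pooling sketch for fusing multilevel context: pool the
# feature map to several coarse grids, reduce channels, upsample back, and
# concatenate with the original features. Bin sizes are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    def __init__(self, in_channels, bins=(1, 2, 3, 6)):
        super().__init__()
        reduced = in_channels // len(bins)
        self.stages = nn.ModuleList([
            nn.Sequential(
                nn.AdaptiveAvgPool2d(b),             # pool to a b x b grid
                nn.Conv2d(in_channels, reduced, 1),  # reduce channels
                nn.ReLU(inplace=True),
            )
            for b in bins
        ])

    def forward(self, x):
        h, w = x.shape[2:]
        # Upsample each pooled context map back to the input resolution,
        # then concatenate it with the original features.
        ctx = [F.interpolate(s(x), size=(h, w), mode='bilinear',
                             align_corners=False) for s in self.stages]
        return torch.cat([x] + ctx, dim=1)
```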
Semantic image segmentation is one of the most challenging tasks in computer vision. In this paper, we propose a highly fused convolutional network, which consists of three parts: feature downsampling, combined feature upsampling, and multiple predictions. We adopt a strategy of multiple upsampling steps and combine the feature maps of pooling layers with those of their corresponding unpooling layers. We then produce multiple pre-outputs, each generated from an unpooling layer by one-step upsampling, and finally concatenate these pre-outputs to obtain the final output. As a result, our proposed network makes extensive use of feature information by fusing and reusing feature maps. In addition, when training our model, we attach soft cost functions to both the pre-outputs and the final output, which reduces the attenuation of the loss as it is backpropagated. We evaluate our model on three major segmentation datasets: CamVid, PASCAL VOC, and ADE20K. We achieve state-of-the-art performance on the CamVid dataset, as well as considerable improvements on the PASCAL VOC and ADE20K datasets.

Key Words: semantic segmentation, multiple soft cost functions, highly fused convolutional network

A series of CNN-based networks has been proposed, along with useful independent modules such as dropout [1] and batch normalization [2]. Convolutional networks now lead many computer vision tasks, including image classification [3,4], object detection [5,6,7,8], and semantic image segmentation [9,10,11]. Image semantic segmentation, also known as scene parsing, aims to classify every pixel in an image; it is one of the most challenging and fundamental tasks in computer vision. Network models for scene parsing are usually built on reliable models for image classification, since segmentation datasets contain far fewer images than the large classification datasets that are available. The landmark fully convolutional network (FCN) [9] for semantic segmentation is based on VGG-net [12], which is trained on the well-known ImageNet dataset [13]. FCNs introduce a novel end-to-end segmentation learning method: convolution layers with a kernel size of 1x1 replace the fully connected layers, followed by unpooling layers that recover the spatial resolution of the feature maps, so the output maps reach the same resolution as the model's input image. To reduce noise in the output maps, FCN introduces skip connections between pooling layers and unpooling layers. Since the proposal of FCN, most modern segmentation work has built on it [14,15].

In our previous work, we explored a fully combined convolutional network (FCCN) to improve segmentation performance [16]. We adopt a layer-by-layer upsampling method: after each upsampling operation, we acquire an output with double the size of the input feature maps, and we also combine the corresponding pooling and unpooling layers. Another important element of FCCN is the soft cost function used to train the model. Evaluated on the CamVid dataset...
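The multiple soft cost function scheme amounts to deep supervision: a loss is attached to every pre-output as well as the final output, so gradients reach the intermediate unpooling stages directly. Below is a minimal sketch, assuming each pre-output has already been upsampled to the label resolution and using an auxiliary weight that the paper does not specify.

```python
# Sketch of training with soft cost functions on every pre-output plus the
# final output. The uniform auxiliary weight is an assumption; the paper's
# exact weighting may differ.
import torch
import torch.nn.functional as F

def multi_output_loss(pre_outputs, final_output, target, aux_weight=0.5):
    """pre_outputs: list of (N, C, H, W) logits from the unpooling stages,
    each assumed upsampled to label resolution; target: (N, H, W) labels."""
    loss = F.cross_entropy(final_output, target)
    for p in pre_outputs:
        # Each pre-output receives its own supervision signal, so the loss
        # does not have to propagate through the whole decoder to reach it.
        loss = loss + aux_weight * F.cross_entropy(p, target)
    return loss
```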
For self-driving and automatic parking, perception is the basic and critical technique, and the detection of lane markings and parking slots is an important part of visual perception. In this paper, we use semantic segmentation to segment and classify lane markings and parking slots on the panoramic surround view (PSV) dataset. We propose DFNet and make two main contributions: dynamic loss weights and a residual fusion block (RFB). The dynamic loss weights vary by class and are calculated from the pixel count of each class in a batch. The RFB is composed of two convolutional layers, one pooling layer, and a fusion layer that combines the feature maps by pixel-wise multiplication. We evaluate our method on the PSV dataset and achieve advanced results.
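Below is a hedged sketch of the two contributions, assuming an inverse-frequency form for the dynamic loss weights and one plausible wiring for the RFB; both are illustrations consistent with the description above, not the authors' exact implementation.

```python
# Hypothetical sketches of DFNet's two contributions. The inverse-frequency
# weighting and the RFB wiring below are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def dynamic_weight_loss(logits, target, num_classes, eps=1.0):
    """Per-class weights recomputed for each batch from pixel counts, so
    rare classes (thin markings) are not drowned out. Assumes labels lie
    in [0, num_classes)."""
    counts = torch.bincount(target.flatten(), minlength=num_classes).float()
    # Inverse-frequency weights; eps avoids division by zero for classes
    # absent from the batch.
    weights = (counts.sum() + eps) / (counts + eps)
    weights = weights / weights.sum() * num_classes
    return F.cross_entropy(logits, target, weight=weights)

class ResidualFusionBlock(nn.Module):
    """Two convolutional layers, one pooling layer, and fusion by pixel-wise
    multiplication, per the abstract; kernel sizes and wiring are assumed."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.pool = nn.AvgPool2d(3, stride=1, padding=1)

    def forward(self, x):
        y = F.relu(self.conv1(x))
        y = F.relu(self.conv2(y))
        return self.pool(x) * y  # fusion by pixel-wise multiplication
```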