Recurrent Fully Convolutional Networks for Video Segmentation

Valipour, Sepehr; Siam, Mennatullah; Jägersand, Martin; Ray, Nilanjan

doi:10.1109/wacv.2017.11

Cited by 69 publications

(53 citation statements)

References 30 publications


“…Until now, only very few approaches combining semantic segmentation approaches with recurrent structures have been proposed. One of the first approaches was the Recurrent Fully Convolutional Network (RFCN, [22]), which was introduced by Valipour et al in 2017. They extended the FCN approach [18] by placing a recurrent unit between the encoder and the decoder, and exhibit better performance on the SegTrack, Davis, and Moving MNIST dataset.…”

Section: Related Work (mentioning)

confidence: 99%

Semantic Segmentation of Video Sequences with Convolutional LSTMs

Pfeuffer, Schulz, Dietmayer. 2019 IEEE Intelligent Vehicles Symposium (IV), 2019.

Most semantic segmentation approaches have been developed for single image segmentation, and hence video sequences are currently segmented by processing each frame of the video sequence separately. The disadvantage of this is that temporal image information is not considered, even though it can improve the performance of the segmentation approach. One possibility to include temporal information is to use recurrent neural networks. However, there are only a few approaches using recurrent networks for video segmentation so far. These approaches extend the encoder-decoder network architecture of well-known segmentation approaches and place convolutional LSTM layers between encoder and decoder. However, in this paper it is shown that this position is not optimal, and that other positions in the network exhibit better performance. Nowadays, state-of-the-art segmentation approaches rarely use the classical encoder-decoder structure, but use multi-branch architectures. These architectures are more complex, and hence it is more difficult to place the recurrent units at a proper position. In this work, the multi-branch architectures are extended by convolutional LSTM layers at different positions and evaluated on two different datasets in order to find the best one. It turned out that the proposed approach outperforms the pure CNN-based approach by up to 1.6 percent.



“…In the literature, there are only a few approaches using recurrent neural networks (RNNs) for video segmentation, since the video sequences are usually split into its individual frames, which are processed independently of each other by state-of-the-art segmentation approaches such as ICNet [24], PSPNet [25] or Deeplab [2]. However, temporal image information is not considered in these works, but which can improve the segmentation accuracy further, as Valipour et al show in [22]. The authors showed on several datasets that the performance of the Fully Convolutional Network (FCN) [16] can be increased if a recurrent unit is placed between the encoder and the decoder.…”

Section: Related Work (mentioning)

confidence: 99%

Separable Convolutional LSTMs for Faster Video Segmentation

Pfeuffer, Dietmayer. 2019 IEEE Intelligent Transportation Systems Conference (ITSC), 2019.

Semantic segmentation is an important module for autonomous robots such as self-driving cars. The advantage of video segmentation approaches compared to single image segmentation is that temporal image information is considered, and their performance increases due to this. Hence, single image segmentation approaches are extended by recurrent units such as convolutional LSTM (convLSTM) cells, which are placed at suitable positions in the basic network architecture. However, a major critique of video segmentation approaches based on recurrent neural networks is their large parameter count and their computational complexity, so that their inference time per video frame is up to 66 percent longer than that of their basic version. Inspired by the success of spatially and depthwise separable convolutional neural networks, we generalize these techniques for convLSTMs in this work, so that the number of parameters and the required FLOPs are reduced significantly. Experiments on different datasets show that the segmentation approaches using the proposed, modified convLSTM cells achieve similar or slightly worse accuracy, but are up to 15 percent faster on a GPU than the ones using the standard convLSTM cells. Furthermore, a new evaluation metric is introduced, which measures the amount of flickering pixels in the segmented video sequence.


“…Deep learning has achieved state-of-the-art segmentation performance in 2D natural images [3] and 3D medical images [7,9,8]. To leverage the temporal dependency and account for segmentation continuity, recurrent neural networks (RNNs) have been adopted for videos [12,16] and 2D+T cardiac MRI datasets [18]. Convolutional models can also represent temporal relationships and offer competitive performance for language translation [5].…”

Section: Related Work (mentioning)

confidence: 99%

4D CNN for Semantic Segmentation of Cardiac Volumetric Sequences

Myronenko, Yang, Buch, et al. Lecture Notes in Computer Science, 2020.

We propose a 4D convolutional neural network (CNN) for the segmentation of retrospective ECG-gated cardiac CT, a series of single-channel volumetric data over time. While only a small subset of volumes in the temporal sequence are annotated, we define a sparse loss function on available labels to allow the network to leverage unlabeled images during training and generate a fully segmented sequence. We investigate the accuracy of the proposed 4D network to predict temporally consistent segmentations and compare with traditional 3D segmentation approaches. We demonstrate the feasibility of the 4D CNN and establish its performance on cardiac 4D CCTA.

