Occlusion aware unsupervised learning of optical flow from video

Li, Jianfeng; Zhao, Junqiao; Song, Shuangfu; Feng, Tiantian

doi:10.1117/12.2588381

Cited by 8 publications

(6 citation statements)

References 25 publications

“…This approach therefore tries to detect moving pixels by the difference between the optical flow and rigid flow predictions, assuming a worse prediction by the rigid flow. Other approaches such as [40,41,30,27,25,7] propose to infer a moving object mask using a pre-determined metric related to the geometric inconsistency between the optical flow and the rigid flow.…”

Section: Related Work (mentioning)

confidence: 99%

Rebalancing gradient to improve self-supervised co-training of depth, odometry and optical flow predictions

Hariat

Manzanera

Filliat

2023

2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

We present CoopNet, an approach that improves the cooperation of co-trained networks by dynamically adapting the apportionment of gradient, to ensure equitable learning progress. It is applied to motion-aware self-supervised prediction of depth maps, by introducing a new hybrid loss, based on a distribution model of photo-metric reconstruction errors made by, on the one hand the depth + odometry paired networks, and on the other hand the optical flow network. This model essentially assumes that the pixels from moving objects (that must be discarded for training depth and odometry), correspond to those where the two reconstructions strongly disagree. We justify this model by theoretical considerations and experimental evidences. A comparative evaluation on KITTI and CityScapes datasets shows that CoopNet improves or is comparable to the state-of-the-art in depth, odometry and optical flow predictions. Our code is available here: https://github.com/mhariat/CoopNet.

“…Two adjacent frames of images are stacked in the positive and negative order and then input into the optical flow network to obtain the forward and reverse optical flow. OAFlow [22] also uses the forward and reverse optical flow to calculate the occlusion. Different from UnFlow and OAFlow, Li et al. [23] use the forward and backward optical flow between three frames to calculate the occlusion, and achieve a higher optical flow estimation accuracy. We also adopted this method.…”

Section: B Unsupervised Learning Of Optical Flow (mentioning)

confidence: 99%

“…In this process, occlusion (pixels are not visible in another frame) will lead to incorrect interpolation results, which will mislead the optimization of the network during the training phase. We detect the occlusion based on the difference of reconstruction error between the forward and backward direction, as presented in our previous work [23]. The methods of occlusion detection and dynamic object detection will be explained in III-C.…”

Section: Geometric and Appearance Fundamental (mentioning)

confidence: 99%

Unsupervised Joint Learning of Depth, Optical Flow, Ego-motion from Video

Li,

Zhao,

Song

et al. 2021

Preprint

Self Cite

Estimating geometric elements such as depth, camera motion, and optical flow from images is an important part of the robot's visual perception. We use a joint self-supervised method to estimate the three geometric elements. Depth network, optical flow network and camera motion network are independent of each other but are jointly optimized during training phase. Compared with independent training, joint training can make full use of the geometric relationship between geometric elements and provide dynamic and static information of the scene. In this paper, we improve the joint self-supervision method from three aspects: network structure, dynamic object segmentation, and geometric constraints. In terms of network structure, we apply the attention mechanism to the camera motion network, which helps to take advantage of the similarity of camera movement between frames. And according to attention mechanism in Transformer, we propose a plug-and-play convolutional attention module. In terms of dynamic object, according to the different influences of dynamic objects in the optical flow self-supervised framework and the depth-pose self-supervised framework, we propose a threshold algorithm to detect dynamic regions, and mask that in the loss function respectively. In terms of geometric constraints, we use traditional methods to estimate the fundamental matrix from the corresponding points to constrain the camera motion network. We demonstrate the effectiveness of our method on the KITTI dataset. Compared with other joint self-supervised methods, our method achieves state-of-the-art performance in the estimation of pose and optical flow, and the depth estimation has also achieved competitive results. Code will be available at: https://github.com/jianfenglihg/Unsupervised geometry.

“…Behavior recognition, as a fundamental task of video analysis technology, has become increasingly demanding in video-based applications such as human-machine interaction, autonomous driving, and intelligent surveillance [3,4]. However, recognizing the motion information of objects is nontrivial due to occlusion, dynamic backgrounds, and moving cameras in video scenarios [5,6]. For example, it is difficult to distinguish between behaviors when faced with dynamic and moving backgrounds.…”

Section: Introduction (mentioning)

confidence: 99%

Behavior Recognition Based on the Integration of Multigranular Motion Features

Zhang,

Wang,

Hui

et al. 2022

Preprint

The recognition of behaviors in videos usually requires a combinatorial analysis of the spatial information about objects and their dynamic action information in the temporal dimension. Specifically, behavior recognition may even rely more on the modeling of temporal information containing short-range and long-range motions; this contrasts with computer vision tasks involving images that focus on the understanding of spatial information. However, current solutions fail to jointly and comprehensively analyze short-range motion between adjacent frames and long-range temporal aggregations at large scales in videos. In this paper, we propose a novel behavior recognition method based on the integration of multigranular (IMG) motion features. In particular, we achieve reliable motion information modeling through the synergy of a channel attention-based short-term motion feature enhancement module (CMEM) and a cascaded long-term motion feature integration module (CLIM). We evaluate our model on several action recognition benchmarks such as HMDB51, Something-Something and UCF101. The experimental results demonstrate that our approach outperforms the previous state-of-the-art methods, which confirms its effectiveness and efficiency.
