Vid-ODE: Continuous-Time Video Generation with Neural Ordinary Differential Equation

Park, Sunghyun; Kim, Kangyeol; Lee, Junsoo; Choo, Jaegul; Lee, Joon Seok; Kim, Sookyung; Choi, Edward

doi:10.48550/arxiv.2010.08188

Cited by 6 publications

(7 citation statements)

References 20 publications

“…To the best of our knowledge, most existing works in computer vision process videos as discrete collections of frames. The only exception is Vid-ODE [27], which represent videos by continuous latent states. The latent state can be evaluated at any given timestamp, allowing the video to be rendered with an infinitely high frame rate.…”

Section: Video Representation (mentioning)

confidence: 99%

E-CIR: Event-Enhanced Continuous Intensity Recovery

Chen, Huang, Bajaj

2022

Preprint

A camera begins to sense light the moment we press the shutter button. During the exposure interval, relative motion between the scene and the camera causes motion blur, a common undesirable visual artifact. This paper presents E-CIR, which converts a blurry image into a sharp video represented as a parametric function from time to intensity. E-CIR leverages events as an auxiliary input. We discuss how to exploit the temporal event structure to construct the parametric bases. We demonstrate how to train a deep learning model to predict the function coefficients. To improve the appearance consistency, we further introduce a refinement module to propagate visual features among consecutive frames. Compared to state-of-the-art event-enhanced deblurring approaches, E-CIR generates smoother and more realistic results.

“…In contrast to the prior work, our generator is continuous in time. In this way it is similar to Vid-ODE [47]: a continuous-time video interpolation and prediction model based on neural ODEs [13].…”

Section: Related Work (mentioning)

confidence: 99%

StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2

Skorokhodov, Tulyakov, Elhoseiny

2021

Preprint

Videos show continuous events, yet most, if not all, video synthesis frameworks treat them discretely in time. In this work, we think of videos of what they should be: time-continuous signals, and extend the paradigm of neural representations to build a continuous-time video generator. For this, we first design continuous motion representations through the lens of positional embeddings. Then, we explore the question of training on very sparse videos and demonstrate that a good generator can be learned by using as few as 2 frames per clip. After that, we rethink the traditional image and video discriminators pair and propose to use a single hypernetwork-based one. This decreases the training cost and provides richer learning signal to the generator, making it possible to train directly on 1024² videos for the first time. We build our model on top of StyleGAN2 and it is just ≈5% more expensive to train at the same resolution while achieving almost the same image quality. Moreover, our latent space features similar properties, enabling spatial manipulations that our method can propagate in time. We can generate arbitrarily long videos at arbitrary high frame rate, while prior work struggles to generate even 64 frames at a fixed rate. Our model achieves state-of-the-art results on four modern 256² video synthesis benchmarks and one 1024² resolution one.

“…Equipped with widely used numerical solvers such as Runge-Kutta and Dormand-Prince method, neural ODE has the capacity to express the latent state in continuous-depth, or equivalently continuous-time. The continuous nature of neural ODE paved a way to design the continuous time-series modeling as shown in following studies [7,9,24,27,41]. Latent ODE [27] introduced ODE-RNN as an encoder and demonstrated the effectiveness of handling the time-series data taken at non-uniform intervals.…”

Section: Related Work (mentioning)

confidence: 99%

“…Latent ODE [27] introduced ODE-RNN as an encoder and demonstrated the effectiveness of handling the time-series data taken at non-uniform intervals. Furthermore, ODE²VAE [41] and Vid-ODE [24] performed continuous-time video prediction conditioned on input video frames, demonstrating the potential to apply neural ODE to computer vision. … given two noise vectors, the motion noise vector z_m ∈ Z_M and the appearance noise vector z_a ∈ Z_A, where T denotes the number of frames, H and W the height and width of the generated image, respectively.…”

Section: Related Work (mentioning)

confidence: 99%

Continuous-Time Video Generation via Learning Motion Dynamics with Neural ODE

Kim, Park, Lee et al.

2021

Preprint

Self Cite

In order to perform unconditional video generation, we must learn the distribution of the real-world videos. In an effort to synthesize high-quality videos, various studies attempted to learn a mapping function between noise and videos, including recent efforts to separate motion distribution and appearance distribution. Previous methods, however, learn motion dynamics in discretized, fixed-interval timesteps, which is contrary to the continuous nature of motion of a physical body. In this paper, we propose a novel video generation approach that learns separate distributions for motion and appearance, the former modeled by neural Ordinary Differential Equation (ODE) to learn natural motion dynamics. Specifically, we employ a two-stage approach where the first stage converts a noise vector to a sequence of keypoints in arbitrary frame rates, and the second stage synthesizes videos based on the given keypoints sequence and the appearance noise vector. Our model not only quantitatively outperforms recent baselines for video generation, but also demonstrates versatile functionality such as dynamic frame rate manipulation and motion transfer between two datasets, thus opening new doors to diverse video generation applications.

* indicates equal contribution. † This work was done during an internship at Kakao Enterprise.
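The citation statements above describe a latent state governed by an ODE that a numerical solver such as Runge-Kutta can evaluate at any timestamp, which is what allows an arbitrary frame rate. The following is a minimal sketch of that idea only, not the implementation of Vid-ODE or of any citing paper. The names `rk4_step`, `latent_at`, and `toy_dynamics` are hypothetical, and `toy_dynamics` is a hand-written rotation standing in for a learned network.

```python
import math


def rk4_step(f, t, z, h):
    """Advance state z from time t by step h using classical 4th-order Runge-Kutta."""
    k1 = f(t, z)
    k2 = f(t + h / 2, [zi + h / 2 * ki for zi, ki in zip(z, k1)])
    k3 = f(t + h / 2, [zi + h / 2 * ki for zi, ki in zip(z, k2)])
    k4 = f(t + h, [zi + h * ki for zi, ki in zip(z, k3)])
    return [
        zi + h / 6 * (a + 2 * b + 2 * c + d)
        for zi, a, b, c, d in zip(z, k1, k2, k3, k4)
    ]


def latent_at(f, z0, t0, t, max_step=0.01):
    """Integrate dz/dt = f(t, z) from (t0, z0) to an arbitrary query time t."""
    n = max(1, math.ceil(abs(t - t0) / max_step))  # number of fixed-size steps
    h = (t - t0) / n
    z, s = list(z0), t0
    for _ in range(n):
        z = rk4_step(f, s, z, h)
        s += h
    return z


def toy_dynamics(t, z):
    """Hypothetical stand-in for a learned network: rotation in a 2-D latent space."""
    x, y = z
    return [-y, x]


# Query the latent state at irregular, non-integer timestamps.
timestamps = [0.0, 0.37, 1.0, 2.5]
states = [latent_at(toy_dynamics, [1.0, 0.0], 0.0, t) for t in timestamps]
```

Because the query times are plain floats, nothing ties the output to a fixed frame interval. In a real model each latent state would then be passed to a decoder to render a frame, a step this sketch leaves out.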
