Gen-1
The rise of video-centric platforms has created strong demand for user-friendly and effective video editing tools. However, editing video remains challenging and time-consuming because of its temporal nature. Modern machine learning models show promise for enhancing editing, but they often compromise spatial detail or temporal consistency. Recently, the emergence of powerful diffusion models trained on huge datasets has sharply increased the quality and popularity of generative techniques for image synthesis. These techniques allow non-expert users to produce detailed images with text-conditioned models, using only a text prompt as input.
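As an illustration of this kind of text-conditioned workflow (not the video model discussed here), a minimal sketch using the open-source diffusers library and a publicly available Stable Diffusion checkpoint might look like this; the specific library and checkpoint are assumptions chosen for demonstration:

```python
import torch
from diffusers import StableDiffusionPipeline

# Illustrative text-to-image generation: load a publicly available
# text-conditioned diffusion model (assumed checkpoint, not Gen-1).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# A single text prompt is the only input needed to synthesize an image.
image = pipe("a watercolor painting of a lighthouse at dusk").images[0]
image.save("lighthouse.png")
```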
To address the challenges in video editing, the researchers propose a content-aware video diffusion model trained on a large dataset of paired text-image data and uncaptioned videos. The method represents structure with monocular depth estimates and content with embeddings predicted by pre-trained neural networks. By applying an information-obscuring process to the structure representation and a novel guidance technique, inspired by classifier-free guidance, to control temporal consistency, the method provides powerful control over the generative process.
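A minimal sketch of how such structure and content signals could be computed is shown below. The depth_model and clip_image_encoder callables are hypothetical stand-ins for the pre-trained networks the paper relies on, and blurring via repeated downsampling is only one possible way to obscure structural detail:

```python
import torch
import torch.nn.functional as F

def structure_representation(frames, depth_model, blur_level):
    """Per-frame depth maps with a variable amount of detail removed.

    frames: (T, 3, H, W) video frames; depth_model is a hypothetical
    monocular depth estimator returning (T, 1, H, W) depth maps.
    """
    with torch.no_grad():
        depth = depth_model(frames)
    if blur_level > 0:
        # Obscure fine structure by downsampling and upsampling the depth
        # maps; a larger blur_level removes more structural detail.
        h, w = depth.shape[-2:]
        depth = F.interpolate(
            depth, scale_factor=1.0 / (2 ** blur_level),
            mode="bilinear", align_corners=False,
        )
        depth = F.interpolate(
            depth, size=(h, w), mode="bilinear", align_corners=False
        )
    return depth

def content_representation(reference_image, clip_image_encoder):
    """Content embedding from a pre-trained image encoder (CLIP-style)."""
    with torch.no_grad():
        return clip_image_encoder(reference_image)  # (1, D) embedding
```

In this sketch, varying blur_level at inference time trades fidelity to the input video's structure against creative freedom in the output.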
The proposed method extends latent diffusion models to video generation by adding temporal layers to a pre-trained image model and training jointly on images and videos. The entire editing procedure runs at inference time, without per-video training or pre-processing. The method offers fine-grained control over temporal, content, and structural consistency, and the joint image-and-video training enables inference-time control over temporal consistency. In a user study, the researchers show that their technique is preferred over several alternative approaches. They also show that the trained model can be further fine-tuned on a small set of images to produce more accurate videos of a particular subject. Interactive demos and more details are available on the project website.
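One common way to extend a pre-trained image UNet with temporal layers is to interleave blocks that mix information across frames while leaving the spatial layers intact. The sketch below illustrates such a block under assumed shapes and placement; it is not the authors' exact architecture:

```python
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    """Illustrative temporal mixing block inserted after a spatial layer
    of a pre-trained image model (shapes and hyperparameters assumed)."""

    def __init__(self, channels, num_heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x):
        # x: (batch, frames, channels, height, width)
        b, t, c, h, w = x.shape
        # Attend across the frame axis independently at each spatial location.
        x_seq = x.permute(0, 3, 4, 1, 2).reshape(b * h * w, t, c)
        y = self.norm(x_seq)
        y, _ = self.attn(y, y, y)
        x_seq = x_seq + y  # residual connection around the temporal block
        return x_seq.reshape(b, h, w, t, c).permute(0, 3, 4, 1, 2)
```

Because each spatial location attends only across the frame axis, the pre-trained spatial weights can be reused largely unchanged while the new temporal parameters learn cross-frame consistency, which is one plausible reason such an extension can be trained jointly on images and videos.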