


Google is the first to release a video-generation AIGC; netizens: you can customize your own movies
We know that advances in generative models and multimodal vision-language models have paved the way for large-scale text-to-image models with unprecedented realism and diversity. These models open up new creative workflows, but they are limited to synthesizing new images rather than editing existing ones. To bridge this gap, intuitive text-based editing methods have emerged that can edit both generated and real images while preserving some of the images' original properties. As with images, many text-to-video models have been proposed recently, but few methods use these models for video editing.
In text-guided video editing, the user provides an input video along with a text prompt describing the desired attributes of the generated video, as shown in Figure 1 below. The goals are threefold: 1) alignment: the edited video should conform to the input text prompt; 2) fidelity: the edited video should preserve the content of the original video; 3) quality: the edited video should be of high quality.
As these goals suggest, video editing is more challenging than image editing: it requires synthesizing new motion rather than merely modifying visual appearance, and it must maintain temporal consistency. Applying image-level editing methods such as SDEdit or Prompt-to-Prompt frame by frame is therefore not enough to achieve good results.
In a paper recently published on arXiv, researchers from Google Research and elsewhere proposed a new method, Dreamix, which is inspired by UniTune and applies a text-conditioned video diffusion model (VDM) to video editing.
- Paper address: https://arxiv.org/pdf/2302.01329.pdf
- Project homepage: https://dreamix-video-editing.github.io/
The text-conditioned VDM maintains high fidelity to the input video through two main ideas. First, instead of initializing the model with pure noise, it starts from a degraded version of the original video that retains only low-resolution spatio-temporal information, obtained by downscaling the video and adding noise. Second, fidelity to the original video is further improved by fine-tuning the generative model on that video.
Fine-tuning ensures that the model knows the high-resolution attributes of the original video. Naive fine-tuning on the input video, however, yields relatively low motion editability, because the model learns to prefer the original motion over following the text prompt. The researchers therefore propose a novel mixed fine-tuning method in which the VDM is also fine-tuned on the set of individual frames of the input video with their temporal ordering discarded. Mixed fine-tuning significantly improves the quality of motion edits.
Building on their video editing model, the researchers further propose a new framework for image animation, shown in Figure 2 below. The framework consists of several steps, such as animating objects and the background of an image and creating dynamic camera motion. They do this through simple image processing operations such as frame replication or geometric image transformations, producing a crude video, and then use the Dreamix video editor to edit it. In addition, the researchers use their fine-tuning method for subject-driven video generation, in effect a video version of DreamBooth.
Commenting on this Google study, some remarked that 3D plus motion editing tools might be a popular topic for the next wave of papers.
Someone else said: soon you will be able to make your own movie on a budget; all you need is a green screen and this technology.
Overview of Method
This article proposes a new method for video editing. Specifically:
Text-guided video editing by inverting corrupted videos
The researchers use cascaded video diffusion models (VDMs): the input video is first corrupted to a chosen degree by downsampling and then adding noise. A cascade of diffusion models, conditioned on the text and on the corruption time s, is then used to sample from this state and upscale the video to the final spatio-temporal resolution.
Corrupting the input video involves first downsampling it to the resolution of the base model (16 frames at 24 × 40) and then adding Gaussian noise to corrupt it further.
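This corruption step is simple enough to sketch in code. Below is a minimal PyTorch illustration; the function name, tensor layout, and the variance-preserving noising formula are assumptions made for the sketch, not code from the paper:

```python
import torch
import torch.nn.functional as F

def corrupt_video(video, alpha_bar_s, base_frames=16, base_h=24, base_w=40):
    """Degrade a video to the base model's resolution, then forward-diffuse
    it to time s. `alpha_bar_s` is the cumulative signal coefficient of the
    noise schedule at time s (an assumption about the schedule's form).

    video: float tensor of shape (T, C, H, W), values in [-1, 1].
    """
    # Temporal downsampling: keep base_frames evenly spaced frames.
    idx = torch.linspace(0, video.shape[0] - 1, base_frames).long()
    low = video[idx]
    # Spatial downsampling to the base resolution (24 x 40 in the paper).
    low = F.interpolate(low, size=(base_h, base_w), mode="bilinear",
                        align_corners=False)
    # Forward diffusion to time s: z_s = sqrt(a)*x + sqrt(1-a)*eps.
    eps = torch.randn_like(low)
    return alpha_bar_s ** 0.5 * low + (1.0 - alpha_bar_s) ** 0.5 * eps
```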
From this corrupted video, cascaded VDMs are then used to map the damaged low-resolution video to a high-resolution video aligned with the text. The core idea is that, given a noisy video of very low temporal and spatial resolution, there are many perfectly plausible high-resolution videos consistent with it. The base model starts from the corrupted video, which carries the same noise as the diffusion process at time s, and the VDM is used to reverse the diffusion process down to time 0. Finally, the video is upscaled through the super-resolution models.
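Putting the two stages together, the sampling procedure can be sketched schematically as follows. Here `base_model`, `sr_model`, and the DDPM-style `scheduler` are hypothetical stand-ins for the cascade components, which the paper does not expose as an API:

```python
def dreamix_edit(video, prompt, base_model, sr_model, scheduler, s):
    """Schematic Dreamix sampling: start the reverse diffusion from the
    corrupted input at time s instead of from pure noise, then upscale.
    All models and the scheduler here are hypothetical stand-ins."""
    # Corrupt the input (see corrupt_video in the sketch above).
    z = corrupt_video(video, scheduler.alpha_bar(s))
    # Reverse the diffusion process from time s down to 0, conditioned
    # on the text prompt throughout.
    for t in reversed(range(s)):
        eps_pred = base_model(z, t, prompt)   # predict the noise
        z = scheduler.step(eps_pred, t, z)    # one denoising update
    # Super-resolution stages of the cascade lift the 16 x 24 x 40 result
    # to the final temporal and spatial resolution.
    return sr_model(z, prompt)
```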
Mixed video-image fine-tuning
Fine-tuning the video diffusion model on the input video alone limits how much object motion can be changed. Instead, the study uses a mixed objective: in addition to the original objective (lower left), the model is also fine-tuned on an unordered set of the video's frames. This is done through "masked temporal attention", which prevents the temporal attention and convolution layers from being fine-tuned (lower right). This makes it possible to add motion to a static video.
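A single training step of such a mixed objective might look like the sketch below; `model.diffusion_loss` and its `mask_temporal` flag are illustrative names rather than the paper's actual interface:

```python
import random
import torch

def mixed_finetune_step(model, video, prompt, optimizer, frame_prob=0.5):
    """One step of mixed video-image fine-tuning: alternate between the
    ordered clip (temporal layers train) and an unordered set of its
    frames with temporal attention masked (only spatial layers train)."""
    optimizer.zero_grad()
    if random.random() < frame_prob:
        # Shuffle the frames to discard their ordering; masking temporal
        # attention keeps gradients out of the temporal layers.
        frames = video[torch.randperm(video.shape[0])]
        loss = model.diffusion_loss(frames, prompt, mask_temporal=True)
    else:
        # Standard denoising objective on the full ordered video.
        loss = model.diffusion_loss(video, prompt, mask_temporal=False)
    loss.backward()
    optimizer.step()
    return loss.item()
```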
Inference
At inference time, an application-dependent pre-processing step (Application Dependent Pre-processing, left in the figure below) converts the input content into a uniform video format, which lets the method support multiple applications. For image-to-video, the input image is replicated and transformed to synthesize a crude video with some camera motion; for subject-driven video generation, the input video is omitted and fine-tuning alone maintains fidelity. This crude video is then edited with the Dreamix video editor (right): as described above, the video is first corrupted by downsampling and adding noise, and the fine-tuned text-guided video diffusion model is then applied to upscale it to the final temporal and spatial resolution.
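For the image-to-video case, the pre-processing amounts to replicating the image and applying simple geometric transforms. A minimal sketch, assuming a center zoom-in as the camera motion (the concrete transforms are a design choice not pinned down here):

```python
import torch
import torch.nn.functional as F

def image_to_rough_video(image, num_frames=16, max_zoom=1.2):
    """Build a crude clip from one image by replicating it with a slow
    center zoom, imitating camera motion. The zoom schedule is an
    illustrative choice, not taken from the paper.

    image: tensor of shape (C, H, W).
    """
    c, h, w = image.shape
    frames = []
    for i in range(num_frames):
        zoom = 1.0 + (max_zoom - 1.0) * i / (num_frames - 1)
        # Crop the central 1/zoom region, then resize back: a zoom-in.
        ch, cw = max(1, int(h / zoom)), max(1, int(w / zoom))
        top, left = (h - ch) // 2, (w - cw) // 2
        crop = image[:, top:top + ch, left:left + cw].unsqueeze(0)
        frames.append(F.interpolate(crop, size=(h, w), mode="bilinear",
                                    align_corners=False)[0])
    return torch.stack(frames)  # (T, C, H, W): rough video for Dreamix
```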
Video editing: in the example below, Dreamix changes the action to dancing and the appearance from a monkey to a bear, while the basic attributes of the subject in the video remain unchanged:
Image-to-video: when the input is an image, Dreamix can use its video prior to add new moving objects, as in the example below, where a unicorn appears in a foggy forest and the camera zooms in.
Penguins appear next to a hut:
Subject-driven video generation: Dreamix can also take a collection of images showing the same subject and generate a new video with that subject in motion. The example below shows a caterpillar wriggling on a leaf:
In addition to the qualitative analysis, the study also conducted baseline comparisons, chiefly comparing Dreamix with two baseline methods: Imagen-Video and Plug-and-Play (PnP). The following table shows the scoring results:
Figure 8 shows a video edited by Dreamix alongside the two baselines: the text-to-video model achieves only low-fidelity edits because it is not conditioned on the original video; PnP preserves the scene but lacks frame-to-frame consistency; Dreamix performs well on all three goals.
Please refer to the original paper for more technical details.