The AIxiv column is a column where this site publishes academic and technical content. In the past few years, the AIxiv column of this site has received more than 2,000 reports, covering top laboratories from major universities and companies around the world, effectively promoting academic exchanges and dissemination. If you have excellent work that you want to share, please feel free to contribute or contact us for reporting. Submission email: liyazhou@jiqizhixin.com; zhaoyunfeng@jiqizhixin.com
About the authors:
Song Yiren: PhD candidate at ShowLab, National University of Singapore. His main research interests include image and video generation and AI security.
Huang Shijie: second-year master's student at the National University of Singapore, currently an algorithm engineer intern at Tiamat AI. His main research interest is visual generation. He is seeking PhD admission for fall 2025.
Recently, lvmin released his latest model, Paints-UNDO. Given a finished picture, this AI generation tool can reconstruct the entire painting process, and it has shocked the AIGC community.
Demo of Paints-UNDO.
About a month earlier, NUS, SJTU, Tiamat and other institutions had jointly released a work on a similar task: ProcessPainter: Learn Painting Process from Sequence Data. The Paints-UNDO technical report has not been released yet, so let's look at how ProcessPainter does it.
- Paper title: ProcessPainter: Learn Painting Process from Sequence Data
- Paper link: https://arxiv.org/pdf/2406.06062
- Code link: https://github.com/nicolaus-huang/ProcessPainter
Open any painting tutorial and you will find step-by-step instructions. In the era of generative AI, however, producing an image through a denoising process is completely different from how a human painter works, so the AI generation process cannot be used directly for teaching painting.
To solve this problem, ProcessPainter trains a temporal model on synthetic data and on videos of human painters at work, enabling a diffusion model to generate the painting process for the first time. Painting processes also vary greatly across subjects and painters, and their styles differ widely, yet very few studies have taken the painting process itself as an object of study. Building on the pre-trained Motion Model, the authors learn an artist's painting technique by training a Motion LoRA on a small number of painting sequences from that artist.
In-depth interpretation of the core technology of ProcessPainter
1. Temporal Attention Mechanism
Using temporal attention to learn to generate the painting process is the core innovation of ProcessPainter. The key to generating a painting sequence is that the whole sequence shows the same picture evolving from abstract to concrete, so earlier and later frames must stay consistent and related in content and composition. To achieve this, the authors add the temporal attention module from AnimateDiff to the UNet. This module is located after each diffusion layer and absorbs information from different frames through inter-frame self-attention, ensuring smooth transitions and continuity across the sequence. Experiments show that this training strategy keeps the painting consistent between frames.
Painting process generation differs from video generation in that the changes over the course of the process are more drastic. The first frame is a set of color blocks or a line drawing with a low degree of completion, while the last frame is a finished painting, which makes model training challenging. The authors therefore first pre-trained the temporal module on a large synthetic dataset, allowing the model to learn the step-by-step painting process of various SBR (stroke-based rendering) methods, and then trained Painting LoRA models on painting process data from dozens of artists.
2. Artwork Replication Network
In painting practice, we often want to know how a work was painted, and how to keep refining a half-finished painting until it reaches the expected finished result. This leads to two tasks: reconstruction and completion of the painting process. Since both tasks take an image as input, the authors proposed the Artwork Replication Network. This design can accept image input at any frame and flexibly control the generation of the painting process.
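To make the inter-frame self-attention from section 1 more concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' or AnimateDiff's implementation: the function name `temporal_self_attention` and the nested-list layout are hypothetical, there are no learned projections, multiple heads, or positional encodings, and queries, keys and values are simply the raw features. The point it shows is that attention runs along the frame axis separately for each spatial position, so every frame can absorb information from all other frames at the same location.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_self_attention(frames):
    """Toy inter-frame self-attention.

    frames[f][p] is the feature vector (list of floats) at spatial
    position p of frame f. For each position p, the vectors from all
    frames attend to each other; positions never mix with one another.
    Assumption: queries, keys and values are the raw features, whereas
    a real module would apply learned linear projections first.
    """
    num_frames = len(frames)
    num_positions = len(frames[0])
    dim = len(frames[0][0])
    scale = 1.0 / math.sqrt(dim)

    out = [[None] * num_positions for _ in range(num_frames)]
    for p in range(num_positions):
        # Sequence along the time axis for this spatial position.
        seq = [frames[f][p] for f in range(num_frames)]
        for f, query in enumerate(seq):
            scores = [scale * sum(q * k for q, k in zip(query, key))
                      for key in seq]
            weights = softmax(scores)
            # Weighted average of the same position across all frames.
            out[f][p] = [sum(w * v[d] for w, v in zip(weights, seq))
                         for d in range(dim)]
    return out
```

In a tensor framework the same idea is usually expressed by reshaping features so that the spatial positions join the batch dimension and the frame axis becomes the sequence axis before calling a standard attention layer.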
Similar to previous controllable generation methods, the authors introduce a variant of ControlNet that constrains specific frames of the generated result to be consistent with the reference image.
3. Synthetic Dataset and Training Strategy
Real painting process data is difficult to obtain, and the amount available is not enough to support large-scale training. The authors therefore constructed a synthetic dataset for pre-training. Three synthesis methods are used:
1. Learn to Paint is used to generate painting sequences made of translucent Bezier curve strokes.
2. Neural style painting with custom strokes is used to generate painting sequences in oil painting and Chinese painting styles.
3. The SBR (stroke-based rendering) methods above fit a target image from coarse to fine, which means that already painted parts may be overwritten and modified. In many types of painting, however, such as Chinese painting and sculpture, the material does not allow completed parts to be modified significantly, so the work is completed region by region. For these cases, the authors use SAM (Segment Anything) and saliency detection to add content to a blank canvas one sub-region at a time, painting the salient objects first and then gradually spreading to the background, to synthesize a video of the painting process.
In the training phase, the authors first pre-trained the Motion Model on the synthetic dataset, then froze its parameters and trained the Artwork Replication Network. Fine-tuning the painting LoRA model begins by fine-tuning the spatial attention LoRA using only the final frames, so that half-finished paintings in the training set do not harm the model's generation quality.
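The region-by-region synthesis in method 3 above can be sketched roughly as follows. This is a hedged illustration, not the authors' pipeline: the function names `region_order` and `synthesize_process` are hypothetical, the masks are assumed to come from a segmenter such as SAM and the saliency map from any saliency detector (neither is called here), and ranking regions by mean saliency is an assumed way to realize "salient objects first, background later".

```python
def region_order(masks, saliency):
    """Return region indices sorted from most to least salient.

    masks: list of 2D boolean grids, one per region (assumed to be
           produced by a segmenter such as SAM).
    saliency: 2D grid of floats with the same shape as each mask.
    Assumption: a region's priority is its mean saliency.
    """
    def mean_saliency(mask):
        values = [saliency[y][x]
                  for y, row in enumerate(mask)
                  for x, on in enumerate(row) if on]
        return sum(values) / len(values) if values else 0.0

    return sorted(range(len(masks)),
                  key=lambda i: mean_saliency(masks[i]),
                  reverse=True)


def synthesize_process(target, masks, saliency, blank=0):
    """Build a list of frames that reveal the target one region at a time."""
    canvas = [[blank for _ in row] for row in target]
    frames = []
    for i in region_order(masks, saliency):
        # Copy the finished pixels of this region onto the canvas.
        for y, row in enumerate(masks[i]):
            for x, on in enumerate(row):
                if on:
                    canvas[y][x] = target[y][x]
        # Snapshot the canvas after each region is completed.
        frames.append([row[:] for row in canvas])
    return frames
```

Unlike the coarse-to-fine stroke methods, a pixel is never repainted once its region has been revealed, which matches the constraint that completed parts cannot be significantly modified.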
After that, the authors froze the parameters of the spatial attention LoRA and fine-tuned the temporal attention LoRA using the complete painting sequences.
During inference, ProcessPainter does not use the Artwork Replication Network when generating painting sequences from text. For painting process reconstruction and completion, it uses the Artwork Replication Network to receive frame-specific reference input. To make the corresponding frame in the generated sequence match the input image as closely as possible, ProcessPainter applies DDIM inversion to obtain the initial noise of the reference image and uses it to replace the initial noise of that frame in the UNet.
ProcessPainter effect display
The ProcessPainter base model trained on the synthetic dataset can generate painting sequences whose processes differ in style.
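Returning briefly to the inference step described above, the idea of obtaining initial noise by DDIM inversion and swapping it into one frame can be illustrated with a scalar toy. This is a sketch under strong assumptions, not the authors' code: all function names are hypothetical, each "frame" is a single float, frames are denoised independently (no temporal attention), and the noise predictor `eps_model` is a stand-in for the UNet. The update itself is the standard deterministic DDIM step, run from clean to noisy for inversion and from noisy to clean for sampling.

```python
import math
import random


def ddim_step(x, eps, a_from, a_to):
    """Deterministic DDIM update between two alpha-bar noise levels."""
    x0_pred = (x - math.sqrt(1.0 - a_from) * eps) / math.sqrt(a_from)
    return math.sqrt(a_to) * x0_pred + math.sqrt(1.0 - a_to) * eps


def ddim_invert(x0, eps_model, alphas):
    """Map a clean value to its initial noise.

    alphas is a decreasing list of alpha-bar values, with alphas[0]
    close to 1 (almost clean) and alphas[-1] close to 0 (almost noise).
    """
    x = x0
    for t in range(len(alphas) - 1):
        x = ddim_step(x, eps_model(x, t), alphas[t], alphas[t + 1])
    return x


def ddim_sample(x_T, eps_model, alphas):
    """Run the same updates in the opposite direction, noisy to clean."""
    x = x_T
    for t in range(len(alphas) - 1, 0, -1):
        x = ddim_step(x, eps_model(x, t), alphas[t], alphas[t - 1])
    return x


def init_noise_with_reference(num_frames, ref_frame, ref_value,
                              eps_model, alphas, seed=0):
    """Random initial noise per frame, except the reference frame,
    whose noise is replaced by the DDIM inversion of the reference."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(num_frames)]
    noise[ref_frame] = ddim_invert(ref_value, eps_model, alphas)
    return noise
```

With a toy predictor that ignores its inputs, for example `lambda x, t: 0.25`, sampling from the replaced noise returns the reference value up to floating-point error. With a real network the predicted noise depends on the sample, so the round trip is only approximate, which is consistent with the article's wording that the frame matches the input "as closely as possible".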
By training a Motion LoRA separately on painting sequences from a small number of human painters, ProcessPainter can learn the painting process and style of a specific artist.
Given a reference image, ProcessPainter can deconstruct a completed artwork back into painting steps, or infer the complete painting from a half-finished one.
Together, these technical components allow ProcessPainter not only to generate painting processes from text, but also to convert reference images into painting sequences and to complete unfinished paintings. This provides new tools for art education and opens up a new direction for the AIGC community. Perhaps in the near future, Civitai will host a variety of LoRAs that simulate the painting processes of human painters. For more details, please read the original paper or visit the GitHub project homepage.