Spider-Man dances seductively as the next-generation ControlNet arrives! Launched by Jiaya Jia's team, it is plug-and-play and can also control video generation

WBOY | Original | 2024-08-17 15:49:41 | 837 views

It achieves the same controllable generation as ControlNet using less than 10% of the training parameters!

It is also compatible with common Stable Diffusion family models such as SDXL and SD1.5, and it remains plug-and-play.

It can also be combined with SVD to control video generation, with motion details controlled precisely down to the fingers.

Behind these images and videos is ControlNeXt, an open-source image/video generation guidance tool released by Jiaya Jia's team at the Chinese University of Hong Kong.

As the name suggests, the R&D team positions it as the next-generation ControlNet.

Its name follows the same pattern as ResNeXt (an extension of ResNet), the classic work by Kaiming He and Saining Xie.

Some netizens think the name is well deserved: it really is a next-generation product that takes ControlNet to a higher level.

Others say bluntly that ControlNeXt is a game changer that greatly improves the efficiency of controllable generation, and they cannot wait to see what people create with it.

Spider-Man dances gracefully

ControlNeXt supports multiple SD-series models and is plug-and-play.

These include the image generation models SD1.5, SDXL, and SD3 (with super-resolution support), as well as the video generation model SVD.

Without further ado, let's look at the results.

With edge (Canny) guidance added to SDXL, the generated anime-style girl matches the control lines almost perfectly.

Even when the control contours are numerous and intricate, the model can still draw an image that meets the requirements.

It also integrates seamlessly with other LoRA weights without additional training.

For example, in SD1.5 you can combine a pose control condition with various LoRAs to produce characters in different styles, or even across dimensions, all performing the same action.

In addition, ControlNeXt supports mask and depth control modes.

SD3 also supports super-resolution, which can generate ultra-high-definition images.

In video generation, ControlNeXt can control character motion.

For example, it can make Spider-Man perform the graceful dances seen on TikTok, with even the finger movements reproduced quite accurately.

Compared with the original ControlNet, ControlNeXt also requires fewer training parameters and converges faster.

For example, on SD1.5 and SDXL, ControlNet requires 361 million and 1.251 billion learnable parameters respectively, while ControlNeXt needs only 30 million and 108 million, less than 10% of ControlNet's count in each case.
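
As a quick sanity check on the "less than 10%" claim, the following sketch simply divides the parameter counts quoted above. The dictionary and function names are illustrative only and are not taken from the ControlNeXt code base.

```python
# Learnable-parameter counts quoted in the article, in millions.
PARAMS_MILLIONS = {
    "SD1.5": {"ControlNet": 361, "ControlNeXt": 30},
    "SDXL": {"ControlNet": 1251, "ControlNeXt": 108},
}


def parameter_ratio(base_model: str) -> float:
    """Return ControlNeXt's learnable parameters as a fraction of ControlNet's."""
    counts = PARAMS_MILLIONS[base_model]
    return counts["ControlNeXt"] / counts["ControlNet"]


for name in PARAMS_MILLIONS:
    # Format the fraction as a percentage with one decimal place.
    print(f"{name}: {parameter_ratio(name):.1%} of ControlNet's learnable parameters")
```

Both ratios come out between 8% and 9%, which is consistent with the figure stated in the article.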

During training, ControlNeXt is close to convergence at around 400 steps, whereas ControlNet needs ten times or even dozens of times as many steps.

Generation is also faster than with ControlNet. On average, ControlNet adds 41.9% latency relative to the base model, while ControlNeXt adds only 10.4%.
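
To put the two overhead figures side by side, here is a small arithmetic sketch. The 10-second base latency is a made-up value for illustration; only the two overhead percentages come from the article.

```python
# Average latency overhead relative to the base model, as quoted in the article.
CONTROLNET_OVERHEAD = 0.419
CONTROLNEXT_OVERHEAD = 0.104


def latency_with_control(base_seconds: float, overhead: float) -> float:
    """Latency of the base model plus a control method with the given overhead."""
    return base_seconds * (1.0 + overhead)


def relative_saving() -> float:
    """Fraction of ControlNet's total latency saved by ControlNeXt.

    The base latency cancels out, so the result does not depend on it.
    """
    return 1.0 - (1.0 + CONTROLNEXT_OVERHEAD) / (1.0 + CONTROLNET_OVERHEAD)


base = 10.0  # hypothetical base-model latency in seconds (assumption)
print(f"ControlNet:  {latency_with_control(base, CONTROLNET_OVERHEAD):.2f} s")
print(f"ControlNeXt: {latency_with_control(base, CONTROLNEXT_OVERHEAD):.2f} s")
print(f"Saving versus ControlNet: {relative_saving():.1%}")
```

Under these figures, ControlNeXt takes roughly 22% less time per generation than ControlNet, whatever the base latency is.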

So how is ControlNeXt implemented, and what does it improve over ControlNet?

A more lightweight conditional control module

First, a diagram helps in understanding the entire ControlNeXt workflow.

[Figure: overall workflow of ControlNeXt]

The key to its light weight is that ControlNeXt removes the large control branch used in ControlNet and replaces it with a lightweight convolutional module made up of a small number of ResNet blocks.

This module is responsible for extracting feature representations of the control conditions, such as semantic segmentation masks or keypoint priors.

The number of trainable parameters is usually less than 10% of that of the pre-trained model in ControlNet, yet the module still learns the input control information well. This design greatly reduces computational overhead and memory usage.

Specifically, it samples at equal intervals from different network layers of a pre-trained model to form a subset of parameters used for training, while the remaining parameters are frozen.
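
A minimal sketch of equal-interval selection, under stated assumptions: the layer names, the stride of 4, and the helper functions are hypothetical, since the article does not say which interval or which layers are used.

```python
from typing import Dict, List, Set


def select_trainable_indices(num_layers: int, stride: int) -> Set[int]:
    """Pick every `stride`-th layer index, starting from 0, as trainable."""
    if num_layers < 0 or stride < 1:
        raise ValueError("num_layers must be >= 0 and stride must be >= 1")
    return set(range(0, num_layers, stride))


def freeze_plan(layer_names: List[str], stride: int) -> Dict[str, bool]:
    """Map each layer name to True (trainable) or False (frozen)."""
    trainable = select_trainable_indices(len(layer_names), stride)
    return {name: index in trainable for index, name in enumerate(layer_names)}


# Hypothetical 12-layer backbone; the real layer names are not given in the article.
layers = [f"block_{i}" for i in range(12)]
plan = freeze_plan(layers, stride=4)
print([name for name, is_trainable in plan.items() if is_trainable])
```

In a framework such as PyTorch, a plan like this would typically be applied by setting `requires_grad` to `False` on the parameters of the frozen layers.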

In addition, when designing the ControlNeXt architecture, the research team kept the model structure consistent with the original architecture, which is what makes it plug-and-play.

Whether it is ControlNet or ControlNeXt, the injection of conditional control information is an important step.

Here the ControlNeXt research team studied two key issues in depth: where to inject the control information and how to inject it.

The research team observed that in most controllable generation tasks, the form of conditional information guiding generation is relatively simple and highly correlated with the features in the denoising process.

The team therefore concluded that control information need not be injected into every layer of the denoising network, and chose to aggregate conditional features and denoising features only at the middle layer of the network.

The aggregation method is also kept as simple as possible: after the distributions of the two sets of features are aligned using cross normalization, they are added directly.

This ensures that the control signal affects the denoising process, while avoiding the extra learnable parameters and instability that complex operations such as attention would introduce.
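
The single-point injection described above can be sketched with a toy stack of layers. This is only an illustration: the doubling functions stand in for denoising blocks, "middle" is taken to be index `len(layers) // 2`, and the control features are assumed to be already aligned with the denoising features.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def run_with_mid_injection(layers: Sequence[Layer], x: Vector, control: Vector) -> Vector:
    """Run x through all layers, adding `control` once, after the middle layer."""
    if not layers:
        raise ValueError("at least one layer is required")
    mid = len(layers) // 2  # assumption: the injection point is the middle index
    for index, layer in enumerate(layers):
        x = layer(x)
        if index == mid:
            if len(control) != len(x):
                raise ValueError("control and denoising features must have the same size")
            # Plain element-wise addition: no attention, no extra learnable parameters.
            x = [a + b for a, b in zip(x, control)]
    return x


def double(v: Vector) -> Vector:
    """Toy stand-in for a denoising block."""
    return [2.0 * a for a in v]


result = run_with_mid_injection([double, double, double], [1.0, 2.0], [0.5, -0.5])
print(result)
```

All other layers run exactly as they would without control, which is why the added cost stays small compared with injecting at every layer.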

Cross normalization is another core technique of ControlNeXt. It replaces the progressive initialization strategies commonly used before, such as zero convolution.

Traditional methods alleviate the collapse problem by letting the influence of the new modules grow gradually from zero, but the result is often slow convergence.

Cross normalization directly uses the mean μ and variance σ² of the backbone network's denoising features to normalize the features output by the control module, so that the two distributions are aligned as closely as possible.

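$$\hat{x}_c = \gamma \cdot \frac{x_c - \mu_m}{\sqrt{\sigma_m^2 + \epsilon}}$$

(Note: $x_c$ denotes the features output by the control module, and $\mu_m$ and $\sigma_m^2$ are the mean and variance of the backbone denoising features; $\epsilon$ is a small constant added for numerical stability, and $\gamma$ is a scaling parameter.)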

The normalized control features then have their amplitude and baseline adjusted through scale and offset parameters before being added to the denoising features. This not only avoids sensitivity to parameter initialization, but also lets the control conditions take effect early in training, which speeds up convergence.
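
The cross normalization described above can be sketched in plain Python. This is a simplified illustration, not the project's implementation: features are flat lists rather than tensors, the statistics are taken over the whole list, and the names `gamma`, `beta`, and `eps` are assumptions standing for the scale, offset, and stability constant.

```python
import math
from typing import List, Sequence


def cross_normalize(
    control: Sequence[float],
    backbone: Sequence[float],
    gamma: float = 1.0,
    beta: float = 0.0,
    eps: float = 1e-5,
) -> List[float]:
    """Normalize control features with the backbone's mean and variance."""
    n = len(backbone)
    if n == 0:
        raise ValueError("backbone features must not be empty")
    mu = sum(backbone) / n
    var = sum((v - mu) ** 2 for v in backbone) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    # The statistics come from the backbone, but they are applied to the control features.
    return [gamma * (c - mu) * inv_std + beta for c in control]


def inject(backbone: Sequence[float], control: Sequence[float], **kwargs: float) -> List[float]:
    """Add the cross-normalized control features to the backbone features."""
    if len(backbone) != len(control):
        raise ValueError("backbone and control must have the same size")
    normalized = cross_normalize(control, backbone, **kwargs)
    return [b + c for b, c in zip(backbone, normalized)]


backbone_features = [1.0, 3.0]  # mean 2.0, variance 1.0
control_features = [2.0, 4.0]
print(cross_normalize(control_features, backbone_features, eps=0.0))
print(inject(backbone_features, control_features, eps=0.0))
```

Because the control features are rescaled with the backbone's own statistics from the first step, there is no need to start from a zero-initialized layer and wait for its influence to grow.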

In addition, ControlNeXt uses the control module to learn a mapping from the condition information to latent-space features, which makes them more abstract and semantic and helps generalization to unseen control conditions.

Project homepage:

https://pbihao.github.io/projects/controlnext/index.html

Paper address:

https://arxiv.org/abs/2408.06070

GitHub:

https://github.com/dvlab-research/ControlNeXt
