search
HomeTechnology peripheralsAISimplify Vincent diagram prompt, LLM model generates high-quality images

The diffusion model has become a mainstream text-to-image generation model, which can guide the generation of high-quality and content-rich images through text prompts

If the input prompts are too Simplicity, existing models have limitations in semantic understanding and common sense reasoning, which will lead to a significant decrease in the quality of the generated images

Lin Liang’s team from the HCP Laboratory of Sun Yat-sen University proposed a A simple yet effective fine-tuning method called SUR-adapter is designed to improve the model's ability to understand narrative cues. This method is a semantic understanding and inference adapter for pre-trained diffusion models and is parameter-efficient

Simplify Vincent diagram prompt, LLM model generates high-quality images

Please click the link below View the paper: https://arxiv.org/abs/2305.05189

Open source address: https://github.com/Qrange-group/SUR-adapter

To achieve this goal, the researchers first collected and annotated a dataset called SURD. This dataset contains over 57,000 multi-modal samples, each containing a simple narrative prompt, a complex keyword-based prompt, and a high-quality image

Researchers align the semantic representation of narrative prompts with complex prompts and transfer the knowledge of large language models (LLM) to SUR adapters through knowledge distillation to be able to obtain strong semantic understanding and reasoning capabilities to build high-quality texts Semantic representations for text-to-image generation. Then, they aligned the semantic representation of the narrative prompts with the complex prompts and transferred the knowledge of the large language model (LLM) to the SUR adapter through knowledge distillation to be able to obtain strong semantic understanding and reasoning capabilities to build high-quality textual semantic representations. For text-to-image generation

Simplify Vincent diagram prompt, LLM model generates high-quality images

We experimented by integrating multiple LLMs and pre-trained diffusion models and found that this method can effectively make the diffusion model Understand and reason about concise natural language descriptions without degrading image quality

This approach can make text-to-image diffusion models easier to use, provide a better user experience, and further Advancing the development of user-friendly text-to-image generative models and bridging the semantic gap between simple narrative prompts and keyword-based prompts

Background Introduction

Currently, the text-to-image pre-training model represented by stable diffusion has become one of the most important basic models in the field of artificial intelligence generated content, playing an important role in tasks such as image editing, video generation, and 3D object generation. Important role

At present, the semantic ability of these pre-trained diffusion models mainly depends on the text encoder (such as CLIP), and its semantic understanding ability directly affects the generation effect of the diffusion model

This article first tests the image-text matching accuracy of Stable diffusion by constructing common question categories in the visual question answering task (VQA), such as "counting", "color" and "action". We will manually count and conduct testing

The following are examples of constructing various prompts, see the table below for details

Simplify Vincent diagram prompt, LLM model generates high-quality images

According to the results shown in the table below, the article reveals that the current Vincent graph pre-trained diffusion model has serious semantic understanding problems. The image-text matching accuracy of a large number of questions is less than 50%, and even in some questions, the accuracy is only 0%

Simplify Vincent diagram prompt, LLM model generates high-quality images

In order to obtain matching text To generate conditioned images, we need to find ways to enhance the semantic capabilities of our encoder in the pre-trained diffusion model

Overview of methods

Rewritten content: 1. Data preprocessing

First, we can start from the commonly used diffusion model online websites lexica.art, civitai.com and stablediffusionweb Get a large number of image-text pairs. Then, we need to clean and filter these data to obtain more than 57,000 high-quality triplet data (including complex prompts, simple prompts, and pictures) and form it into a SURD dataset

Simplify Vincent diagram prompt, LLM model generates high-quality images

As shown in the figure below, complex prompts refer to the text prompt conditions required by the diffusion model when generating images. Usually these prompts have complex formats and descriptions. A simple prompt is a text description of an image generated through BLIP. It uses a language format that is consistent with human description

Generally speaking, a simple prompt that is consistent with normal human language description is difficult for the diffusion model to generate Images that are semantically adequate, and complex cues (what users jokingly call the diffusion model's "mantra") can achieve satisfactory results

#What needs to be rewritten is :2. Semantic distillation of large language models

This article introduces a method of using the Adapter of the Transformer structure to distill the semantic features of large language models in specific hidden layers. And by linearly combining the large language model information guided by the Adapter with the semantic features output by the original text encoder, the final semantic features are obtained

The large language model uses LLaMA of different sizes model, and the parameters of the UNet part of the diffusion model are frozen during the entire training process

Simplify Vincent diagram prompt, LLM model generates high-quality images

The content that needs to be rewritten is :3. Image quality restoration

In order to keep the original meaning unchanged, the content needs to be rewritten into Chinese: Since the structure of this article introduces learnable modules in the pre-training large model inference process, it destroys the original image generation quality of the pre-training model to a certain extent. Therefore, it is necessary to bring the quality of image generation back to the generation quality level of the original pre-training model

Simplify Vincent diagram prompt, LLM model generates high-quality images

This paper uses triples in the SURD dataset and introduces the corresponding quality loss function during the training process to restore the quality of image generation. Specifically, this article hopes that the semantic features obtained through the new module can be as aligned as possible with the semantic features of complex cues

The following figure shows the effect of SUR-adapter on the pre-trained diffusion model fine-tuning framework. The right side is the network structure of the Adapter

Simplify Vincent diagram prompt, LLM model generates high-quality images

Experimental results

##For SUR-adapter Performance, this article analyzes the performance from two aspects: semantic matching and image quality

On the one hand, according to the following table, SUR-adapter can effectively solve common semantics in the Vincentian graph diffusion model Mismatch problem,applicable to different experimental settings. Under different categories of semantic criteria, the accuracy has also been improved to a certain extent

On the other hand, this paper uses common image quality evaluation indicators such as BRISQUE to compare the original pretrain diffusion model and After using the SUR-adapter diffusion model to perform statistical tests on the quality of the images generated, we can find that there is no significant difference between the two.

We also conducted a human preference questionnaire test

Through the above analysis, it can be concluded that the proposed method can Alleviating the inherent image-text mismatch problem from pre-trained text to image while maintaining the quality of image generation

Simplify Vincent diagram prompt, LLM model generates high-quality images

Simplify Vincent diagram prompt, LLM model generates high-quality images

We can also demonstrate qualitatively through the following image generated examples. For more detailed analysis and details, please refer to this article and the open source warehouse

The content that needs to be rewritten is:

Simplify Vincent diagram prompt, LLM model generates high-quality images

Simplify Vincent diagram prompt, LLM model generates high-quality images

Introduction to HCP Laboratory

Professor Lin Liang founded Sun Yat-sen University’s Human-Machine-Object Intelligent Fusion Laboratory (HCP Lab) in 2010. In recent years, the laboratory has achieved rich academic results in the fields of multimodal content understanding, causal and cognitive reasoning, and embodied intelligence. The laboratory has won many domestic and foreign science and technology awards and best paper awards, and is committed to developing product-level artificial intelligence technology and platforms

The above is the detailed content of Simplify Vincent diagram prompt, LLM model generates high-quality images. For more information, please follow other related articles on the PHP Chinese website!

Statement
This article is reproduced at:51CTO.COM. If there is any infringement, please contact admin@php.cn delete
ai合并图层的快捷键是什么ai合并图层的快捷键是什么Jan 07, 2021 am 10:59 AM

ai合并图层的快捷键是“Ctrl+Shift+E”,它的作用是把目前所有处在显示状态的图层合并,在隐藏状态的图层则不作变动。也可以选中要合并的图层,在菜单栏中依次点击“窗口”-“路径查找器”,点击“合并”按钮。

ai橡皮擦擦不掉东西怎么办ai橡皮擦擦不掉东西怎么办Jan 13, 2021 am 10:23 AM

ai橡皮擦擦不掉东西是因为AI是矢量图软件,用橡皮擦不能擦位图的,其解决办法就是用蒙板工具以及钢笔勾好路径再建立蒙板即可实现擦掉东西。

谷歌超强AI超算碾压英伟达A100!TPU v4性能提升10倍,细节首次公开谷歌超强AI超算碾压英伟达A100!TPU v4性能提升10倍,细节首次公开Apr 07, 2023 pm 02:54 PM

虽然谷歌早在2020年,就在自家的数据中心上部署了当时最强的AI芯片——TPU v4。但直到今年的4月4日,谷歌才首次公布了这台AI超算的技术细节。论文地址:https://arxiv.org/abs/2304.01433相比于TPU v3,TPU v4的性能要高出2.1倍,而在整合4096个芯片之后,超算的性能更是提升了10倍。另外,谷歌还声称,自家芯片要比英伟达A100更快、更节能。与A100对打,速度快1.7倍论文中,谷歌表示,对于规模相当的系统,TPU v4可以提供比英伟达A100强1.

ai可以转成psd格式吗ai可以转成psd格式吗Feb 22, 2023 pm 05:56 PM

ai可以转成psd格式。转换方法:1、打开Adobe Illustrator软件,依次点击顶部菜单栏的“文件”-“打开”,选择所需的ai文件;2、点击右侧功能面板中的“图层”,点击三杠图标,在弹出的选项中选择“释放到图层(顺序)”;3、依次点击顶部菜单栏的“文件”-“导出”-“导出为”;4、在弹出的“导出”对话框中,将“保存类型”设置为“PSD格式”,点击“导出”即可;

ai顶部属性栏不见了怎么办ai顶部属性栏不见了怎么办Feb 22, 2023 pm 05:27 PM

ai顶部属性栏不见了的解决办法:1、开启Ai新建画布,进入绘图页面;2、在Ai顶部菜单栏中点击“窗口”;3、在系统弹出的窗口菜单页面中点击“控制”,然后开启“控制”窗口即可显示出属性栏。

GPT-4的研究路径没有前途?Yann LeCun给自回归判了死刑GPT-4的研究路径没有前途?Yann LeCun给自回归判了死刑Apr 04, 2023 am 11:55 AM

Yann LeCun 这个观点的确有些大胆。 「从现在起 5 年内,没有哪个头脑正常的人会使用自回归模型。」最近,图灵奖得主 Yann LeCun 给一场辩论做了个特别的开场。而他口中的自回归,正是当前爆红的 GPT 家族模型所依赖的学习范式。当然,被 Yann LeCun 指出问题的不只是自回归模型。在他看来,当前整个的机器学习领域都面临巨大挑战。这场辩论的主题为「Do large language models need sensory grounding for meaning and u

强化学习再登Nature封面,自动驾驶安全验证新范式大幅减少测试里程强化学习再登Nature封面,自动驾驶安全验证新范式大幅减少测试里程Mar 31, 2023 pm 10:38 PM

引入密集强化学习,用 AI 验证 AI。 自动驾驶汽车 (AV) 技术的快速发展,使得我们正处于交通革命的风口浪尖,其规模是自一个世纪前汽车问世以来从未见过的。自动驾驶技术具有显着提高交通安全性、机动性和可持续性的潜力,因此引起了工业界、政府机构、专业组织和学术机构的共同关注。过去 20 年里,自动驾驶汽车的发展取得了长足的进步,尤其是随着深度学习的出现更是如此。到 2015 年,开始有公司宣布他们将在 2020 之前量产 AV。不过到目前为止,并且没有 level 4 级别的 AV 可以在市场

ai移动不了东西了怎么办ai移动不了东西了怎么办Mar 07, 2023 am 10:03 AM

ai移动不了东西的解决办法:1、打开ai软件,打开空白文档;2、选择矩形工具,在文档中绘制矩形;3、点击选择工具,移动文档中的矩形;4、点击图层按钮,弹出图层面板对话框,解锁图层;5、点击选择工具,移动矩形即可。

See all articles

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

AI Hentai Generator

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

Hot Tools

MantisBT

MantisBT

Mantis is an easy-to-deploy web-based defect tracking tool designed to aid in product defect tracking. It requires PHP, MySQL and a web server. Check out our demo and hosting services.

Atom editor mac version download

Atom editor mac version download

The most popular open source editor

Dreamweaver Mac version

Dreamweaver Mac version

Visual web development tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 English version

SublimeText3 English version

Recommended: Win version, supports code prompts!