Generate dataset with GPT-3.5! New SOTA for image editing by Peking University, Tiangong and other teams can accurately simulate physical world scenes

There are many methods for high-quality image editing, but they still struggle to accurately express the real physical world.

So, try EditWorld.


Peking University, Tiamat AI, Tiangong AI, and Mila Labs proposed EditWorld, which introduces a new editing task, world-instructed image editing, defining and categorizing instructions based on various world scenarios.


With the support of a set of pre-trained models, such as GPT-3.5, Video-LLaVA, and SDXL, the team built a multimodal dataset for world-instructed editing.

A diffusion-based image editing model, EditWorld, was trained on this dataset; its performance on the new task is significantly better than that of existing editing methods, achieving SOTA.

New SOTA for image editing

Existing methods achieve high-quality image editing through a variety of approaches, including but not limited to text control, drag operations, and inpainting. Among them, instruction-based editing has received widespread attention for its ease of use.

Although image editing methods can produce high-quality results, they still have difficulty handling world dynamics, i.e., conveying the true visual dynamics of the physical world.

As shown in Figure 1, neither InstructPix2Pix nor MagicBrush can generate reasonable editing results.


To solve this problem, the team introduced a new task, world-instructed image editing, which enables image editing to reflect world dynamics in the real physical world and virtual media.

Specifically, they defined and classified various world-dynamics instructions and created a new multimodal training dataset based on these instructions, containing a large number of input-instruction-output triples.
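For illustration, one such triple might be stored as follows. The field names and sample values here are hypothetical, not the released dataset's actual schema:

```python
# A hypothetical input-instruction-output triple; field names and values
# are illustrative only, not drawn from the EditWorld dataset.
sample = {
    "input_image": "apple_on_tree.png",
    "instruction": "The apple falls from the tree onto the ground.",
    "output_image": "apple_on_ground.png",
}
print(sorted(sample.keys()))  # ['input_image', 'instruction', 'output_image']
```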

Finally, the team trained a text-guided diffusion model using a carefully crafted dataset and proposed a zero-shot image manipulation strategy to achieve world-instructed image editing.

Based on task scenarios in the real world and virtual media, world-instructed image editing is divided into 7 categories; each category is defined and introduced, and a data sample is provided.


The team then designed two branches to build the dataset: text-to-image generation and video storyboard extraction.

The text-to-image generation branch enriches the diversity of data scenes. Under this branch, the team first uses GPT to generate text quadruples (an input image description, an instruction, an output image description, and keywords), then generates the images corresponding to the input and output descriptions, and uses the attention map associated with the keywords to locate the editing position and obtain an editing mask. To keep the key features of the two images consistent, the team introduces IP-Adapter, an image prompt adaptation method. Finally, the team combines IP-Adapter and ControlNet, conditioning on the canny map of the output image and the image prompt features of the input image, and uses image inpainting to refine the output image, yielding more effective editing data.
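One step in this branch, turning a keyword's attention map into a binary editing mask, can be sketched as follows. The relative threshold and the normalization are assumptions for illustration, not the paper's exact procedure:

```python
import numpy as np

def attention_to_mask(attn_map: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Convert a keyword's cross-attention map into a binary editing mask.

    attn_map: 2-D array of attention weights over image patches.
    threshold: fraction (after normalization) above which a patch counts
               as part of the editing region -- an assumed heuristic.
    """
    # Normalize to [0, 1] so the threshold is relative to the strongest response.
    norm = (attn_map - attn_map.min()) / (attn_map.max() - attn_map.min() + 1e-8)
    return (norm > threshold).astype(np.uint8)

# Toy example: attention concentrated in the top-left corner.
attn = np.zeros((4, 4))
attn[0, 0] = 1.0
attn[0, 1] = 0.8
mask = attention_to_mask(attn, threshold=0.5)
print(int(mask.sum()))  # 2 patches selected
```

In practice the mask would be upsampled from patch resolution to pixel resolution before being used for inpainting.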


After using the text-to-image branch to obtain scene-rich data, the team extracted high-quality samples from video keyframes to add real data to the dataset. Specifically, they selected from a video storyboard two frames that are strongly correlated yet structurally very different to serve as the start and end frames, cut out a new storyboard between them, and used a large multimodal model to describe the change across it. The start and end frames then serve as the input and output images, and the resulting description serves as the instruction, yielding the required editing data.
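The frame-pair selection above can be sketched roughly as follows. The paper's actual correlation measure is not specified here, so cosine similarity of per-frame feature vectors (e.g. CLIP embeddings) and the similarity band are stand-in assumptions:

```python
import numpy as np

def pick_frame_pair(feats: np.ndarray, sim_range=(0.5, 0.9)):
    """Pick (start, end) frame indices whose features are correlated but not
    near-identical: cosine similarity inside sim_range, maximizing the
    structural difference (1 - similarity) within that band.

    feats: (num_frames, dim) array of per-frame feature vectors.
    Returns a pair of indices, or None if no pair falls in the band.
    """
    unit = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sims = unit @ unit.T
    best, best_pair = -1.0, None
    n = len(feats)
    for i in range(n):
        for j in range(i + 1, n):
            s = sims[i, j]
            if sim_range[0] <= s <= sim_range[1] and (1 - s) > best:
                best, best_pair = 1 - s, (i, j)
    return best_pair

# Toy features: frames 1 and 2 are related but differ the most within the band.
frames = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]])
print(pick_frame_pair(frames))  # (1, 2)
```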

Going a step further, the team manually reviews the generated data to further improve its quality.

The team used this dataset to fine-tune the InstructPix2Pix model. In addition, to protect non-edited regions and achieve more precise editing, the team proposed a post-edit strategy.
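The details of the post-edit strategy are not given here; a common way to protect non-edited regions is mask-based blending of the edited output with the original image. The sketch below illustrates that general idea under this assumption, not the paper's exact formulation:

```python
import numpy as np

def post_edit(original: np.ndarray, edited: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep the edited pixels inside the mask and restore the original
    pixels everywhere else, so non-edited regions stay untouched."""
    m = mask[..., None].astype(float)  # broadcast the 2-D mask over RGB channels
    return m * edited + (1 - m) * original

# Toy 2x2 RGB images: only the top-left pixel is inside the editing mask.
original = np.zeros((2, 2, 3))
edited = np.ones((2, 2, 3))
mask = np.array([[1, 0], [0, 0]])
result = post_edit(original, edited, mask)
print(result[0, 0, 0], result[1, 1, 0])  # 1.0 0.0
```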


Finally, the results show that the team's approach works well for world-instructed image editing.

Paper link:
https://www.php.cn/link/154d7da9e669c75ee317d46614381dd8
Code link:
https://www.php.cn/link/e6da32eef072f987685b6eddca072d4f
