


Rare! Apple's open-source image editing tool MGIE: is it coming to the iPhone?
Take a photo, type a text command, and the phone automatically retouches the photo?
This magical capability comes from Apple's newly open-sourced image editing tool, MGIE.
Remove people in the background
Add a pizza to the table
Recently, AI has made significant progress in image editing. On one hand, multimodal large language models (MLLMs) can take images as input and produce visually grounded responses, enabling more natural image editing. On the other hand, instruction-based editing no longer relies on detailed descriptions or region masks; users simply issue a command stating how and what to edit. This approach is very practical because it matches the way humans intuitively communicate. Through these innovations, AI is gradually becoming a capable assistant for image editing.
Building on these ideas, Apple proposes MGIE (MLLM-Guided Image Editing), which uses an MLLM to address the problem of insufficient instruction guidance.
- Paper title: Guiding Instruction-based Image Editing via Multimodal Large Language Models
- Paper link: https://openreview.net/pdf?id=S1RKWSyZ2Y
- Project homepage: https://mllm-ie.github.io/
MGIE (MLLM-Guided Image Editing) consists of a multimodal large language model (MLLM) and a diffusion model, as shown in Figure 2. The MLLM learns to derive concise expressive instructions and provides explicit, visually grounded guidance. The diffusion model performs image editing using the latent imagination of the intended target and is updated jointly through end-to-end training. In this way, MGIE benefits from inherent visual derivation and can resolve ambiguous human instructions to produce reasonable edits.
Guided by human instructions, MGIE can perform Photoshop-style modifications, global photo optimization, and local object editing. Take the picture below as an example: it is hard to know what "healthy" means without additional context, but MGIE correctly associates "vegetable toppings" with the pizza and edits it as a human would expect.
This recalls the "ambition" Cook expressed on a recent earnings call: "I think there is a huge opportunity for Apple in generative AI, but I don't want to go into more details." He also revealed that Apple is actively developing generative AI software features and that these features will be made available to customers later in 2024.
Combined with the series of generative AI research results Apple has released recently, there is reason to look forward to the new AI features Apple will ship next.
Paper details
The MGIE method proposed in this study edits an input image V into a target image according to a given instruction X. For imprecise instructions, the MLLM in MGIE learns to derive a concise expressive instruction ε. To bridge the language and visual modalities, the researchers append special [IMG] tokens after ε and use an edit head to transform them. The transformed information serves as the latent visual imagination from the MLLM and guides a diffusion model toward the intended editing goal. MGIE can thus understand visually aware, ambiguous commands and perform reasonable image editing (the architecture is shown in Figure 2 above).
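To make this pipeline concrete, here is a minimal Python sketch of the forward pass just described. The component names (`mllm`, `edit_head`, `diffusion_edit`) and all shapes are illustrative assumptions, not the released implementation.

```python
import torch

# Hypothetical stand-ins for the three MGIE components; names and shapes are
# placeholders chosen only to show how information flows.
def mllm(image, instruction):
    """Derive a concise expressive instruction and the [IMG] token hidden states."""
    eps = "brighten the sky and add warm tones"     # concise instruction derived from X and V
    img_token_states = torch.randn(1, 8, 768)       # hidden states of the appended [IMG] tokens
    return eps, img_token_states

def edit_head(img_token_states):
    """Map [IMG] hidden states to latent guidance U = {u_1, ..., u_L}."""
    return img_token_states @ torch.randn(768, 768)

def diffusion_edit(image, guidance):
    """Denoise in the VAE latent space, conditioned on the guidance U."""
    return image                                    # placeholder for the actual edit

def mgie(image, instruction):
    eps, img_states = mllm(image, instruction)      # 1) resolve the ambiguous instruction
    guidance = edit_head(img_states)                # 2) latent visual imagination
    return diffusion_edit(image, guidance)          # 3) guided image editing

edited = mgie(torch.randn(1, 3, 512, 512), "make it more healthy")
```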
Concise expressive instructions
Through feature alignment and instruction tuning, the MLLM can provide cross-modal perception and visually grounded responses. For image editing, the study uses the prompt "what will this image be like if [instruction]" as the language input alongside the image and derives a detailed explanation of the editing command. However, these explanations are often too lengthy and can even mislead the intent. To obtain a more concise description, the study applies a pre-trained summarizer so that the MLLM learns to generate a summarized output, which becomes the expressive instruction ε.
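A rough sketch of that derivation step, assuming a generic `generate(image, prompt)` text interface for the MLLM (the interface and prompts here are illustrative; in MGIE the model is trained so that it emits the concise version directly):

```python
def derive_expressive_instruction(generate, image, instruction):
    """Ask what the edited image should look like, then keep only a concise version.

    `generate(image, prompt)` is a hypothetical multimodal text-generation call.
    During training, MGIE uses a pre-trained summarizer to supervise the MLLM so
    that it learns to produce the short form on its own.
    """
    prompt = f"what will this image be like if {instruction}"
    detailed = generate(image, prompt)            # often verbose, may drift from intent
    concise = generate(image, f"Summarize the edit in one short sentence: {detailed}")
    return concise
```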
Image editing via latent imagination
The study uses an edit head to transform the [IMG] tokens into actual visual guidance. The edit head is a sequence-to-sequence model that maps the continuous visual tokens from the MLLM to semantically meaningful latents U = {u_1, u_2, ..., u_L}, which serve as the editing guidance. To realize image editing guided by this visual imagination, the study uses a latent diffusion model, which includes a variational autoencoder (VAE) and performs denoising diffusion in the latent space.
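Below is a minimal PyTorch sketch of these two pieces: a small sequence-to-sequence edit head that turns the [IMG] hidden states into L guidance latents, which would then condition the denoising U-Net of a latent diffusion model. The dimensions, layer counts, and query-based design are assumptions for illustration.

```python
import torch
import torch.nn as nn

class EditHead(nn.Module):
    """Sequence-to-sequence head: [IMG] hidden states -> U = {u_1, ..., u_L}."""
    def __init__(self, mllm_dim=4096, guide_dim=768, num_latents=8):
        super().__init__()
        self.proj_in = nn.Linear(mllm_dim, guide_dim)
        layer = nn.TransformerDecoderLayer(d_model=guide_dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.queries = nn.Parameter(torch.randn(num_latents, guide_dim))

    def forward(self, img_states):                 # (B, N_IMG, mllm_dim)
        memory = self.proj_in(img_states)          # (B, N_IMG, guide_dim)
        queries = self.queries.expand(memory.size(0), -1, -1)
        return self.decoder(queries, memory)       # (B, L, guide_dim) guidance latents U

head = EditHead()
u = head(torch.randn(2, 8, 4096))  # U would feed the diffusion U-Net's cross-attention;
                                   # the VAE encodes the input and decodes the edited result.
```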
Algorithm 1 shows the MGIE learning process. The MLLM derives the concise instruction ε via an instruction loss L_ins. Leveraging the latent imagination of the [IMG] tokens, MGIE transforms their modality and guides the synthesis of the target image; an edit loss L_edit is used for diffusion training. Since most weights can be frozen (the self-attention blocks within the MLLM), parameter-efficient end-to-end training is achieved.
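Read this way, a training step combines a text loss L_ins on the concise instruction with a latent-diffusion loss L_edit on the target image. The sketch below is a hypothetical rendering of that step; the noise schedule, loss weighting, and the exact set of trainable parameters are assumptions rather than the paper's recipe.

```python
import torch
import torch.nn.functional as F

def add_noise(z0, noise, t, num_steps=1000):
    """Toy forward process q(z_t | z_0); a real diffusion schedule differs."""
    alpha = (1.0 - t.float() / num_steps).view(-1, *([1] * (z0.dim() - 1)))
    return alpha.sqrt() * z0 + (1.0 - alpha).sqrt() * noise

def mgie_training_step(mllm, edit_head, unet, vae, batch, optimizer):
    image, instruction, target_image, concise_ids = batch

    # L_ins: teacher-forced cross-entropy so the MLLM emits the concise instruction.
    logits, img_states = mllm(image, instruction, concise_ids)
    l_ins = F.cross_entropy(logits.flatten(0, 1), concise_ids.flatten())

    # L_edit: noise prediction on the VAE latent of the target image, conditioned
    # on the guidance latents produced by the edit head.
    guidance = edit_head(img_states)
    z0 = vae.encode(target_image)
    t = torch.randint(0, 1000, (z0.size(0),), device=z0.device)
    noise = torch.randn_like(z0)
    l_edit = F.mse_loss(unet(add_noise(z0, noise, t), t, guidance), noise)

    loss = l_ins + l_edit                          # equal weighting is an assumption
    optimizer.zero_grad()
    loss.backward()                                # end-to-end; frozen blocks receive no updates
    optimizer.step()
    return loss.item()
```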
Experimental evaluation
For the same input image and the same instruction, the figure below compares the results of different methods; for example, the instruction in the first row is "turn day into night":
Table 1 shows zero-shot editing results for models trained only on the IPr2Pr dataset. For EVR and GIER, which involve Photoshop-style modifications, the expressive instructions bring the edits closer to the intended goal (e.g., LGIE reaches a higher 82.0 CVS on EVR). For global image optimization on MA5k, InsPix2Pix struggles because relevant training triplets are scarce. LGIE and MGIE can provide detailed guidance through what the LLM has learned, but LGIE remains limited to a single modality. With access to the image, MGIE can derive explicit instructions, such as which regions should be brightened or which objects should be sharper, yielding significant gains (e.g., a higher 66.3 SSIM and a lower 0.3 photo distance); similar results are observed on MagicBrush. MGIE also achieves the best performance thanks to precise visual imagination and modification of the specified targets (e.g., a higher 82.2 DINO visual similarity and a higher 30.4 CTS global caption alignment).
To study instruction-based image editing for specific purposes, Table 2 fine-tunes the models on each dataset. For EVR and GIER, all models improve when adapted to Photoshop-style editing tasks. MGIE consistently outperforms LGIE in every aspect of editing. This also shows that learning with expressive instructions effectively enhances image editing, and that visual perception plays a crucial role in obtaining the explicit guidance needed for the largest gains.
Trade-off between α_X and α_V. Image editing has two goals: manipulate the target as instructed and preserve the rest of the input image. Figure 3 shows the trade-off curve between instruction consistency (α_X) and input consistency (α_V). The study fixes α_X at 7.5 and varies α_V in the range [1.0, 2.2]. The larger α_V is, the more similar the edited result is to the input, but the less consistent it is with the instruction. The X-axis measures CLIP directional similarity, i.e., how consistent the edits are with the instruction; the Y-axis is the feature similarity to the input image under the CLIP visual encoder. With expressive instructions, MGIE outperforms InsPix2Pix in all settings. Moreover, MGIE learns from explicit visual guidance, allowing an overall improvement, whether the setting favors input similarity or edit relevance.
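The two scales behave like the dual classifier-free guidance popularized by InstructPix2Pix: α_V pulls the denoising estimate toward the input image, α_X toward the (expressive) instruction. A sketch of how such a combined noise estimate is usually formed, with a hypothetical `unet` conditioning interface:

```python
def dual_cfg_noise(unet, z_t, t, cond_text, cond_image, alpha_x=7.5, alpha_v=1.6):
    """InstructPix2Pix-style dual classifier-free guidance (illustrative only)."""
    e_uncond = unet(z_t, t, text=None, image=None)            # no conditioning
    e_image = unet(z_t, t, text=None, image=cond_image)       # input image only
    e_full = unet(z_t, t, text=cond_text, image=cond_image)   # instruction + image
    # alpha_v strengthens fidelity to the input; alpha_x strengthens instruction following.
    return e_uncond + alpha_v * (e_image - e_uncond) + alpha_x * (e_full - e_image)
```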
Ablation studies
In addition, the researchers conducted ablation experiments considering how the FZ, FT, and E2E architectures perform with expressive instructions. The results show that MGIE consistently exceeds LGIE under FZ, FT, and E2E, suggesting that expressive instructions with critical visual perception hold a consistent advantage across all ablation settings.
Why is MLLM guidance useful? Figure 5 shows the CLIP-Score between the input or ground-truth target images and the expressive instructions. A higher CLIP-S against the input image indicates that the instructions are relevant to the editing source, while better alignment with the target image provides explicit, relevant editing guidance. As shown, MGIE's expressive instructions are more consistent with the input/target, which explains why they are helpful: with a clear narrative of the expected result, MGIE achieves the largest improvements in image editing.
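CLIP-S here is the cosine similarity between the CLIP embedding of the expressive instruction and the CLIP embedding of the input (or ground-truth target) image. A minimal sketch using a Hugging Face CLIP checkpoint, where the model choice is an assumption:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_score(instruction: str, image_path: str) -> float:
    """Cosine similarity between an expressive instruction and an image."""
    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=[instruction], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        text_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                           attention_mask=inputs["attention_mask"])
        image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
    return float((text_emb * image_emb).sum())
```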
Human evaluation. In addition to automatic metrics, the researchers also performed a human evaluation. Figure 6 shows the quality of the generated expressive instructions, and Figure 7 compares the image editing results of InsPix2Pix, LGIE, and MGIE in terms of instruction following, ground-truth relevance, and overall quality.
Inference efficiency. Although MGIE relies on an MLLM to drive image editing, it only introduces concise expressive instructions (fewer than 32 tokens), so its efficiency is comparable to InsPix2Pix. Table 4 lists inference times on an NVIDIA A100 GPU. For a single input, MGIE completes an editing task in 10 seconds; with greater data parallelism the time is similar (37 seconds at a batch size of 8), and the whole process runs on a single 40 GB GPU.
Qualitative comparison. Figure 8 shows a visual comparison across all datasets used, and Figure 9 further compares the expressive instructions of LGIE and MGIE.
On the project homepage, the researchers provide more demos (https://mllm-ie.github.io/). For more research details, please refer to the original paper.