


'Scene Control Portal: Four-in-One Object Teleportation' (from Shanghai Jiao Tong University & Ant Group)
Among common image editing operations, image composition refers to combining the foreground object of one image with another background image to produce a composite image. The visual effect is that of transferring the foreground object from one image onto another background, as shown in the figure below.

Image composition is widely used in artistic creation, poster design, e-commerce, virtual reality, data augmentation, and other fields.
A composite image obtained by simple cut-and-paste can suffer from many problems. Previous research therefore split image composition into several subtasks, each addressing a different subproblem. Image blending aims to remove the unnatural boundary between foreground and background. Image harmonization adjusts the illumination of the foreground so that it is consistent with the background. Perspective adjustment modifies the pose of the foreground so that it matches the background. Object placement predicts an appropriate location, size, and perspective angle for the foreground object. Shadow generation produces plausible shadows for the foreground object on the background.
As shown in the figure below, previous work executes these subtasks in a serial or parallel manner to obtain realistic, natural composite images. In the serial framework, subtasks can be executed selectively according to actual needs.
In the parallel framework, the currently popular approach uses a diffusion model. It takes a background image with a foreground bounding box and a foreground object image as input, and directly generates the final composite image, in which the foreground object is blended seamlessly into the background, the lighting and shadows are plausible, and the pose is adapted to the background.
However, this parallel framework effectively executes all subtasks at once and cannot run only a subset of them. It is uncontrollable and may introduce unnecessary or unreasonable changes to the pose or color of the foreground object.

To enhance the controllability of the parallel framework and execute subtasks selectively, we propose the controllable image composition model ControlCom (Controllable Image Composition). As shown in the figure below, we feed an indicator vector to the diffusion model as conditioning information to control the attributes of the foreground object in the composite image. The indicator vector is a two-dimensional binary vector whose dimensions respectively control whether the illumination and the pose of the foreground object are adjusted, where 1 means adjust and 0 means preserve. Specifically, (0,0) changes neither the foreground illumination nor the foreground pose and simply blends the object seamlessly into the background, which is equivalent to image blending. (1,0) changes only the foreground illumination to harmonize it with the background while preserving the pose, which is equivalent to image harmonization. (0,1) changes only the foreground pose to match the background while preserving the illumination, which is equivalent to view synthesis. (1,1) changes both illumination and pose at the same time, which is equivalent to existing uncontrollable parallel image composition.
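The four settings above can be summarized in a small sketch (illustrative only, not the authors' code): the indicator vector is just a pair of binary flags passed to the diffusion model as a condition.

```python
# Illustrative sketch (not the ControlCom implementation): mapping the
# 2-D binary indicator vector to the four composition subtasks.
# Dimension 0 controls illumination, dimension 1 controls pose;
# 1 = adjust that attribute, 0 = preserve it.

TASK_BY_INDICATOR = {
    (0, 0): "image blending",        # keep illumination and pose
    (1, 0): "image harmonization",   # adjust illumination only
    (0, 1): "view synthesis",        # adjust pose only
    (1, 1): "full composition",      # adjust both (uncontrollable parallel setting)
}

def indicator_for(adjust_illumination, adjust_pose):
    """Build the indicator vector passed to the diffusion model as a condition."""
    return (int(adjust_illumination), int(adjust_pose))

task = TASK_BY_INDICATOR[indicator_for(True, False)]
print(task)  # image harmonization
```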
We integrate the four tasks into a single framework and, via the indicator vector, realize a four-in-one object "portal" that can teleport an object to a specified location in a scene. This work is a collaboration between Shanghai Jiao Tong University and Ant Group. The code and model will be open-sourced soon.

Code and model: https://github.com/bcmi/ControlCom-Image-Composition
The figure below demonstrates the functions of controllable image composition.

In the right-hand column, the illumination of the foreground object is already consistent with the background illumination. Previous methods may cause unexpected color changes to foreground objects such as vehicles and clothing. Our method with indicator (0,1) preserves the color of the foreground object while adjusting its pose so that it blends naturally into the background image.

Next, we show more results for the four versions of our method: (0,0), (1,0), (0,1), (1,1). With different indicator vectors, our method selectively adjusts particular attributes of the foreground object, effectively controls the appearance of the composite image, and meets different user needs.

What model structure can realize all four functions? Our method adopts the following structure. The model's input consists of a background image with a foreground bounding box and a foreground object image. Features of the foreground object, together with the indicator vector, are injected into the diffusion model.
We extract global and local features of the foreground object, fusing the global features first and the local features afterwards. During local fusion, we use the aligned foreground feature map for feature modulation to better preserve details. The indicator vector is used in both global and local fusion to control the attributes of the foreground object more thoroughly.
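The feature modulation idea can be sketched minimally as follows. This is a hedged, FiLM-style illustration with assumed shapes and weight names, not the paper's actual modules: a condition built from the aligned foreground features and the indicator vector produces per-channel scale and shift applied to intermediate diffusion features.

```python
import numpy as np

# Minimal illustrative sketch (assumed shapes, not the authors' implementation):
# FiLM-style local feature modulation. A condition vector built from the
# aligned foreground feature map and the indicator vector yields per-channel
# scale and shift applied to intermediate diffusion (U-Net) features.

rng = np.random.default_rng(0)
C, H, W = 8, 4, 4

def modulate(diffusion_feat, fg_feat, indicator, w_scale, w_shift):
    """Scale/shift diffusion features conditioned on foreground + indicator.

    diffusion_feat: (C, H, W) intermediate diffusion-model features
    fg_feat:        (C, H, W) aligned foreground feature map
    indicator:      (2,) binary vector controlling illumination/pose
    """
    # Pool the foreground map and append the indicator vector.
    cond = np.concatenate([fg_feat.mean(axis=(1, 2)), indicator])  # (C + 2,)
    scale = w_scale @ cond          # (C,) per-channel scale offsets
    shift = w_shift @ cond          # (C,) per-channel shifts
    return (1.0 + scale)[:, None, None] * diffusion_feat + shift[:, None, None]

w_scale = rng.normal(scale=0.01, size=(C, C + 2))
w_shift = rng.normal(scale=0.01, size=(C, C + 2))
x = rng.normal(size=(C, H, W))
fg = rng.normal(size=(C, H, W))

out = modulate(x, fg, np.array([1.0, 0.0]), w_scale, w_shift)
print(out.shape)  # (8, 4, 4)
```

With zero modulation weights the features pass through unchanged, which is the usual way such conditioning layers are initialized so they start as an identity map.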
We initialize from a pretrained Stable Diffusion model and train on 1.9 million images from OpenImages. To train the four subtasks simultaneously, we designed a data processing and augmentation pipeline; see the paper for details of the data and training.
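One way such multi-task training pairs can be built (a hedged sketch only; the paper's actual pipeline differs in detail) is self-supervised: crop an object from a real image as the ground truth, then perturb its illumination and/or geometry according to the indicator vector, so the model learns to adjust exactly the flagged attributes.

```python
import numpy as np

# Hedged sketch of self-supervised training-pair construction (assumed
# perturbations, not the paper's pipeline): the cropped ground-truth object
# is perturbed in illumination and/or geometry per the indicator vector.

rng = np.random.default_rng(1)

def make_training_foreground(obj_patch, indicator):
    """Return the perturbed foreground fed to the model.

    obj_patch: (H, W, 3) float array in [0, 1], cropped ground-truth object
    indicator: (illum, pose); 1 means the model must adjust that attribute,
               so we perturb it in the input.
    """
    fg = obj_patch.copy()
    if indicator[0]:  # illumination perturbation (simple color jitter here)
        gain = rng.uniform(0.6, 1.4, size=3)
        fg = np.clip(fg * gain, 0.0, 1.0)
    if indicator[1]:  # geometric perturbation (a stand-in horizontal flip)
        fg = fg[:, ::-1, :]
    return fg

patch = rng.uniform(size=(16, 16, 3))
blend_input = make_training_foreground(patch, (0, 0))      # unchanged
harmonize_input = make_training_foreground(patch, (1, 0))  # color-jittered only
```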

We evaluate on the COCOEE dataset and on a dataset we built ourselves. Since previous methods only support uncontrollable image composition, we compare them against the (1,1) version of our method; the results are shown in the figure below. PCTNet is an image harmonization method: it preserves object details, but it can neither adjust the foreground pose nor complete the foreground object. Other methods can generate objects of the same category, but are worse at preserving details such as the style of clothes, the texture of cups, and the color of bird feathers.
In comparison, our method performs better: it preserves the details of the foreground object, completes incomplete foreground objects, and adjusts the illumination and pose of the foreground to fit the background.
This work is a first attempt at controllable image composition. The task is very difficult and many shortcomings remain: the model's performance is not yet stable or robust enough. Moreover, beyond illumination and pose, the attributes of foreground objects could be subdivided further; achieving finer-grained controllable image composition is an even more challenging task.
References
[1] Yang, B., Gu, S., Zhang, B., Zhang, T., Chen, X., Sun, X., Chen, D., Wen, F. (2023). Paint by Example: Exemplar-based Image Editing with Diffusion Models. In CVPR.
[2] Song, Y., Zhang, Z., Lin, Z., Cohen, S., Price, B., Zhang, J., Kim, S. Y., Aliaga, D. (2023). ObjectStitch: Object Compositing with Diffusion Model. In CVPR.