


'Scene Control Portal: Four-in-One Object Teleportation' (from Shanghai Jiao Tong University & Ant Group)
Among common image editing operations, image composition refers to combining the foreground object of one image with another background image to produce a composite image. The visual effect is that of transferring the foreground object from one image onto another background, as shown in the figure below.

Image composition is widely used in artistic creation, poster design, e-commerce, virtual reality, data augmentation, and other fields.
A composite image obtained by simple cut-and-paste can suffer from many problems. Previous research therefore split image composition into several subtasks, each addressing a different subproblem. Image blending aims to remove the unnatural boundary between foreground and background. Image harmonization adjusts the illumination of the foreground so that it is consistent with the background. Perspective adjustment modifies the pose of the foreground so that it matches the background. Object placement predicts an appropriate location, size, and perspective angle for the foreground object. Shadow generation produces plausible shadows for the foreground object on the background.
As shown in the figure below, previous work executes these subtasks in a serial or parallel manner to obtain realistic, natural composite images. In the serial framework, subtasks can be executed selectively according to actual needs.
In the parallel framework, the currently popular approach uses a diffusion model. It takes a background image with a foreground bounding box and a foreground object image as input, and directly generates the final composite image, in which the foreground object is blended seamlessly into the background, the lighting and shadows are plausible, and the pose is adapted to the background.
However, this parallel framework effectively executes all subtasks at once and cannot run only a subset of them. It is uncontrollable and may introduce unnecessary or unreasonable changes to the pose or color of the foreground object.

To enhance the controllability of the parallel framework and execute subtasks selectively, we propose the controllable image composition model ControlCom (Controllable Image Composition). As shown in the figure below, we feed an indicator vector to the diffusion model as conditioning information to control the attributes of the foreground object in the composite image. The indicator vector is a two-dimensional binary vector whose dimensions respectively control whether the illumination and the pose of the foreground object are adjusted, where 1 means adjust and 0 means preserve. Specifically, (0,0) changes neither the foreground illumination nor the foreground pose and simply blends the object seamlessly into the background, which is equivalent to image blending. (1,0) changes only the foreground illumination to harmonize it with the background while preserving the pose, which is equivalent to image harmonization. (0,1) changes only the foreground pose to match the background while preserving the illumination, which is equivalent to view synthesis. (1,1) changes both illumination and pose at the same time, which is equivalent to existing uncontrollable parallel image composition.
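The four settings above can be summarized in a small sketch (illustrative only, not the authors' code): the indicator vector is just a pair of binary flags passed to the diffusion model as a condition.

```python
# Illustrative sketch (not the ControlCom implementation): mapping the
# 2-D binary indicator vector to the four composition subtasks.
# Dimension 0 controls illumination, dimension 1 controls pose;
# 1 = adjust that attribute, 0 = preserve it.

TASK_BY_INDICATOR = {
    (0, 0): "image blending",        # keep illumination and pose
    (1, 0): "image harmonization",   # adjust illumination only
    (0, 1): "view synthesis",        # adjust pose only
    (1, 1): "full composition",      # adjust both (uncontrollable parallel setting)
}

def indicator_for(adjust_illumination, adjust_pose):
    """Build the indicator vector passed to the diffusion model as a condition."""
    return (int(adjust_illumination), int(adjust_pose))

task = TASK_BY_INDICATOR[indicator_for(True, False)]
print(task)  # image harmonization
```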
We integrate the four tasks into a single framework and, via the indicator vector, realize a four-in-one object "portal" that can teleport an object to a specified location in a scene. This work is a collaboration between Shanghai Jiao Tong University and Ant Group. The code and model will be open-sourced soon.

Code and model: https://github.com/bcmi/ControlCom-Image-Composition
The figure below demonstrates the functions of controllable image composition.

In the right-hand column, the illumination of the foreground object is already consistent with the background illumination. Previous methods may cause unexpected color changes to foreground objects such as vehicles and clothing. Our method with indicator (0,1) preserves the color of the foreground object while adjusting its pose so that it blends naturally into the background image.

Next, we show more results for the four versions of our method: (0,0), (1,0), (0,1), (1,1). With different indicator vectors, our method selectively adjusts particular attributes of the foreground object, effectively controls the appearance of the composite image, and meets different user needs.

What model structure can realize all four functions? Our method adopts the following structure. The model's input consists of a background image with a foreground bounding box and a foreground object image. Features of the foreground object, together with the indicator vector, are injected into the diffusion model.
We extract global and local features of the foreground object, fusing the global features first and the local features afterwards. During local fusion, we use the aligned foreground feature map for feature modulation to better preserve details. The indicator vector is used in both global and local fusion to control the attributes of the foreground object more thoroughly.
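The feature modulation idea can be sketched minimally as follows. This is a hedged, FiLM-style illustration with assumed shapes and weight names, not the paper's actual modules: a condition built from the aligned foreground features and the indicator vector produces per-channel scale and shift applied to intermediate diffusion features.

```python
import numpy as np

# Minimal illustrative sketch (assumed shapes, not the authors' implementation):
# FiLM-style local feature modulation. A condition vector built from the
# aligned foreground feature map and the indicator vector yields per-channel
# scale and shift applied to intermediate diffusion (U-Net) features.

rng = np.random.default_rng(0)
C, H, W = 8, 4, 4

def modulate(diffusion_feat, fg_feat, indicator, w_scale, w_shift):
    """Scale/shift diffusion features conditioned on foreground + indicator.

    diffusion_feat: (C, H, W) intermediate diffusion-model features
    fg_feat:        (C, H, W) aligned foreground feature map
    indicator:      (2,) binary vector controlling illumination/pose
    """
    # Pool the foreground map and append the indicator vector.
    cond = np.concatenate([fg_feat.mean(axis=(1, 2)), indicator])  # (C + 2,)
    scale = w_scale @ cond          # (C,) per-channel scale offsets
    shift = w_shift @ cond          # (C,) per-channel shifts
    return (1.0 + scale)[:, None, None] * diffusion_feat + shift[:, None, None]

w_scale = rng.normal(scale=0.01, size=(C, C + 2))
w_shift = rng.normal(scale=0.01, size=(C, C + 2))
x = rng.normal(size=(C, H, W))
fg = rng.normal(size=(C, H, W))

out = modulate(x, fg, np.array([1.0, 0.0]), w_scale, w_shift)
print(out.shape)  # (8, 4, 4)
```

With zero modulation weights the features pass through unchanged, which is the usual way such conditioning layers are initialized so they start as an identity map.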
We initialize from a pretrained Stable Diffusion model and train on 1.9 million images from OpenImages. To train the four subtasks simultaneously, we designed a data processing and augmentation pipeline; see the paper for details of the data and training.
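One way such multi-task training pairs can be built (a hedged sketch only; the paper's actual pipeline differs in detail) is self-supervised: crop an object from a real image as the ground truth, then perturb its illumination and/or geometry according to the indicator vector, so the model learns to adjust exactly the flagged attributes.

```python
import numpy as np

# Hedged sketch of self-supervised training-pair construction (assumed
# perturbations, not the paper's pipeline): the cropped ground-truth object
# is perturbed in illumination and/or geometry per the indicator vector.

rng = np.random.default_rng(1)

def make_training_foreground(obj_patch, indicator):
    """Return the perturbed foreground fed to the model.

    obj_patch: (H, W, 3) float array in [0, 1], cropped ground-truth object
    indicator: (illum, pose); 1 means the model must adjust that attribute,
               so we perturb it in the input.
    """
    fg = obj_patch.copy()
    if indicator[0]:  # illumination perturbation (simple color jitter here)
        gain = rng.uniform(0.6, 1.4, size=3)
        fg = np.clip(fg * gain, 0.0, 1.0)
    if indicator[1]:  # geometric perturbation (a stand-in horizontal flip)
        fg = fg[:, ::-1, :]
    return fg

patch = rng.uniform(size=(16, 16, 3))
blend_input = make_training_foreground(patch, (0, 0))      # unchanged
harmonize_input = make_training_foreground(patch, (1, 0))  # color-jittered only
```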

We evaluate on the COCOEE dataset and on a dataset we built ourselves. Since previous methods only support uncontrollable image composition, we compare them against the (1,1) version of our method; the results are shown in the figure below. PCTNet is an image harmonization method: it preserves object details, but it can neither adjust the foreground pose nor complete the foreground object. Other methods can generate objects of the same category, but are worse at preserving details such as the style of clothes, the texture of cups, and the color of bird feathers.
In comparison, our method performs better: it preserves the details of the foreground object, completes incomplete foreground objects, and adjusts the illumination and pose of the foreground to fit the background.
This work is a first attempt at controllable image composition. The task is very difficult and many shortcomings remain: the model's performance is not yet stable or robust enough. Moreover, beyond illumination and pose, the attributes of foreground objects could be subdivided further; achieving finer-grained controllable image composition is an even more challenging task.
References
[1] Yang, B., Gu, S., Zhang, B., Zhang, T., Chen, X., Sun, X., Chen, D., Wen, F. (2023). Paint by Example: Exemplar-based Image Editing with Diffusion Models. In CVPR.
[2] Song, Y., Zhang, Z., Lin, Z., Cohen, S., Price, B., Zhang, J., Kim, S. Y., Aliaga, D. (2023). ObjectStitch: Object Compositing with Diffusion Model. In CVPR.