Breaking through the dimensional wall: X-Dreamer brings high-quality text-to-3D generation by bridging the gap between 2D and 3D generation

PHPz (reposted)
2023-12-15 13:54:33

In recent years, automatic text-to-3D content generation has made significant progress, driven by the development of pre-trained diffusion models [1, 2, 3]. Among these works, DreamFusion [4] introduced an effective method that uses a pre-trained 2D diffusion model [5] to generate 3D assets from text automatically, without requiring a dedicated 3D asset dataset.

A key innovation of DreamFusion is the Score Distillation Sampling (SDS) algorithm. The algorithm uses a pre-trained 2D diffusion model to evaluate a single 3D representation, such as NeRF [6], and optimizes that representation so that images rendered from any camera viewpoint remain highly consistent with the given text. Inspired by the seminal SDS algorithm, several works [7, 8, 9, 10, 11] have advanced text-to-3D generation by applying pre-trained 2D diffusion models.

Although text-to-3D generation has made significant progress by leveraging pre-trained text-to-2D diffusion models, a large domain gap remains between 2D images and 3D assets. Figure 1 illustrates this gap clearly.

First, text-to-2D models produce camera-agnostic results: they focus on generating high-quality images from specific viewpoints and ignore all others. In contrast, 3D content creation is closely tied to camera parameters such as position, shooting angle, and field of view. Text-to-3D models must therefore produce high-quality results for all possible camera parameters.

In addition, text-to-2D generative models need to generate foreground and background elements simultaneously to keep the image coherent as a whole. In contrast, text-to-3D generative models only need to focus on creating foreground objects. This difference lets text-to-3D models devote more resources and attention to representing and generating foreground objects accurately. Therefore, when pre-trained 2D diffusion models are used directly for 3D asset creation, the domain gap between text-to-2D and text-to-3D generation becomes a significant performance barrier.

Figure 1 Outputs of a text-to-2D generative model (left) and a text-to-3D generative model (right) for the same text prompt, "A statue of Leonardo DiCaprio's head."

To solve this problem, the paper proposes X-Dreamer, a novel method for high-quality text-to-3D content creation that effectively bridges the domain gap between text-to-2D and text-to-3D generation.

The key components of X-Dreamer are two innovative designs: Camera-Guided Low-Rank Adaptation (CG-LoRA) and Attention-Mask Alignment (AMA) loss.

First, existing methods [7, 8, 9, 10] usually use pre-trained 2D diffusion models [5, 12] for text-to-3D generation, and these models lack any inherent connection to camera parameters. To address this limitation and ensure that X-Dreamer produces results that are directly influenced by camera parameters, the paper introduces CG-LoRA to adapt the pre-trained 2D diffusion model. Notably, the parameters of CG-LoRA are generated dynamically from camera information at each iteration, which establishes a robust relationship between the text-to-3D model and the camera parameters.

Secondly, the pre-trained text-to-2D diffusion model allocates attention to foreground and background generation, while the creation of 3D assets requires more attention to the accurate generation of foreground objects. To address this issue, the paper proposes the AMA loss, which uses a binary mask of 3D objects to guide the attention map of a pre-trained diffusion model to prioritize the creation of foreground objects. By incorporating this module, X-Dreamer prioritizes the generation of foreground objects, significantly improving the overall quality of the generated 3D content.

Project homepage: https://xmu-xiaoma666.github.io/Projects/X-Dreamer/

GitHub homepage: https://github.com/xmu-xiaoma666/X-Dreamer

Paper address: https://arxiv.org/abs/2312.00085

X-Dreamer makes the following contributions to the field of text-to-3D generation:

  • The paper proposes a novel method, X-Dreamer, for high-quality text-to-3D content creation, effectively bridging the domain gap between text-to-2D and text-to-3D generation.
  • In order to enhance the alignment between the generated results and the camera perspective, the paper proposes CG-LoRA, which uses camera information to dynamically generate specific parameters of the 2D diffusion model.
  • In order to prioritize the creation of foreground objects in text-to-3D models, the paper introduces the AMA loss, which uses a binary mask of foreground 3D objects to guide the attention map of the 2D diffusion model.

Method

X-Dreamer consists of two main stages: geometry learning and appearance learning. For geometry learning, the paper uses DMTET as the 3D representation and initializes it with a 3D ellipsoid, using a mean squared error (MSE) loss during initialization. Next, DMTET and CG-LoRA are optimized using the Score Distillation Sampling (SDS) loss and the proposed AMA loss to ensure alignment between the 3D representation and the input text prompt.

For appearance learning, the paper uses bidirectional reflectance distribution function (BRDF) modeling. Specifically, an MLP with trainable parameters predicts the surface materials. As in the geometry learning stage, the SDS loss and the AMA loss are used to optimize the trainable parameters of the MLP and CG-LoRA so that the 3D representation aligns with the text prompt. Figure 2 shows the detailed composition of X-Dreamer.

Figure 2 Overview of X-Dreamer, including geometry learning and appearance learning.

Geometry Learning

In this module, X-Dreamer uses an MLP network $\Phi_{dmt}$ to parameterize DMTET as a 3D representation. To improve the stability of geometry modeling, the paper uses a 3D ellipsoid as the initial configuration of the DMTET $\Phi_{dmt}$. For each vertex $v_i \in T$ of the tetrahedral grid $T$, $\Phi_{dmt}$ is trained to predict two quantities: the SDF value $s(v_i)$ and the deformation offset $\delta(v_i)$. To initialize $\Phi_{dmt}$ as an ellipsoid, the paper samples $N$ points $\{p_i\}_{i=1}^{N}$ evenly distributed within the ellipsoid and computes the corresponding SDF values $\{SDF(p_i)\}_{i=1}^{N}$.
Subsequently, $\Phi_{dmt}$ is optimized with the mean squared error (MSE) loss. This optimization ensures that $\Phi_{dmt}$ initializes the DMTET so that it resembles the 3D ellipsoid. The MSE loss is:

$$\mathcal{L}_{mse} = \frac{1}{N}\sum_{i=1}^{N}\left(s(p_i; \Phi_{dmt}) - SDF(p_i)\right)^2$$
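The ellipsoid initialization can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names are hypothetical, the signed distance is a first-order approximation (an exact ellipsoid SDF has no closed form), and an all-zero array stands in for the output of the untrained network.

```python
import numpy as np

def ellipsoid_sdf_approx(points, radii):
    """Approximate signed distance to an axis-aligned ellipsoid centred at the origin.

    Negative inside, zero on the surface, positive outside. This is an
    approximation chosen for the sketch, not an exact distance.
    """
    points = np.asarray(points, dtype=float)
    radii = np.asarray(radii, dtype=float)
    scaled_norm = np.linalg.norm(points / radii, axis=-1)
    return (scaled_norm - 1.0) * radii.min()

def sample_points_in_ellipsoid(n, radii, rng):
    """Sample uniformly in the unit ball, then stretch by the radii."""
    directions = rng.normal(size=(n, 3))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radius = rng.uniform(size=(n, 1)) ** (1.0 / 3.0)  # uniform in volume
    return directions * radius * np.asarray(radii, dtype=float)

def mse_loss(predicted_sdf, target_sdf):
    """Mean squared error between predicted and target SDF values."""
    return float(np.mean((predicted_sdf - target_sdf) ** 2))

rng = np.random.default_rng(0)
radii = (1.0, 0.8, 0.6)
points = sample_points_in_ellipsoid(1024, radii, rng)
target = ellipsoid_sdf_approx(points, radii)

# Stand-in for the network prediction s(p_i) before any training.
untrained_prediction = np.zeros_like(target)
initial_loss = mse_loss(untrained_prediction, target)
```

In a real training loop the prediction would come from the geometry MLP, and the loss would be minimized by gradient descent until the predicted SDF matches the ellipsoid.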

After initializing the geometry, the DMTET geometry is aligned with the input text prompt. Given a randomly sampled camera pose $c$, differentiable rendering is used to generate a normal map $n$ and an object mask $m$ from the initialized DMTET. The normal map $n$ is then fed into a frozen Stable Diffusion (SD) model with trainable CG-LoRA embeddings, and the parameters of $\Phi_{dmt}$ are updated using the SDS loss, defined as follows:

$$\nabla_{\Phi_{dmt}} \mathcal{L}_{sds} = \mathbb{E}_{t,\epsilon}\left[ w(t)\left(\hat{\epsilon}_{\Theta}(n_t; y, t) - \epsilon\right)\frac{\partial n}{\partial \Phi_{dmt}} \right]$$

where $\Theta$ denotes the parameters of SD, and $\hat{\epsilon}_{\Theta}(n_t; y, t)$ is the noise predicted by SD given the noise level $t$ and the text embedding $y$. Additionally, $n_t = \alpha_t n + \sigma_t \epsilon$, where $\epsilon \sim \mathcal{N}(0, I)$ is noise sampled from a normal distribution. The implementation of $w(t)$, $\alpha_t$ and $\sigma_t$ follows DreamFusion [4].
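A single-sample SDS update can be sketched as follows. This is a toy illustration, not the paper's code: `predict_noise` is a hypothetical stand-in for the frozen diffusion model, the gradient is taken with respect to the rendered image only (the chain rule through the renderer to the geometry parameters is left to an autodiff framework), and the schedule values are arbitrary.

```python
import numpy as np

def sds_gradient(image, predict_noise, alpha_t, sigma_t, weight, rng):
    """One-sample estimate of the SDS gradient with respect to the rendered image.

    predict_noise is a hypothetical stand-in for the frozen diffusion model:
    it receives the noised image and returns a noise estimate of the same shape.
    """
    noise = rng.normal(size=image.shape)
    noised = alpha_t * image + sigma_t * noise   # noised render
    residual = predict_noise(noised) - noise     # predicted noise minus true noise
    return weight * residual

rng = np.random.default_rng(0)
normal_map = rng.uniform(size=(4, 4, 3))
alpha_t, sigma_t = 0.8, 0.6

def toy_predictor(noised):
    # Toy model that believes the clean image is all zeros, so it attributes
    # the entire noised input to noise.
    return noised / sigma_t

grad = sds_gradient(normal_map, toy_predictor, alpha_t, sigma_t, weight=1.0, rng=rng)
```

With this toy predictor the sampled noise cancels and the gradient equals `(alpha_t / sigma_t) * normal_map`, so a gradient-descent step pulls the render toward the all-zero image that the predictor prefers. A real diffusion model plays the same role, pulling the render toward images it considers likely for the text prompt.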

Additionally, to focus SD on generating foreground objects, X-Dreamer introduces an additional AMA loss that aligns the object mask with SD's attention maps:

$$\mathcal{L}_{ama} = \frac{1}{L}\sum_{i=1}^{L}\left|a_i - \eta(m)\right|$$

where $L$ is the number of attention layers and $a_i$ is the attention map of the $i$-th attention layer. The function $\eta(\cdot)$ resizes the rendered 3D object mask so that its size matches that of the attention map.
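A minimal sketch of such an alignment loss is shown below. Several choices are assumptions made for illustration: the distance is a mean absolute difference per layer, the resize function is a nearest-neighbour lookup, and the attention maps are assumed to be already scaled to the same 0 to 1 range as the mask.

```python
import numpy as np

def resize_mask_nearest(mask, out_h, out_w):
    """Nearest-neighbour resize of a 2D mask (a simple stand-in for the resize function)."""
    in_h, in_w = mask.shape
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return mask[np.ix_(rows, cols)]

def ama_loss(attention_maps, mask):
    """Mean absolute difference between each attention map and the resized mask,
    averaged over attention layers."""
    per_layer = [
        np.mean(np.abs(a - resize_mask_nearest(mask, *a.shape)))
        for a in attention_maps
    ]
    return float(np.mean(per_layer))

mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0  # square foreground object

# Two layers with different resolutions, as in a U-Net.
aligned = [resize_mask_nearest(mask, 4, 4), resize_mask_nearest(mask, 8, 8)]
uniform = [np.full((4, 4), 0.5), np.full((8, 8), 0.5)]

loss_aligned = ama_loss(aligned, mask)
loss_uniform = ama_loss(uniform, mask)
```

Attention maps that already coincide with the mask give zero loss, while attention spread evenly over foreground and background is penalized.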

Appearance Learning

After obtaining the geometry of the 3D object, the goal is to compute its appearance using a physically based rendering (PBR) material model. The material model comprises a diffuse term $k_d$, a roughness and metallic term $k_{rm}$, and a normal variation term $k_n$. For any point $p$ on the surface of the geometry, a multilayer perceptron (MLP) parameterized by $\Phi_{mat}$ predicts the three material terms:

$$(k_d, k_{rm}, k_n) = \mathrm{MLP}(\mathcal{P}(p); \Phi_{mat})$$

where $\mathcal{P}(\cdot)$ denotes positional encoding using a hash grid. Each pixel of the rendered image is then computed as:

$$L(p, \omega) = \int_{\Omega} L_i(p, \omega_i)\, f(p, \omega_i, \omega)\,(\omega_i \cdot n_p)\, \mathrm{d}\omega_i$$

where $L(p, \omega)$ is the pixel value of the surface point $p$ rendered from the direction $\omega$. $\Omega$ is the hemisphere of incident directions $\omega_i$ that satisfy $\omega_i \cdot n_p \geq 0$, where $n_p$ is the surface normal at point $p$. $L_i(p, \omega_i)$ is the incident light from an off-the-shelf environment map, and $f(p, \omega_i, \omega)$ is the bidirectional reflectance distribution function (BRDF) determined by the material properties (i.e., $k_d$, $k_{rm}$, $k_n$). Aggregating all rendered pixel colors yields the rendered image $x$. As in the geometry learning stage, the rendered image $x$ is fed into SD, and $\Phi_{mat}$ is optimized using the SDS loss and the AMA loss.
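The hemisphere integral can be estimated numerically by Monte Carlo sampling. The sketch below is a simplification for illustration only: it assumes a purely diffuse (Lambertian) BRDF equal to the diffuse color divided by pi, constant incident light instead of an environment map, and a surface normal pointing along +z. Under these assumptions the integral has the closed form of diffuse color times incident radiance, which makes the estimate easy to check.

```python
import numpy as np

def sample_hemisphere(n_samples, rng):
    """Uniformly sample unit directions on the hemisphere around the +z axis."""
    v = rng.normal(size=(n_samples, 3))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    v[:, 2] = np.abs(v[:, 2])  # reflect into the upper hemisphere
    return v

def render_point(k_d, incident_radiance, n_samples, rng):
    """Monte Carlo estimate of the rendering integral for a Lambertian surface
    whose normal is +z and which is lit by constant incident radiance."""
    normal = np.array([0.0, 0.0, 1.0])
    omega_i = sample_hemisphere(n_samples, rng)
    cos_theta = omega_i @ normal          # cosine between incident direction and normal
    brdf = k_d / np.pi                    # Lambertian BRDF (assumption of this sketch)
    pdf = 1.0 / (2.0 * np.pi)             # density of uniform hemisphere sampling
    samples = incident_radiance * brdf[None, :] * cos_theta[:, None] / pdf
    return samples.mean(axis=0)

rng = np.random.default_rng(0)
k_d = np.array([0.8, 0.5, 0.2])
pixel = render_point(k_d, incident_radiance=1.0, n_samples=200_000, rng=rng)
```

With unit incident radiance the estimate converges to `k_d`. A full PBR pipeline would replace the constant light with environment-map lookups and the Lambertian term with a BRDF that also depends on roughness, metalness and the perturbed normal.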


Camera-Guided Low-Rank Adaptation (CG-LoRA)

To address the suboptimal 3D results caused by the domain gap between text-to-2D and text-to-3D generation, X-Dreamer proposes Camera-Guided Low-Rank Adaptation (CG-LoRA).

As shown in Figure 3, camera parameters and direction-aware text guide the generation of the parameters in CG-LoRA, so that X-Dreamer can effectively perceive the position and direction of the camera.

Figure 3 Illustration of camera-guided CG-LoRA.

Specifically, given the text prompt $\mathcal{T}$ and the camera parameters $\mathcal{C}$, a pre-trained CLIP text encoder $E_{txt}(\cdot)$ and a trainable MLP $E_{cam}(\cdot)$ first project these inputs into the feature space:

$$f_{txt} = E_{txt}(\mathcal{T}), \quad f_{cam} = E_{cam}(\mathcal{C})$$

where $f_{txt} \in \mathbb{R}^{d_{txt}}$ and $f_{cam} \in \mathbb{R}^{d_{cam}}$ are the text features and camera features, respectively. Two low-rank matrices then project $f_{txt}$ and $f_{cam}$ into the trainable dimensionality-reduction matrices of CG-LoRA:

$$A_{txt} = \mathrm{Reshape}(f_{txt} W_{txt}), \quad A_{cam} = \mathrm{Reshape}(f_{cam} W_{cam})$$

where $A_{txt} \in \mathbb{R}^{d \times \frac{r}{2}}$ and $A_{cam} \in \mathbb{R}^{d \times \frac{r}{2}}$ are the two dimensionality-reduction matrices of CG-LoRA. The function $\mathrm{Reshape}(\cdot)$ transforms the shape of a tensor from $\mathbb{R}^{d \cdot \frac{r}{2}}$ to $\mathbb{R}^{d \times \frac{r}{2}}$.

$W_{txt} \in \mathbb{R}^{d_{txt} \times (d \cdot \frac{r}{2})}$ and $W_{cam} \in \mathbb{R}^{d_{cam} \times (d \cdot \frac{r}{2})}$ are two low-rank matrices. They can therefore be decomposed into the product of two matrices to reduce the number of trainable parameters in the implementation, i.e., $W_{txt} = U_{txt} V_{txt}$; $W_{cam} = U_{cam} V_{cam}$, where $U_{txt} \in \mathbb{R}^{d_{txt} \times r'}$, $V_{txt} \in \mathbb{R}^{r' \times (d \cdot \frac{r}{2})}$, $U_{cam} \in \mathbb{R}^{d_{cam} \times r'}$, $V_{cam} \in \mathbb{R}^{r' \times (d \cdot \frac{r}{2})}$, and $r'$ is a small number (e.g., 4). Following LoRA, the dimensionality-expansion matrix $B \in \mathbb{R}^{r \times d}$ is initialized to zero so that the model starts training from SD's pre-trained parameters. The feed-forward process of CG-LoRA is therefore:

$$y = xW + x\,[A_{txt}; A_{cam}]\,B$$

where $W \in \mathbb{R}^{d \times d}$ denotes the frozen parameters of the pre-trained SD model and $[\cdot\,;\cdot]$ is the concatenation operation. In the implementation, CG-LoRA is integrated into the linear embedding layers of the attention modules in SD to effectively capture direction and camera information.
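The CG-LoRA forward pass can be sketched in NumPy with toy dimensions. Everything here is illustrative rather than the authors' code: the sizes, the random features standing in for the CLIP and camera encoders, and the row-vector convention (inputs multiply the weights from the left) are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, r_small = 8, 4, 2   # layer width, LoRA rank, inner rank of the factorisation
d_txt, d_cam = 6, 5       # hypothetical text / camera feature sizes

# Factorised low-rank projections: each projection is the product U @ V.
U_txt = rng.normal(size=(d_txt, r_small))
V_txt = rng.normal(size=(r_small, d * r // 2))
U_cam = rng.normal(size=(d_cam, r_small))
V_cam = rng.normal(size=(r_small, d * r // 2))

W = rng.normal(size=(d, d))   # frozen pre-trained weight
B = np.zeros((r, d))          # expansion matrix, zero-initialised as in LoRA

def cg_lora_forward(x, f_txt, f_cam, B):
    """Frozen linear layer plus a low-rank update whose reduction matrix
    is generated from the text and camera features."""
    A_txt = (f_txt @ U_txt @ V_txt).reshape(d, r // 2)
    A_cam = (f_cam @ U_cam @ V_cam).reshape(d, r // 2)
    A = np.concatenate([A_txt, A_cam], axis=1)   # shape (d, r)
    return x @ W + x @ A @ B

x = rng.normal(size=(3, d))        # three input tokens
f_txt = rng.normal(size=d_txt)     # stand-in for the text features
f_cam = rng.normal(size=d_cam)     # stand-in for the camera features

y_init = cg_lora_forward(x, f_txt, f_cam, B)
y_trained = cg_lora_forward(x, f_txt, f_cam, rng.normal(size=(r, d)))
```

Because `B` starts at zero, `y_init` is identical to the output of the frozen layer, so training begins from the pre-trained behaviour. Once `B` is non-zero, the output depends on the camera features through the generated reduction matrix.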

Attention-Mask Alignment Loss (AMA Loss)

SD is pre-trained to generate 2D images, taking both foreground and background elements into account. However, text-to-3D generation requires more attention to the generation of foreground objects. Given this requirement, X-Dreamer proposes the Attention-Mask Alignment loss (AMA loss) to align the attention map of SD with the rendered mask image of the 3D object. Specifically, for each attention layer in the pre-trained SD, the method uses the query image features $Q \in \mathbb{R}^{H \times h \times w \times \frac{d}{H}}$ and the key features $K \in \mathbb{R}^{H \times \frac{d}{H}}$ of the CLS token to compute the attention map:

$$\bar{a} = \mathrm{Softmax}\left(\frac{QK^{\top}}{\sqrt{d}}\right)$$

where $H$ is the number of heads in the multi-head attention mechanism and $\bar{a} \in \mathbb{R}^{H \times h \times w}$ is the per-head attention map. The attention values of $\bar{a}$ are then averaged over all attention heads to obtain the overall attention map $a \in \mathbb{R}^{h \times w}$.

Because the softmax function normalizes the attention map values, the activations in the attention map can become very small when the image feature resolution is high. Since each element of the rendered 3D object mask is a binary value of 0 or 1, directly aligning the attention map with the rendered mask is therefore not optimal. To solve this problem, the paper proposes a normalization technique that maps the attention map values to the range between 0 and 1:

$$a = \frac{a - \min(a)}{\max(a) - \min(a) + \nu}$$

where $\nu$ is a small constant that prevents a zero denominator. Finally, the AMA loss aligns the attention maps of all attention layers with the rendered mask of the 3D object.
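The head averaging and rescaling steps can be sketched as follows. The toy attention values are generated with a softmax over random logits purely to obtain realistically small numbers, and the default value of the stabilizing constant is an assumption of this sketch.

```python
import numpy as np

def normalize_attention(per_head_maps, nu=1e-6):
    """Average per-head attention maps and rescale the result to [0, 1).

    per_head_maps has shape (H, h, w). The default nu is an assumption.
    """
    a = per_head_maps.mean(axis=0)                   # average over heads
    return (a - a.min()) / (a.max() - a.min() + nu)  # min-max rescaling

rng = np.random.default_rng(0)
H, h, w = 4, 16, 16

# Toy per-head maps: a softmax over h * w positions yields values near 1 / (h * w).
logits = rng.normal(size=(H, h * w))
shifted = logits - logits.max(axis=1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
per_head = probs.reshape(H, h, w)

raw_peak = per_head.mean(axis=0).max()
normalized = normalize_attention(per_head)
```

Before rescaling, even the strongest activation is far below 1, so comparing it with a binary mask would be dominated by the scale mismatch. After rescaling, the map spans almost the full range from 0 to 1 and can be compared with the mask directly.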

Experimental results

The paper conducts its experiments on four Nvidia RTX 3090 GPUs using the PyTorch library. The SDS loss is computed with the Stable Diffusion model implemented in Hugging Face Diffusers. The DMTET and material encoders are implemented as a two-layer MLP and a single-layer MLP, respectively, each with a hidden dimension of 32.

Text-to-3D generation starting from an ellipsoid

Figure 4 shows X-Dreamer's text-to-3D generation results when an ellipsoid is used as the initial geometry. The results demonstrate X-Dreamer's ability to generate high-quality, photorealistic 3D objects that accurately correspond to the input text prompts.

Figure 4 Text-to-3D generation starting from an ellipsoid.

Text-to-3D generation starting from coarse-grained meshes

Although a large number of coarse-grained meshes can be downloaded from the Internet, using these meshes directly to create 3D content often gives poor results because they lack geometric detail. However, these meshes can provide X-Dreamer with a better 3D shape prior than a 3D ellipsoid.

Therefore, DMTET can also be initialized with a coarse-grained guide mesh instead of an ellipsoid. As shown in Figure 5, X-Dreamer can generate 3D assets with precise geometric detail from the given text, even if the provided coarse-grained mesh lacks detail.

Figure 5 Text-to-3D generation starting from a coarse-grained mesh.

Qualitative comparison

To evaluate the effectiveness of X-Dreamer, the paper compares it with four state-of-the-art methods: DreamFusion [4], Magic3D [8], Fantasia3D [7] and ProlificDreamer [11], as shown in Figure 6.

When compared to SDS-based methods [4, 7, 8], X-Dreamer outperforms them in generating high-quality and realistic 3D assets. Furthermore, X-Dreamer produces 3D content with comparable or even better visual effects compared to VSD-based methods [11] while requiring significantly less optimization time. Specifically, the geometry and appearance learning process only takes about 27 minutes for X-Dreamer, compared to more than 8 hours for ProlificDreamer.

Figure 6 Comparison with state-of-the-art (SOTA) methods.

Ablation experiments

  • Module ablation

To gain a deeper understanding of the capabilities of CG-LoRA and the AMA loss, the paper conducts an ablation study in which each module is added individually to evaluate its impact. As shown in Figure 7, when CG-LoRA is excluded from X-Dreamer, the geometry and appearance quality of the generated 3D objects decreases significantly.

Removing the AMA loss from X-Dreamer also harms the geometry and appearance fidelity of the resulting 3D assets. These ablation experiments clarify the individual contributions of CG-LoRA and the AMA loss to the geometry, appearance and overall quality of the generated 3D objects.

Figure 7 Ablation study of X-Dreamer.

  • Comparison of attention maps with and without AMA loss

The purpose of the AMA loss is to focus SD's attention on foreground objects during the denoising process. This is achieved by aligning the attention map of SD with the rendered mask of the 3D object. To evaluate how well the AMA loss achieves this goal, the paper compares the attention maps of SD with and without the AMA loss in the geometry learning and appearance learning stages.

As Figure 8 shows, adding the AMA loss not only improves the geometry and appearance of the generated 3D assets, but also concentrates SD's attention on the foreground object regions. The visualization results confirm that the AMA loss is effective in guiding SD's attention, which improves both the quality of the geometry and appearance learning stages and the focus on foreground objects.

Figure 8 Visualization of the attention maps, rendered masks and rendered images, with and without the AMA loss.

This research introduces X-Dreamer, a framework that aims to enhance text-to-3D generation by addressing the domain gap between text-to-2D and text-to-3D generation. To achieve this goal, the paper first proposes CG-LoRA, a module that incorporates 3D-related information (including direction-aware text and camera parameters) into a pre-trained Stable Diffusion (SD) model, which lets the model capture information relevant to the 3D domain. Furthermore, the paper designs an AMA loss to align the attention map generated by SD with the rendered mask of the 3D object. The main goal of the AMA loss is to guide the focus of the text-to-3D model toward the generation of foreground objects. Through extensive experiments, the paper evaluates the effectiveness of the proposed method and demonstrates that X-Dreamer can generate high-quality, realistic 3D content from given text prompts.

Statement:
This article is reproduced from 51cto.com. If there is any infringement, please contact admin@php.cn to request removal.