The most promising episode for high-quality 3D generation? GaussianCube comprehensively surpasses NeRF in 3D generation-AI-php.cn

Home

Technology peripherals

The most promising episode for high-quality 3D generation? GaussianCube comprehensively surpasses NeRF in 3D generation

PHPz

Jun 16, 2024 pm 10:50 PM

industry

The most promising episode for high-quality 3D generation? GaussianCube comprehensively surpasses NeRF in 3D generation

The AIxiv column on this website is a column that publishes academic and technical content. In the past few years, the AIxiv column of this site has received more than 2,000 reports, covering top laboratories from major universities and companies around the world, effectively promoting academic exchanges and dissemination. If you have excellent work that you want to share, please feel free to contribute or contact us for reporting. Submission email: liyazhou@jiqizhixin.com; zhaoyunfeng@jiqizhixin.com.

#In the research field of 3D generative modeling, the two current categories of 3D representation methods are either based on implicit decoders with insufficient fitting capabilities. Either it lacks a clearly defined spatial structure and is difficult to integrate with mainstream 3D diffusion technology. Researchers from the University of Science and Technology of China, Tsinghua University, and Microsoft Research Asia proposed GaussianCube, an explicitly structured 3D representation with powerful fitting capabilities, and can be seamlessly applied to current mainstream 3D diffusion models.

GaussianCube starts with a novel density-constrained Gaussian fitting algorithm that enables high-accuracy fitting of 3D assets while ensuring the use of a fixed number of free Gaussians. These Gaussians are then rearranged into a predefined voxel grid using an optimal transport algorithm. Thanks to the structural characteristics of GaussianCube, researchers can directly apply standard 3D U-Net as the backbone network for diffusion modeling without complex network design.

More importantly, the new fitting algorithm proposed in this paper greatly enhances the compactness of the representation, which is required when the 3D representation fitting quality is similar. The number of parameters is only one-tenth or one-hundredth of the number of parameters required for traditional structured representations. This compactness significantly reduces the complexity of 3D generative modeling. Researchers have conducted extensive experiments on unconditional and conditional 3D object generation, digital avatar creation, and text-to-3D content synthesis.

Numerical results show that GaussianCube achieves a performance improvement of up to 74% compared to the previous baseline algorithm. As shown below, GaussianCube is not only able to generate high-quality 3D assets, but also provides highly attractive visual effects, fully demonstrating its great potential as a universal representation for 3D generation.

The most promising episode for high-quality 3D generation? GaussianCube comprehensively surpasses NeRF in 3D generation

# The method in this article can generate high-quality and diverse 3D models.

The most promising episode for high-quality 3D generation? GaussianCube comprehensively surpasses NeRF in 3D generation ## Figure 2. The result of digital avatar creation based on the input portrait. The method in this article can retain the identity feature information of the input portrait to a great extent, and provide detailed hairstyle and clothing modeling. The most promising episode for high-quality 3D generation? GaussianCube comprehensively surpasses NeRF in 3D generation

## 图 3. The results of creating three -dimensional assets based on the input text. The method in this article can output results consistent with text information, and can model complex geometric structures and detailed materials.

^{# 图 Figure 4. The result of the category condition. The 3D assets generated in this article have clear semantics and high-quality geometric structures and materials.}

The most promising episode for high-quality 3D generation? GaussianCube comprehensively surpasses NeRF in 3D generation

Thesis name: GaussianCube: A Structured and Explicit Radiance Representation for 3D Generative Modeling
Project homepage: https://gaussiancube.github.io/
Paper link: https://arxiv.org/pdf/2403.19655
Open source code: https://github.com/GaussianCube/ GaussianCube
Demo video: https://www.bilibili.com/video/BV1zy411h7wB/

What age are you in?

Are you still using traditional NeRF for 3D generative modeling?

Most previous 3D generative modeling efforts have used a variant of Neural Radiance Field (NeRF) as their underlying 3D Representations, which typically combine an explicit structured feature representation with an implicit feature decoder. However, in 3D generative modeling, all 3D objects have to share the same implicit feature decoder, which greatly weakens the fitting ability of NeRF. In addition, the volume rendering technology that NeRF relies on has very high computational complexity, which results in slow rendering speed and the consumption of extremely high GPU memory. Recently, another three-dimensional representation method, 3D Gaussian Splatting (3DGS), has attracted much attention. Although 3DGS has powerful fitting capabilities, efficient computing performance, and fully explicit features, it has been widely used in three-dimensional reconstruction tasks. However, 3DGS lacks a well-defined spatial structure, which makes it unable to be directly applied in current mainstream generative modeling frameworks.

Therefore, the research team proposed GaussianCube. This is an innovative three-dimensional representation method that is both structured and fully explicit, with powerful fitting capabilities. The method presented in this article first ensures a high-accuracy fit with a fixed number of free Gaussians, and then efficiently organizes these Gaussians into a structured voxel grid. This explicit and structured representation allows researchers to seamlessly adopt standard 3D network architectures, such as U-Net, without the need for complex and customized networks required when using unstructured or implicitly decoded representations. design.

At the same time, the structured organization through the optimal transmission algorithm maintains the spatial structure relationship between adjacent Gaussian kernels to the greatest extent, allowing researchers to only use classic 3D convolutional networks can extract features efficiently. More importantly, in view of the findings in previous studies that diffusion models perform poorly when dealing with high-dimensional data distributions, the GaussianCube proposed in this paper significantly reduces the amount of parameters required while maintaining high-quality reconstruction, greatly easing the problem. It eliminates the pressure of diffusion models on distribution modeling and brings significant modeling capabilities and efficiency improvements to the field of 3D generative modeling.

Method

The most promising episode for high-quality 3D generation? GaussianCube comprehensively surpasses NeRF in 3D generation

#The framework of this article consists of two main stages: representation construction and three-dimensional diffusion. In the representation construction phase, given a multi-view rendering of a 3D asset, density-constrained Gaussian fitting is performed on it to obtain a 3D Gaussian with a fixed number. Subsequently, the three-dimensional Gaussian is structured into a GaussianCube through optimized transfer. In the 3D diffusion stage, the researchers trained the 3D diffusion model to generate GaussianCubes from Gaussian noise.

The most promising episode for high-quality 3D generation? GaussianCube comprehensively surpasses NeRF in 3D generation

^{Humans need to create representations for each 3D asset that are suitable for generative modeling. Considering that the generative field often requires modeled data to have a uniform fixed length, the adaptive density control in the original 3DGS fitting algorithm will lead to different numbers of Gaussian kernels used to fit different objects, which brings problems to generative modeling. Great challenge. A very simple solution would be to simply remove the adaptive density control, but the researchers found that this severely reduced the accuracy of the fit. This paper proposes a novel density constraint fitting algorithm that retains the pruning operation in the original adaptive density control, but performs new constraint processing on the splitting and cloning operations.}

Specifically, assuming that the current iteration consists of

Gaussians, the researchers identify splits by selecting those Gaussians whose gradient magnitude exceeds a predefined threshold τ at their position in view space Or candidates for cloning operations, the number of these candidates is recorded as

. In order to prevent exceeding the predefined maximum

Gaussians,

Gaussians with the largest viewing angle spatial position gradient are selected from the candidates for splitting or cloning. After completing the fitting process, the researchers filled it with a Gaussian of α=0 to reach the target count without affecting the rendering results. Thanks to this strategy, a high-quality representation with several orders of magnitude fewer parameters compared to existing work of similar quality can be achieved, significantly reducing the modeling difficulty of diffusion models.

##儘管如此，透過上述擬合演算法得到的高斯仍然沒有明確的空間排列結構，這使得後續的擴散模型無法有效率地對資料進行建模。為此，研究人員提出將高斯映射到預先定義的結構化體素網格中來使得高斯具有明確的空間結構。直觀地說，這一步的目標是在盡可能保持高斯的空間相鄰關係的同時，將每個高斯 “移動” 到一個體素中。

研究人員將其建模為一個最優傳輸問題，使用Jonker-Volgenant 演算法來得到對應的映射關係，進而根據最優傳輸的解來組織將高斯組織到對應的體素中得到GaussianCube，並且用當前體素中心的偏移量取代了原始高斯的位置，以減少擴散模型的解空間。最終的 GaussianCube 表示不僅結構化，而且最大程度上保持了相鄰高斯之間的結構關係，這為 3D 生成建模的高效特徵提取提供了強有力的支持。

在三維擴散階段，本文使用三維擴散模型來建模 GaussianCube 的分佈。由於 GaussianCube 在空間上的結構化組織關係，無需複雜的網絡或訓練設計，標準的 3D 卷積足以有效提取和聚合鄰近高斯的特徵。於是，研究者利用了標準的 U-Net 網路進行擴散，並直接地將原始的 2D 操作符（包括卷積、注意力、上採樣和下採樣）替換為它們的 3D 實作。

本文的三維擴散模型也支援多種條件訊號來控制生成過程，包括類別標籤條件產生、根據圖像條件創建數位化身和根據文字產生三維數位資產。基於多模態條件的生成能力大大擴展了模型的應用範圍，並為未來的 3D 內容創造提供了強大的工具。

實驗結果

#研究人員首先在ShapeNet Car 資料集上驗證了GaussianCube的擬合能力。實驗結果表明，與基線方法相比，GaussianCube 可以以最快的速度和最少的參數量實現高精度的三維物體擬合。

The most promising episode for high-quality 3D generation? GaussianCube comprehensively surpasses NeRF in 3D generation

表1. 在ShapeNet Car 上不同的三維表示關於空間結構、擬合品質、相對擬合速度、使用參數量的數值比較。 ∗ ^{表示不同物體共用隱式特徵解碼器。所有方法均以 30K 次迭代進行評估。}

The most promising episode for high-quality 3D generation? GaussianCube comprehensively surpasses NeRF in 3D generation

圖上使用 8. 在 ShapeNet Car 上不同的立體適應能力比較。 ∗ 表示不同物體共用隱式特徵解碼器。所有方法均以 30K 次迭代進行評估。

研究人員其次在大量資料集上驗證了基於GaussianCube 的擴散模型的產生能力，包括ShapeNet、OmniObject3D、合成數位化身資料集和Objaverse 資料集。實驗結果表明，本文的模型在無條件和類別條件的物件生成、數位化身創建以及文字到 3D 合成從數值指標到視覺品質都取得了領先的結果。特別地，GaussianCube 相較之前的基線演算法實現了最高 74% 的效能提升。

The most promising episode for high-quality 3D generation? GaussianCube comprehensively surpasses NeRF in 3D generation

中對中上進行定量中進行定量的比較條件生成中進行無條件比較。

^{之後中使用中無條件產生的定性比較。本文的方法可以產生精確幾何和細節材質。}

The most promising episode for high-quality 3D generation? GaussianCube comprehensively surpasses NeRF in 3D generation

^{本文方法可以產生明確語意的複雜物體。}

The most promising episode for high-quality 3D generation? GaussianCube comprehensively surpasses NeRF in 3D generation

^{中進行定量中建立 3. 建立肖像畫的標準}

The most promising episode for high-quality 3D generation? GaussianCube comprehensively surpasses NeRF in 3D generation

^{1 本文的方法能夠更準確地還原輸入肖像的身份特徵、表情、配件和頭髮細節。}

The most promising episode for high-quality 3D generation? GaussianCube comprehensively surpasses NeRF in 3D generation

^{表 4. 基於輸入文字建立三維資產的量化比較。推理時間使用單張 A100 進行檢定。 Shap-E 和 LGM 與本文方法取得了相似的 CLIP Score，但它們分別使用了數百萬的訓練資料（本文僅使用十萬三維資料訓練），和二維文生圖擴散模型先驗。}

The most promising episode for high-quality 3D generation? GaussianCube comprehensively surpasses NeRF in 3D generation

^{圖 12. 中建立定性上為基礎資產的比較。本文的方法可以根據輸入文字實現高品質三維資產生成。}

The above is the detailed content of The most promising episode for high-quality 3D generation? GaussianCube comprehensively surpasses NeRF in 3D generation. For more information, please follow other related articles on the PHP Chinese website!

Statement

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

You Must Build Workplace AI Behind A Veil Of IgnoranceApr 29, 2025 am 11:15 AM

In John Rawls' seminal 1971 book The Theory of Justice, he proposed a thought experiment that we should take as the core of today's AI design and use decision-making: the veil of ignorance. This philosophy provides a simple tool for understanding equity and also provides a blueprint for leaders to use this understanding to design and implement AI equitably. Imagine that you are making rules for a new society. But there is a premise: you don’t know in advance what role you will play in this society. You may end up being rich or poor, healthy or disabled, belonging to a majority or marginal minority. Operating under this "veil of ignorance" prevents rule makers from making decisions that benefit themselves. On the contrary, people will be more motivated to formulate public

Decisions, Decisions… Next Steps For Practical Applied AIApr 29, 2025 am 11:14 AM

Numerous companies specialize in robotic process automation (RPA), offering bots to automate repetitive tasks—UiPath, Automation Anywhere, Blue Prism, and others. Meanwhile, process mining, orchestration, and intelligent document processing speciali

The Agents Are Coming – More On What We Will Do Next To AI PartnersApr 29, 2025 am 11:13 AM

The future of AI is moving beyond simple word prediction and conversational simulation; AI agents are emerging, capable of independent action and task completion. This shift is already evident in tools like Anthropic's Claude. AI Agents: Research a

Why Empathy Is More Important Than Control For Leaders In An AI-Driven FutureApr 29, 2025 am 11:12 AM

Rapid technological advancements necessitate a forward-looking perspective on the future of work. What happens when AI transcends mere productivity enhancement and begins shaping our societal structures? Topher McDougal's upcoming book, Gaia Wakes:

AI For Product Classification: Can Machines Master Tax Law?Apr 29, 2025 am 11:11 AM

Product classification, often involving complex codes like "HS 8471.30" from systems such as the Harmonized System (HS), is crucial for international trade and domestic sales. These codes ensure correct tax application, impacting every inv

Could Data Center Demand Spark A Climate Tech Rebound?Apr 29, 2025 am 11:10 AM

The future of energy consumption in data centers and climate technology investment This article explores the surge in energy consumption in AI-driven data centers and its impact on climate change, and analyzes innovative solutions and policy recommendations to address this challenge. Challenges of energy demand: Large and ultra-large-scale data centers consume huge power, comparable to the sum of hundreds of thousands of ordinary North American families, and emerging AI ultra-large-scale centers consume dozens of times more power than this. In the first eight months of 2024, Microsoft, Meta, Google and Amazon have invested approximately US$125 billion in the construction and operation of AI data centers (JP Morgan, 2024) (Table 1). Growing energy demand is both a challenge and an opportunity. According to Canary Media, the looming electricity

AI And Hollywood's Next Golden AgeApr 29, 2025 am 11:09 AM

Generative AI is revolutionizing film and television production. Luma's Ray 2 model, as well as Runway's Gen-4, OpenAI's Sora, Google's Veo and other new models, are improving the quality of generated videos at an unprecedented speed. These models can easily create complex special effects and realistic scenes, even short video clips and camera-perceived motion effects have been achieved. While the manipulation and consistency of these tools still need to be improved, the speed of progress is amazing. Generative video is becoming an independent medium. Some models are good at animation production, while others are good at live-action images. It is worth noting that Adobe's Firefly and Moonvalley's Ma

Is ChatGPT Slowly Becoming AI's Biggest Yes-Man?Apr 29, 2025 am 11:08 AM

ChatGPT user experience declines: is it a model degradation or user expectations? Recently, a large number of ChatGPT paid users have complained about their performance degradation, which has attracted widespread attention. Users reported slower responses to models, shorter answers, lack of help, and even more hallucinations. Some users expressed dissatisfaction on social media, pointing out that ChatGPT has become “too flattering” and tends to verify user views rather than provide critical feedback. This not only affects the user experience, but also brings actual losses to corporate customers, such as reduced productivity and waste of computing resources. Evidence of performance degradation Many users have reported significant degradation in ChatGPT performance, especially in older models such as GPT-4 (which will soon be discontinued from service at the end of this month). this

See all articles

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

What's New in Windows 11 KB5054979 & How to Fix Update Issues

3 weeks agoByDDD

How to fix KB5055523 fails to install in Windows 11?

2 weeks agoByDDD

InZoi: How To Apply To School And University

3 weeks agoByDDD

How to fix KB5055518 fails to install in Windows 10?

2 weeks agoByDDD

Roblox: Dead Rails – How To Summon And Defeat Nikola Tesla

4 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

Hot Tools

MinGW - Minimalist GNU for Windows

This project is in the process of being migrated to osdn.net/projects/mingw, you can continue to follow us there. MinGW: A native Windows port of the GNU Compiler Collection (GCC), freely distributable import libraries and header files for building native Windows applications; includes extensions to the MSVC runtime to support C99 functionality. All MinGW software can run on 64-bit Windows platforms.