ICML 2024 | Complex Compositional 3D Scene Generation: An LLM-Driven Conversational Framework for Controllable 3D Generation and Editing


Jul 31, 2024, 08:12 PM


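AIxiv is this site's column for academic and technical content. Over the past several years, the AIxiv column has covered more than 2,000 pieces of work from leading laboratories at universities and companies worldwide, helping to promote academic exchange and dissemination. If you have outstanding work to share, you are welcome to submit it or contact us for coverage. Submission email: liyazhou@jiqizhixin.com; zhaoyunfeng@jiqizhixin.com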

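The paper's first author and corresponding author are both from the VDIG (Visual Data Interpreting and Generation) Lab at the Wangxuan Institute of Computer Technology, Peking University. The first author is PhD student Xiaoyu Zhou, and the corresponding author is PhD supervisor Yongtao Wang. In recent years, the VDIG Lab has published representative work at top venues including IJCV, CVPR, AAAI, ICCV, ICML, and ECCV, has repeatedly won first and second place in major domestic and international computer vision competitions, and collaborates widely with well-known universities and research institutions in China and abroad.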

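In recent years, Text-to-3D methods for single objects have achieved a series of breakthroughs, but generating controllable, high-quality, complex multi-object 3D scenes from text remains a major challenge. Previous methods have significant shortcomings in scene complexity, geometric quality, texture consistency, multi-object interaction, controllability, and editability.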

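Recently, the VDIG research team at the Wangxuan Institute of Computer Technology, Peking University, together with its collaborators, released its latest work, GALA3D. Targeting the generation of complex multi-object 3D scenes, the work proposes GALA3D, an LLM-guided framework for controllable generation of complex 3D scenes. It can generate high-quality, highly consistent 3D scenes with multiple objects and complex interactions, and it supports controllable editing through conversational interaction. The paper has been accepted to ICML 2024.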


Paper title: GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting
.pdf/
Code: https://github.com/VDIGPKU/GALA3D
Project page: https://gala3d.github.io/

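GALA3D is a high-quality framework for generating and controllably editing complex compositional scenes. Given a text description, GALA3D generates, zero-shot, a corresponding 3D scene containing multiple objects and complex interactions. While keeping the generated 3D scene closely aligned with the text, GALA3D performs strongly in scene quality, complex multi-object interaction, and geometric consistency. GALA3D also supports user-friendly end-to-end generation and controllable editing, so that ordinary users can customize and edit 3D scenes through conversation. In dialogue with the user, GALA3D can carry out precise conversational editing of complex 3D scenes, covering diverse needs such as layout changes, embedding digital assets, and changing the decoration style.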

Method

The overall architecture of GALA3D is shown in the figure below:

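GALA3D uses large language models (LLMs) to generate an initial layout and proposes a layout-guided generative 3D Gaussian representation to construct complex 3D scenes. Through adaptive geometry control, GALA3D optimizes the shape and distribution of the 3D Gaussians to produce 3D scenes with consistent geometry, texture, and scale, and with accurate interactions. GALA3D further proposes a compositional optimization mechanism that combines conditional diffusion priors with text-to-image models to collaboratively generate multi-object 3D scenes with a consistent style, while iteratively refining the initial layout prior extracted from the LLMs to obtain a more realistic and accurate spatial arrangement. Extensive quantitative experiments and qualitative studies show that GALA3D achieves strong results in generating complex 3D scenes from text, outperforming existing text-to-3D scene methods.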

a. LLM-based scene layout prior

Large language models demonstrate excellent natural language understanding and reasoning capabilities. This work further explores the reasoning and layout-generation capabilities of LLMs in complex 3D scenes. Obtaining a reasonable layout prior without manual design helps reduce the cost of scene modeling and generation. To this end, we use LLMs (such as GPT-3.5) to extract the instances in the text input and their spatial relationships, and to generate corresponding layout priors. However, the 3D spatial layout and layout priors inferred by LLMs differ from real scenes, which typically leads to floating or interpenetrating objects, combinations of objects with badly mismatched proportions, and similar artifacts. We therefore propose a Layout Refinement module that adjusts and optimizes this coarse layout prior using a vision-based diffusion prior and layout-guided generative 3D Gaussians.
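As a rough illustration of the step described above, the sketch below parses a layout that an LLM might return into per-instance box priors. The JSON schema, the field names (`name`, `center`, `size`, `yaw_deg`), and the helper `parse_layout` are assumptions made for this example; the article does not specify the prompt or output format GALA3D uses, and no real LLM API is called here.

```python
import json
from dataclasses import dataclass


@dataclass
class LayoutBox:
    """Hypothetical coarse layout prior for one instance (not the paper's exact format)."""
    name: str
    center: tuple   # (x, y, z) box centre in scene units
    size: tuple     # (width, depth, height) box extents
    yaw_deg: float  # rotation about the vertical axis, in degrees


def parse_layout(llm_reply: str) -> list:
    """Turn an LLM reply (assumed to be a JSON list) into LayoutBox objects."""
    boxes = []
    for item in json.loads(llm_reply):
        size = tuple(float(v) for v in item["size"])
        if len(size) != 3 or any(v <= 0 for v in size):
            raise ValueError(f"invalid size for {item['name']}: {size}")
        boxes.append(LayoutBox(
            name=item["name"],
            center=tuple(float(v) for v in item["center"]),
            size=size,
            yaw_deg=float(item.get("yaw_deg", 0.0)),
        ))
    return boxes


# Made-up stand-in for a reply to a prompt such as "a bed with a lamp on a nightstand".
example_reply = """
[
  {"name": "bed",        "center": [0.0, 0.0, 0.3],  "size": [2.0, 1.6, 0.6]},
  {"name": "nightstand", "center": [1.4, 0.0, 0.25], "size": [0.5, 0.5, 0.5], "yaw_deg": 0},
  {"name": "lamp",       "center": [1.4, 0.0, 0.9],  "size": [0.2, 0.2, 0.4]}
]
"""

layout = parse_layout(example_reply)
for box in layout:
    print(box.name, box.center, box.size, box.yaw_deg)
```

In this made-up reply the lamp's bottom face (z = 0.7) sits above the nightstand's top face (z = 0.5), which is the kind of floating-object error a coarse LLM layout can contain.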

b. Layout Refinement

GALA3D uses a layout optimization module based on a diffusion prior to refine the layout prior generated by the LLMs. Specifically, we add gradient optimization of the layout-guided 3D Gaussian spatial layout to the 3D generation process, and adjust the spatial position, rotation angle, and size ratio of the LLM-generated layouts through ControlNet. The figure shows the correspondence between the 3D scene and its layout before and after optimization. The optimized layout has more accurate spatial positions and scales, and makes the interactions between multiple objects in the 3D scene more plausible.
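The idea of treating layout parameters as optimizable variables can be sketched with a toy example. The loss below is a hand-written stand-in that only penalizes the gap between an object and its support surface; it is not the diffusion-based objective described above, where gradients come from a ControlNet-conditioned diffusion prior. All function names and values are hypothetical.

```python
import numpy as np


def support_gap_loss(params: np.ndarray, support_top: float) -> float:
    """Toy loss: squared gap between the box's bottom face and its support.

    params = [center_z, height]; a stand-in for a rendering-based loss.
    """
    center_z, height = params
    bottom = center_z - 0.5 * height
    return float((bottom - support_top) ** 2)


def numerical_grad(f, params: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Central finite differences, so any scalar loss can be plugged in."""
    grad = np.zeros_like(params)
    for i in range(params.size):
        step = np.zeros_like(params)
        step[i] = eps
        grad[i] = (f(params + step) - f(params - step)) / (2 * eps)
    return grad


def refine_layout(params, support_top, lr=0.2, steps=200):
    """Plain gradient descent on the layout parameters."""
    params = np.asarray(params, dtype=float).copy()
    for _ in range(steps):
        grad = numerical_grad(lambda p: support_gap_loss(p, support_top), params)
        params -= lr * grad
    return params


# A lamp whose coarse layout floats 0.2 units above a 0.5-high nightstand.
coarse = [0.9, 0.4]  # [center_z, height]
refined = refine_layout(coarse, support_top=0.5)
gap = refined[0] - 0.5 * refined[1] - 0.5
print(refined, gap)
```

Both the position and the size receive gradients, so the box moves down and grows slightly until the gap closes. A real system would replace `support_gap_loss` with a differentiable image-space loss rather than finite differences.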

[Figure: the 3D scene and its layout before and after optimization]

c. Layout-guided generative 3D Gaussian representation

We introduce 3D layout constraints into the 3D Gaussian representation for the first time, and propose layout-guided generative 3D Gaussians for complex text-to-3D scenes. The layout-guided 3D Gaussian representation contains multiple semantically extracted instance objects, where the layout prior of each instance object can be parameterized as:

[Equation: parameterization of the layout prior for each instance object]

Here, N denotes the total number of instance objects in the scene. Each instance's 3D Gaussians are optimized through adaptive geometry control to obtain an instance-level 3D Gaussian representation. We then combine the object Gaussians into the whole scene according to their relative positions, producing layout-guided global 3D Gaussians, and render the entire scene through global Gaussian Splatting.
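A minimal sketch of the composition step, assuming each instance's Gaussian centres live in a local box-normalized frame and each layout is a box with a centre, a size, and a yaw angle. This parameterization and the functions `to_world` and `compose_scene` are assumptions for illustration, and the exact form used by GALA3D may differ. A full implementation would also transform each Gaussian's covariance (rotation and scale), which is omitted here.

```python
import numpy as np


def to_world(local_xyz: np.ndarray, center, size, yaw_rad: float) -> np.ndarray:
    """Map Gaussian centres from a local [-0.5, 0.5]^3 frame into the scene.

    Applies scale, then rotation about the z (up) axis, then translation.
    """
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    rot_z = np.array([[c, -s, 0.0],
                      [s, c, 0.0],
                      [0.0, 0.0, 1.0]])
    scaled = local_xyz * np.asarray(size, dtype=float)
    return scaled @ rot_z.T + np.asarray(center, dtype=float)


def compose_scene(instances):
    """Concatenate per-instance centres into one global point set.

    `instances` is a list of (local_xyz, center, size, yaw_rad) tuples.
    Returns the global centres and an instance id for each Gaussian.
    """
    world, ids = [], []
    for idx, (xyz, center, size, yaw) in enumerate(instances):
        world.append(to_world(xyz, center, size, yaw))
        ids.append(np.full(len(xyz), idx))
    return np.concatenate(world), np.concatenate(ids)


rng = np.random.default_rng(0)
table = (rng.uniform(-0.5, 0.5, (100, 3)), (0.0, 0.0, 0.4), (1.2, 0.8, 0.8), 0.0)
vase = (rng.uniform(-0.5, 0.5, (50, 3)), (0.0, 0.0, 0.95), (0.2, 0.2, 0.3), np.pi / 4)

scene_xyz, scene_ids = compose_scene([table, vase])
print(scene_xyz.shape, np.bincount(scene_ids))
```

Keeping an instance id per Gaussian is one way to let the same parameters be rendered either per object or as a whole scene; this is a design choice of the sketch, not something the article states.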

d. Adaptive geometry control

To better control the spatial distribution and geometric shape of 3D Gaussians during generation, we propose an adaptive geometry control method for generative 3D Gaussians. First, given a set of initial Gaussians, GALA3D uses a set of density distribution functions to constrain the spatial positions of the Gaussian ellipsoids so that the 3D Gaussians stay within the layout range. We then sample Gaussians near the layout surface to fit the distribution function. Next, we control the geometry of the 3D Gaussians using shape regularization. Throughout the 3D generation process, adaptive geometry control continuously optimizes the distribution and geometry of the Gaussians to generate multi-object 3D scenes with richer texture detail and more regular geometry. Adaptive geometry control also gives layout-guided generative 3D Gaussians greater controllability and consistency.
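The two ingredients named above, a position constraint tied to the layout and a shape regularizer, can be sketched as simple penalty terms. The functional forms here (a squared hinge on distance outside the box and a cap on the axis-length ratio) and the threshold `max_ratio` are assumptions chosen for clarity; the article does not give the density distribution functions or the regularizer GALA3D actually uses.

```python
import numpy as np


def outside_box_penalty(xyz: np.ndarray, half_size) -> float:
    """Mean squared distance by which centres exceed the box (0 if all inside).

    Centres are expressed in the box's local frame, centred at the origin.
    """
    excess = np.maximum(np.abs(xyz) - np.asarray(half_size, dtype=float), 0.0)
    return float(np.mean(np.sum(excess ** 2, axis=1)))


def shape_regularizer(scales: np.ndarray, max_ratio: float = 4.0) -> float:
    """Penalize Gaussians whose longest axis exceeds max_ratio times the shortest."""
    ratio = scales.max(axis=1) / scales.min(axis=1)
    return float(np.mean(np.maximum(ratio - max_ratio, 0.0) ** 2))


half_size = (0.5, 0.5, 0.5)
xyz = np.array([[0.0, 0.0, 0.0],     # inside the box
                [0.7, 0.0, 0.0]])    # 0.2 outside along x
scales = np.array([[0.1, 0.1, 0.1],    # isotropic: no penalty
                   [0.6, 0.1, 0.1]])   # needle-like: ratio 6 > 4

geometry_loss = outside_box_penalty(xyz, half_size) + shape_regularizer(scales)
print(geometry_loss)
```

For these toy inputs the position term contributes about 0.02 and the shape term about 2.0. In a training loop, terms like these would be added to the generation loss so that stray or needle-like Gaussians are pulled back during optimization.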

Experimental results

Compared with existing Text-to-3D generation methods, GALA3D shows better 3D scene generation quality and consistency. The quantitative experimental results are shown in the following table:

[Table: quantitative comparison of GALA3D with existing Text-to-3D methods]

We also conducted an extensive user study in which 125 participants (39.2% of whom were experts and practitioners in related fields) evaluated, from multiple perspectives, scenes generated by our method and by existing methods. The results are shown in the following table:

[Table: user study results]

Experimental results show that GALA3D surpasses existing methods on multiple evaluation dimensions, including scene quality, geometric fidelity, text consistency, and scene consistency, achieving the best generation quality among the compared methods.

As the qualitative results in the figure below show, GALA3D can generate complex multi-object compositional 3D scenes zero-shot with good consistency:

[Figure: qualitative results of zero-shot multi-object 3D scene generation]

The figure below shows that GALA3D supports user-friendly, conversational controllable generation and editing:

[Figure: conversational controllable generation and editing]

For more research details, please refer to the original paper.
