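The first author and the corresponding author of the paper are both from the VDIG (Visual Data Interpreting and Generation) lab at the Wangxuan Institute of Computer Technology, Peking University. The first author is PhD student Zhou Xiaoyu, and the corresponding author is doctoral supervisor Wang Yongtao. In recent years the VDIG lab has published a number of representative works at top venues such as IJCV, CVPR, AAAI, ICCV, ICML and ECCV, has repeatedly won first and second places in major international and domestic CV competitions, and collaborates widely with well-known universities and research institutions in China and abroad.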
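In recent years, text-to-3D methods for single objects have made a series of breakthroughs, but generating controllable, high-quality, complex multi-object 3D scenes from text remains very challenging. Previous methods fall well short in scene complexity, geometric quality, texture consistency, multi-object interactions, controllability, and editability.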
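Recently, the VDIG team at the Wangxuan Institute of Computer Technology, Peking University, together with collaborators, released GALA3D. Targeting the generation of complex multi-object 3D scenes, the work proposes GALA3D, an LLM-guided framework for controllable generation of complex 3D scenes. It produces high-quality, highly consistent 3D scenes containing multiple objects with complex interactions, and supports controllable editing through conversational interaction. The paper has been accepted to ICML 2024.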
Paper title: GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting
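Code: https://github.com/VDIGPKU/GALA3D
GALA3D is a framework for high-quality generation and controllable editing of complex compositional scenes. Given a text description from the user, GALA3D generates, zero-shot, a corresponding 3D scene with multiple objects and complex interactions between them. While keeping the generated scene closely aligned with the text, GALA3D performs strongly in scene quality, complex multi-object interaction, and geometric consistency. GALA3D also supports user-friendly end-to-end generation and controllable editing, so that ordinary users can easily customize and edit 3D scenes through conversation. In dialogue with the user, GALA3D can precisely carry out conversational editing of complex 3D scenes, covering diverse requests such as rearranging the layout, inserting digital assets, and changing the decoration style.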
Method overview
The overall architecture of GALA3D is shown in the figure below:
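GALA3D uses large language models (LLMs) to generate an initial layout, and proposes a layout-guided generative 3D representation for building complex scenes. GALA3D applies adaptive geometry control to optimize the shape and distribution of the 3D Gaussians, producing 3D scenes with consistent geometry, texture, and scale, and accurate object interactions. In addition, GALA3D introduces a compositional optimization mechanism that combines a conditional diffusion prior with a text-to-image model to jointly generate multi-object 3D scenes in a consistent style, while iteratively refining the initial layout prior extracted from the LLM to obtain a more realistic and accurate spatial arrangement. Extensive quantitative experiments and qualitative studies show that GALA3D achieves strong results in generating complex 3D scenes from text, outperforming existing text-to-3D scene methods.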
a. LLM-based scene layout prior
Large language models demonstrate excellent natural-language understanding and reasoning. This work further explores the reasoning and layout-generation capabilities of LLMs for complex 3D scenes: obtaining a reasonably good layout prior without manual design helps reduce the cost of scene modeling and generation. To this end, we use an LLM (such as GPT-3.5) to extract the object instances described in the input text and their spatial relationships, and to generate a corresponding layout prior. However, the 3D layout that the LLM infers from the text often deviates from a real scene, typically producing floating or interpenetrating objects and combinations of objects with badly mismatched proportions. We therefore propose a Layout Refinement module that adjusts and optimizes this coarse layout prior using a vision-based diffusion prior and layout-guided generative 3D Gaussians.
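To make the layout-prior step concrete, here is a minimal sketch, assuming the LLM is prompted to return JSON with one axis-aligned box per instance. The JSON schema, the `Box` type, the helper names and the example values are all hypothetical, not GALA3D's actual prompt format. The sketch parses such a reply and flags the floating and interpenetrating objects that a raw LLM layout can produce, which is the kind of error the refinement stage is meant to fix.

```python
import json
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Box:
    """Hypothetical axis-aligned layout box: center (x, y, z) and size (l, w, h), z pointing up."""
    name: str
    center: tuple
    size: tuple

    def bounds(self):
        # Per-axis (min, max) extents of the box
        return [(c - s / 2, c + s / 2) for c, s in zip(self.center, self.size)]


def parse_layout(llm_output: str) -> list:
    """Parse an assumed JSON reply: [{"name": ..., "center": [x, y, z], "size": [l, w, h]}, ...]."""
    return [Box(o["name"], tuple(o["center"]), tuple(o["size"])) for o in json.loads(llm_output)]


def _overlap_1d(a, b, tol):
    # True if intervals a and b overlap by more than tol
    return a[0] < b[1] - tol and b[0] < a[1] - tol


def unsupported(boxes, ground_z=0.0, tol=0.05):
    """Names of boxes resting neither on the ground nor on top of another box (floating)."""
    result = []
    for b in boxes:
        bx, by, bz = b.bounds()
        if abs(bz[0] - ground_z) <= tol:
            continue
        on_top = any(
            abs(bz[0] - o.bounds()[2][1]) <= tol
            and _overlap_1d(bx, o.bounds()[0], 0.0)
            and _overlap_1d(by, o.bounds()[1], 0.0)
            for o in boxes if o is not b
        )
        if not on_top:
            result.append(b.name)
    return result


def interpenetrating(boxes, tol=1e-6):
    """Pairs of boxes whose volumes overlap on all three axes."""
    return [(a.name, b.name) for a, b in combinations(boxes, 2)
            if all(_overlap_1d(p, q, tol) for p, q in zip(a.bounds(), b.bounds()))]


# Example reply from the LLM (made-up values)
reply = """[
  {"name": "table", "center": [0.0, 0.0, 0.4], "size": [1.2, 0.8, 0.8]},
  {"name": "lamp",  "center": [0.2, 0.0, 1.3], "size": [0.3, 0.3, 0.6]},
  {"name": "chair", "center": [0.5, 0.0, 0.45], "size": [0.5, 0.5, 0.9]}
]"""
layout = parse_layout(reply)
print(unsupported(layout))       # ['lamp']: it floats 0.2 above the table top
print(interpenetrating(layout))  # [('table', 'chair')]: the chair cuts into the table
```

Rule-based checks like these only detect problems; they do not say how to fix them, which is why a learned, vision-based signal is used for the actual refinement.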
b. Layout Refinement
GALA3D uses a layout optimization module based on a diffusion prior to refine the layout prior generated by the LLM. Specifically, we add gradient-based optimization of the spatial layout of the layout-guided 3D Gaussians to the 3D generation process, and use ControlNet to adjust the spatial position, rotation angle, and scale of the LLM-generated layouts. The figure shows the 3D scene and its corresponding layout before and after optimization. The optimized layout has more accurate positions and scales, and makes the interactions between objects in the 3D scene more plausible.
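In GALA3D the gradients for position, rotation and scale come from a ControlNet-based diffusion prior, which cannot be reproduced in a few lines. The sketch below swaps in a hand-written geometric penalty instead (a different, hypothetical loss) purely to show the mechanics: the layout parameters are ordinary continuous variables that gradient steps move away from the LLM's initial guess. The parameterization `[x, y, z, log_scale, yaw]`, the table height and all constants are assumptions made for illustration.

```python
import numpy as np

# Hypothetical layout parameters for a single object: [x, y, z, log_scale, yaw]
BASE_HEIGHT = 0.6   # object height at log_scale = 0 (made-up value)
TABLE_TOP = 0.8     # height of the surface the object should rest on (made-up value)


def bottom_z(p):
    # Bottom of the box: center z minus half of the scaled height
    return p[2] - 0.5 * BASE_HEIGHT * np.exp(p[3])


def stand_in_loss(p, p_init):
    """Hand-written geometric penalty standing in for the diffusion-prior signal."""
    floating = (bottom_z(p) - TABLE_TOP) ** 2   # pull the object onto the table top
    yaw_align = np.sin(2.0 * p[4]) ** 2         # prefer yaw at a multiple of 90 degrees
    prior = 0.1 * np.sum((p - p_init) ** 2)     # stay close to the LLM's initial layout
    return floating + yaw_align + prior


def numerical_grad(f, p, eps=1e-5):
    # Central finite differences; an autodiff framework would be used in practice
    g = np.zeros_like(p)
    for i in range(p.size):
        d = np.zeros_like(p)
        d[i] = eps
        g[i] = (f(p + d) - f(p - d)) / (2.0 * eps)
    return g


def refine_layout(p_init, steps=500, lr=0.1):
    """Plain gradient descent on the layout parameters."""
    def loss(q):
        return stand_in_loss(q, p_init)

    p = p_init.copy()
    for _ in range(steps):
        p -= lr * numerical_grad(loss, p)
    return p


p0 = np.array([0.2, 0.0, 1.3, 0.0, 0.3])   # a floating, slightly rotated lamp
p1 = refine_layout(p0)
print(bottom_z(p0) - TABLE_TOP, bottom_z(p1) - TABLE_TOP)   # the gap shrinks
print(p0[4], p1[4])                                          # yaw moves toward 0
```

The prior term keeps the refined layout near the LLM's proposal, so the result is a compromise rather than an exact snap onto the table; the actual method instead relies on the diffusion prior to judge what looks plausible.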
c. Layout-guided generative 3D Gaussian representation
We introduce 3D layout constraints into the 3D Gaussian representation for the first time, and propose layout-guided generative 3D Gaussians for text-to-3D generation of complex scenes. The layout-guided 3D Gaussian representation contains multiple instance objects extracted semantically from the text, where the layout prior of each instance can be parameterized as:
where N is the total number of instances in the scene. Each instance's 3D Gaussians are optimized through adaptive geometry control to obtain an instance-level 3D Gaussian representation. The object Gaussians are then combined into the full scene according to their relative positions, yielding layout-guided global 3D Gaussians, and the entire scene is rendered via global Gaussian splatting.
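The composition step can be sketched as follows, assuming each instance's Gaussians live in a normalized local frame [-1, 1]^3 and each layout is a box given by center, size and a yaw angle about the vertical axis. These conventions and all function names are hypothetical. Means are mapped as x_world = A x_local + t with A = R S, and covariances transform as A Σ Aᵀ; the concatenated set is what a Gaussian splatting renderer (not implemented here) would consume.

```python
import numpy as np


def yaw_matrix(theta):
    # Rotation about the vertical z axis
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def to_world(means, covs, center, size, yaw):
    """Map an instance's Gaussians from an assumed local [-1, 1]^3 frame into the scene frame."""
    A = yaw_matrix(yaw) @ np.diag(np.asarray(size, dtype=float) / 2.0)  # scale to half-extents, then rotate
    world_means = means @ A.T + np.asarray(center, dtype=float)        # x_world = A x_local + t
    world_covs = A @ covs @ A.T                                         # Sigma_world = A Sigma A^T per Gaussian
    return world_means, world_covs


def compose_scene(instances):
    """Concatenate all instances' Gaussians into one global set, keeping an instance id per Gaussian."""
    means, covs, ids = [], [], []
    for k, inst in enumerate(instances):
        m, c = to_world(inst["means"], inst["covs"], inst["center"], inst["size"], inst["yaw"])
        means.append(m)
        covs.append(c)
        ids.append(np.full(len(m), k))
    return np.concatenate(means), np.concatenate(covs), np.concatenate(ids)


# Two toy instances with 4 local Gaussians each (random, for illustration only)
rng = np.random.default_rng(0)
local = {"means": rng.uniform(-1, 1, (4, 3)), "covs": np.tile(0.01 * np.eye(3), (4, 1, 1))}
scene = compose_scene([
    {**local, "center": (0.0, 0.0, 0.4), "size": (1.2, 0.8, 0.8), "yaw": 0.0},
    {**local, "center": (1.0, 0.5, 0.45), "size": (0.5, 0.5, 0.9), "yaw": np.pi / 4},
])
print(scene[0].shape, scene[1].shape, scene[2])   # (8, 3) (8, 3, 3) [0 0 0 0 1 1 1 1]
```

Keeping the instance id per Gaussian is one simple way to still optimize each object separately while rendering the scene as a whole.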
d. Adaptive geometry control
To better control the spatial distribution and geometric shape of 3D Gaussians during generation, we propose an adaptive geometry control method for generative 3D Gaussians. First, given a set of initial Gaussians, GALA3D uses a set of density distribution functions to constrain the spatial positions of the Gaussian ellipsoids so that they stay within the layout. We then sample Gaussians near the layout surface to fit the distribution function. Next, we use shape regularization to control the geometry of the 3D Gaussians. Throughout generation, adaptive geometry control continuously optimizes the distribution and geometry of the Gaussians, producing multi-object 3D scenes with richer texture detail and more regular geometry. It also makes the layout-guided generative 3D Gaussians more controllable and consistent.
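The exact density functions and shape regularizer are not spelled out in this article, so the sketch below assumes simple stand-ins: initial Gaussian centers are sampled near the faces of an axis-aligned layout box, the density is 1 inside the box and decays as a Gaussian with distance outside it (low-density Gaussians would be pruned), and the shape term penalizes needle-like Gaussians whose longest/shortest scale ratio exceeds a threshold. All names, functional forms and constants are hypothetical.

```python
import numpy as np


def sample_near_surface(center, size, n, jitter=0.02, seed=None):
    """Initialize n Gaussian centers near the faces of an axis-aligned layout box.

    Faces are picked by axis uniformly rather than by area, for simplicity.
    """
    rng = np.random.default_rng(seed)
    half = np.asarray(size, dtype=float) / 2.0
    pts = rng.uniform(-1.0, 1.0, size=(n, 3))
    axis = rng.integers(0, 3, size=n)                          # which pair of faces
    pts[np.arange(n), axis] = rng.choice([-1.0, 1.0], size=n)  # snap to one of those faces
    pts = pts * half + rng.normal(0.0, jitter, size=(n, 3))
    return pts + np.asarray(center, dtype=float)


def layout_density(points, center, size, sigma=0.05):
    """Assumed density: 1 inside the box, Gaussian falloff with distance outside it."""
    half = np.asarray(size, dtype=float) / 2.0
    outside = np.maximum(np.abs(points - np.asarray(center, dtype=float)) - half, 0.0)
    dist = np.linalg.norm(outside, axis=1)
    return np.exp(-dist ** 2 / (2.0 * sigma ** 2))


def keep_mask(points, center, size, threshold=0.5, sigma=0.05):
    # Gaussians whose density falls below the threshold would be pruned
    return layout_density(points, center, size, sigma) >= threshold


def shape_regularizer(scales, max_ratio=5.0):
    """Penalize needle-like Gaussians whose longest/shortest axis ratio exceeds max_ratio."""
    ratio = scales.max(axis=1) / scales.min(axis=1)
    return float(np.mean(np.maximum(ratio - max_ratio, 0.0) ** 2))


center, size = (0.0, 0.0, 0.4), (1.2, 0.8, 0.8)
init = sample_near_surface(center, size, n=1000, seed=0)
print(keep_mask(init, center, size).mean())   # close to 1: almost all samples lie near the box
print(shape_regularizer(np.array([[0.1, 0.1, 0.1], [1.0, 0.1, 0.1]])))   # 12.5
```

In a real pipeline these terms would be added to the generation loss and re-evaluated as the Gaussians are densified and moved, rather than applied once at initialization.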
Experimental results
Compared with existing Text-to-3D generation methods, GALA3D shows better 3D scene generation quality and consistency. The quantitative experimental results are shown in the following table:
We also conducted an extensive user study, inviting 125 participants (39.2% of them experts or practitioners in related fields) to evaluate, from multiple perspectives, the scenes generated by our method and by existing methods. The results are shown in the following table:
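For scale, the participant breakdown works out as follows, using only the figures stated above:

$$0.392 \times 125 = 49 \ \text{experts and practitioners}, \qquad 125 - 49 = 76 \ \text{other participants}.$$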
The results show that GALA3D outperforms existing methods on several evaluation criteria, including scene quality, geometric fidelity, text consistency, and scene consistency, achieving the best generation quality.
As the qualitative results in the figure below show, GALA3D can generate complex, compositional multi-object 3D scenes zero-shot with good consistency:
The figure below shows that GALA3D supports user-friendly, conversational controllable generation and editing:
For more research details, please refer to the original paper.