
Generate mountains and rivers with one click, in various styles, and learn to generate unlimited 3D scenes from 2D images

  • Project homepage: https://scene-dreamer.github.io/
  • Code: https://github.com/FrozenBurning/SceneDreamer
  • Paper: https://arxiv.org/abs/2302.01330
  • Online Demo: https://huggingface.co/spaces/FrozenBurning/SceneDreamer

To meet the growing demand for 3D creative tools in the Metaverse, 3D scene generation has received considerable attention recently. At the core of 3D content creation is inverse graphics, which aims to recover 3D representations from 2D observations. Given the cost and labor required to create 3D assets, the ultimate goal of 3D content creation is to learn 3D generative models from the vast amount of 2D images on the Internet. Recent work on 3D-aware generative models has addressed this problem to some extent, with most methods leveraging 2D image data to generate object-centric content (e.g., faces, human bodies, or objects). However, the observation space of such generation tasks lies in a bounded domain, and the generated targets occupy a limited region of 3D space. This raises a question: can we learn 3D generative models of unbounded scenes from massive Internet 2D images? For example, vivid natural landscapes that can cover arbitrarily large areas and expand without limit (as shown below).

[Figure: examples of generated unbounded natural 3D landscapes]

In this article, researchers from S-Lab at Nanyang Technological University propose a new framework, SceneDreamer, which focuses on learning generative models of unbounded 3D scenes from massive unlabeled natural images. By sampling a scene noise and a style noise, SceneDreamer can render natural scenes in diverse styles while maintaining very high 3D consistency, allowing the camera to roam freely through the scene.

To achieve such a goal, we face the following three challenges:

1) Unbounded scenes lack an efficient 3D representation: an unbounded scene can occupy an arbitrarily large region of Euclidean space, which makes an efficient and expressive underlying 3D representation essential.

2) Lack of content alignment: existing 3D generation work uses datasets with alignment properties (such as faces, human bodies, or common objects). The target objects in these bounded scenes usually share similar semantics and have comparable scales, positions, and orientations. In contrast, in massive unlabeled 2D images, different objects or scenes often have very different semantics and highly variable scales, positions, and orientations. This lack of alignment leads to instability when training generative models.

3) Lack of camera pose priors: 3D generative models rely on accurate camera poses or camera pose distributions to perform the inverse rendering from images to 3D representations. However, natural images on the Internet come from many different scenes and sources, making it impossible to obtain accurate camera poses or even a camera pose prior.

To this end, we propose SceneDreamer, a principled adversarial learning framework that learns to generate unbounded 3D scenes from massive unlabeled natural images. The framework consists of three main modules: 1) an efficient and expressive bird's-eye view (BEV) 3D scene representation; 2) a generative neural hash grid that learns a universal scene representation; and 3) a style-driven volumetric renderer. The whole framework is trained directly on 2D images through adversarial learning.

[Figure: overview of the SceneDreamer framework]

The above figure shows the main structure of SceneDreamer. During inference, we randomly sample a simplex noise that represents the scene structure and a Gaussian noise that represents the scene style as inputs, and the model renders a large-scale 3D scene while supporting free camera movement. First, we obtain from the scene noise a BEV scene representation consisting of a height map and a semantic map. The BEV representation is then used to explicitly construct a local 3D scene window for camera sampling, and is also encoded into a scene feature f_s. Using the coordinates x of each sampling point together with f_s, we query the high-dimensional space encoded by a generative neural hash grid to obtain a latent feature that varies across both space and scenes. Finally, a volume renderer modulated by the style noise integrates the latent features along each camera ray to produce the rendered 2D image.
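To make this flow concrete, here is a minimal, runnable sketch of the inference pipeline in Python. All shapes, helper functions, and the noise-to-BEV step are illustrative assumptions rather than the authors' implementation; the sketch only mirrors the data flow described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1) Sample scene noise and style noise.
scene_noise = rng.standard_normal((256, 256))   # stand-in for simplex noise
style_noise = rng.standard_normal(128)          # Gaussian style code

# 2) Derive the BEV representation: a height map and a semantic map.
height_map = np.clip(scene_noise, 0.0, None) * 50.0     # toy terrain heights
semantic_map = (scene_noise > 0).astype(np.int64)       # toy labels: 0 = water, 1 = land

# 3) Encode the local BEV window into a scene feature f_s (toy: global statistics).
f_s = np.array([height_map.mean(), height_map.std(), semantic_map.mean()])

# 4) Query a feature for a sampled 3D point x on a camera ray
#    (stand-in for the generative neural hash grid F_theta(x, f_s)).
def point_feature(x, f_s):
    return np.concatenate([np.sin(x), f_s])

# 5) Style-modulated decoding into color and density (toy nonlinear maps).
x = np.array([10.0, 5.0, 2.0])                  # one sample point on a ray
feat = point_feature(x, f_s)
color = 0.5 * (np.tanh(feat[:3] * style_noise[:3]) + 1.0)
density = float(np.abs(feat).sum())

print(color, density)
```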

To learn unbounded 3D scene generation, the scene must be represented both efficiently and expressively. We propose to represent a large-scale 3D scene with a BEV representation consisting of a semantic map and a height map. Specifically, we derive the bird's-eye-view height map and semantic map from the scene noise through a non-parametric map construction process. The height map records the height of each scene surface point, while the semantic map records the semantic label of the corresponding point. This BEV representation: 1) represents a 3D scene with O(n^2) complexity; 2) provides the semantic label of any 3D point, which addresses the content-alignment problem; and 3) supports synthesizing unbounded scenes with a sliding window, avoiding the generalization issues caused by a fixed scene resolution during training.
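Below is a small sketch of such a BEV representation and of how a 3D point can be queried against it. The map contents, grid size, and voxel size are illustrative assumptions.

```python
import numpy as np

# BEV scene representation: a height map plus a semantic map over an N x N
# bird's-eye grid (O(N^2) storage). A 3D point can then be classified as
# empty/solid and assigned the semantic label of its BEV cell.
N = 512
voxel_size = 1.0
rng = np.random.default_rng(0)

height_map = rng.uniform(0.0, 50.0, size=(N, N))     # surface height per BEV cell
semantic_map = rng.integers(0, 8, size=(N, N))       # e.g. water/sand/grass/rock/snow...

def query_point(p):
    """Return (is_solid, semantic_label) for a 3D point p = (x, y, z)."""
    i = int(np.clip(p[0] / voxel_size, 0, N - 1))
    j = int(np.clip(p[1] / voxel_size, 0, N - 1))
    is_solid = p[2] <= height_map[i, j]               # below the terrain surface
    return is_solid, int(semantic_map[i, j])

print(query_point(np.array([100.0, 200.0, 10.0])))
```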

To obtain a 3D representation that generalizes across scenes, we need to encode the explicit 3D scene representation into a latent space amenable to adversarial training. Notably, for a large-scale unbounded scene, usually only the visible surface points matter for rendering, which means its parametric form should be compact and sparse. Existing representations such as tri-planes or 3D convolutions model the space as a whole, so a large amount of model capacity is wasted on points that are never visible. Inspired by the success of neural hash grids on 3D reconstruction tasks, we generalize their spatial compactness and efficiency to the generative setting and propose a generative neural hash grid to model 3D spatial features across scenes. Specifically, a hash function F_theta maps the scene feature f_s and the coordinates x of a spatial point into a multi-scale grid of learnable parameters:

f_x = F_theta(x, f_s), where theta denotes the learnable multi-scale hash-grid parameters and f_x is the feature used for rendering.
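As a concrete illustration, the sketch below extends an Instant-NGP-style multi-resolution spatial hash so that the lookup is additionally conditioned on the scene feature f_s. The hashing scheme, primes, table sizes, and the discretization of f_s are assumptions made for this example, not the paper's exact formulation of F_theta.

```python
import numpy as np

T = 2 ** 16                   # entries per hash table at each level
L = 4                         # number of resolution levels
F = 8                         # feature dimension per entry
PRIMES = [1, 2654435761, 805459861, 3674653429]   # primes commonly used for spatial hashing

rng = np.random.default_rng(0)
tables = rng.standard_normal((L, T, F)).astype(np.float32)   # learnable parameters theta

def hash_index(cell, fs_code):
    """XOR-style hash over the integer cell indices and a discretized scene code."""
    h = int(fs_code) * PRIMES[3]
    for v, p in zip(cell, PRIMES[:3]):
        h ^= int(v) * p
    return h % T

def query(x, f_s):
    """Concatenate per-level features for a point x in [0, 1]^3, conditioned on f_s."""
    fs_code = abs(hash(tuple(np.round(f_s, 3)))) % (2 ** 20)   # discretize the scene feature
    feats = []
    for level in range(L):
        res = 16 * (2 ** level)                       # grid resolution at this level
        cell = np.floor(x * res).astype(int)          # nearest-cell lookup (no interpolation)
        feats.append(tables[level, hash_index(cell, fs_code)])
    return np.concatenate(feats)                      # f_x with dimension L * F

f_s = np.array([0.3, -0.7, 1.2])
print(query(np.array([0.25, 0.5, 0.75]), f_s).shape)  # (32,)
```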

To guarantee 3D consistency in rendering, we use a volume-rendering-based network to map 3D spatial features to 2D images. For each sample point on a camera ray, we query the generative hash grid for its feature f_x, pass it through an MLP modulated by the style noise to obtain the point's color and volume density, and finally use volume rendering to integrate all points along the ray into the color of the corresponding pixel.
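The following sketch shows the per-ray compositing step described above: a toy style-modulated decoder produces colors and densities, which are then integrated with the standard volume-rendering weights. The decoder, the modulation scheme, and the sampling are simplified assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def style_modulated_decoder(f_x, z_style):
    """Toy stand-in for the style-conditioned MLP: returns (rgb, sigma) per sample."""
    scale = 1.0 + 0.1 * z_style[: f_x.shape[-1]]      # style modulates the features
    h = np.tanh(f_x * scale)
    rgb = 0.5 * (np.tanh(h[..., :3]) + 1.0)           # colors in [0, 1]
    sigma = np.exp(h[..., 3])                         # non-negative density
    return rgb, sigma

def render_ray(features, deltas, z_style):
    """Composite all samples on one ray into a pixel color (standard volume rendering)."""
    rgb, sigma = style_modulated_decoder(features, z_style)
    alpha = 1.0 - np.exp(-sigma * deltas)             # opacity of each ray segment
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))  # transmittance
    weights = alpha * trans
    return (weights[:, None] * rgb).sum(axis=0)       # final pixel color

n_samples, feat_dim = 64, 16
features = rng.standard_normal((n_samples, feat_dim))  # f_x for each sample point
deltas = np.full(n_samples, 0.5)                       # distances between samples
z_style = rng.standard_normal(feat_dim)
print(render_ray(features, deltas, z_style))           # e.g. array([r, g, b])
```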


The entire framework is trained end-to-end on 2D images through adversarial learning. The generator is the volume renderer described above; for the discriminator, we use a semantic-aware discriminator network that distinguishes real images from rendered ones conditioned on the semantic map projected from the BEV representation into the camera view. Please refer to our paper for more details.
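Here is a minimal sketch of what such a semantic-aware discriminator step might look like, with the semantic map supplied as extra one-hot input channels and a hinge loss for the discriminator update. The network size, conditioning scheme, and loss are simplified assumptions rather than the authors' exact architecture.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 8

class SemanticAwareDiscriminator(nn.Module):
    def __init__(self):
        super().__init__()
        # input: 3 RGB channels + one-hot semantic channels
        self.net = nn.Sequential(
            nn.Conv2d(3 + NUM_CLASSES, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 1, 4, stride=2, padding=1),   # patch-level real/fake scores
        )

    def forward(self, img, sem):
        sem_onehot = torch.nn.functional.one_hot(sem, NUM_CLASSES).permute(0, 3, 1, 2).float()
        return self.net(torch.cat([img, sem_onehot], dim=1))

disc = SemanticAwareDiscriminator()
opt_d = torch.optim.Adam(disc.parameters(), lr=2e-4)

# toy batch: a "real" photo and a "rendered" image, each with a semantic layout
real_img, fake_img = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)
real_sem = torch.randint(0, NUM_CLASSES, (2, 64, 64))
fake_sem = torch.randint(0, NUM_CLASSES, (2, 64, 64))

# hinge loss for the discriminator step (the generator step would maximize D(fake))
d_real = disc(real_img, real_sem)
d_fake = disc(fake_img.detach(), fake_sem)
loss_d = torch.relu(1.0 - d_real).mean() + torch.relu(1.0 + d_fake).mean()

opt_d.zero_grad()
loss_d.backward()
opt_d.step()
print(float(loss_d))
```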

After training, we can generate diverse 3D scenes by randomly sampling the scene noise and style noise; the scenes have good depth information and 3D consistency, and support rendering along free camera trajectories:

[Figure: diverse generated 3D scenes rendered along free camera trajectories]

Through the sliding-window inference mode, we can generate ultra-large unbounded 3D scenes that far exceed the spatial resolution used during training. The figure below shows a scene with 10 times the training spatial resolution, together with smooth interpolation in both the scene and style dimensions.
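The sketch below illustrates the sliding-window idea: the BEV maps can be made arbitrarily large, while rendering always operates on a fixed-size local window that slides over them, reusing the same scene and style codes so neighbouring windows stay consistent. The window size, stride, and the stub renderer are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

BIG = 1024          # BEV resolution larger than the one used in training
WIN = 256           # fixed local window size used during training
STRIDE = 256

height_map = rng.uniform(0, 50, size=(BIG, BIG))
semantic_map = rng.integers(0, 8, size=(BIG, BIG))
f_s_global = rng.standard_normal(128)          # shared scene/style codes kept fixed

def render_window(h_win, s_win, f_s):
    """Stand-in for the local renderer: returns a dummy image for one window."""
    return np.stack([h_win / 50.0, s_win / 8.0, np.full_like(h_win, f_s[0])], axis=-1)

tiles = []
for i in range(0, BIG - WIN + 1, STRIDE):
    row = []
    for j in range(0, BIG - WIN + 1, STRIDE):
        h_win = height_map[i:i + WIN, j:j + WIN]
        s_win = semantic_map[i:i + WIN, j:j + WIN]
        row.append(render_window(h_win, s_win, f_s_global))
    tiles.append(row)

print(len(tiles), len(tiles[0]), tiles[0][0].shape)   # number of windows and tile shape
```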

In addition to smooth interpolation, our framework also supports a decoupled mode, i.e., fixing either the scene or the style and interpolating only the other, which reflects the semantic richness of the latent space:

[Figure: interpolation results with the scene and style dimensions decoupled]


To verify the 3D consistency of our method, we also render arbitrary scenes along circular camera trajectories and run COLMAP for 3D reconstruction. The good scene point clouds and matching camera poses obtained show that our method can generate diverse 3D scenes while maintaining 3D consistency:

[Figure: COLMAP point clouds and camera poses reconstructed from rendered circular camera trajectories]

This work proposes SceneDreamer, a model that generates unbounded 3D scenes from massive 2D images. It can synthesize diverse large-scale 3D scenes from noise while maintaining 3D consistency and supporting free camera trajectories. We hope this work opens a new direction for the game industry, virtual reality, and the metaverse ecosystem. Please refer to our project homepage for more details.
