Real-time rendering: dynamic urban scene modeling based on Street Gaussians

王林

Jan 08, 2024 pm 01:49 PM


Technology moves fast, and older methods in academia are steadily being replaced by new ones. Recently, a research team from Zhejiang University proposed a new method called Street Gaussians, which has attracted widespread attention. The method has distinct advantages for modeling dynamic street scenes and has been applied successfully. Although NeRF has gradually lost some influence in academia...

Paper link: https://arxiv.org/pdf/2401.01339.pdf

This paper addresses the problem of modeling dynamic urban street scenes from monocular videos. Recent methods extend NeRF by incorporating tracked vehicle poses to animate vehicles, enabling photorealistic view synthesis of dynamic urban street scenes. Their main limitations, however, are slow training and rendering, together with a strong dependence on highly accurate tracked vehicle poses. This paper introduces Street Gaussians, a new explicit scene representation that addresses these limitations. Specifically, a dynamic urban street is represented as a set of point clouds equipped with semantic logits and 3D Gaussians, each associated with either a foreground vehicle or the background.

To model the dynamics of foreground vehicles, each object point cloud is optimized together with optimizable tracked poses and a dynamic spherical harmonics model for its time-varying appearance. This explicit representation allows simple composition of object vehicles and background, which in turn enables scene editing operations and rendering at 133 FPS (1066×1600 resolution) after about half an hour of training. The researchers evaluated the approach on several challenging benchmarks, including the KITTI and Waymo Open datasets.

Experimental results show that the proposed method consistently outperforms existing techniques on all datasets. Although it relies solely on poses from off-the-shelf trackers, the representation achieves performance comparable to that obtained with ground-truth poses.

Project page: https://zju3dv.github.io/streetgaussians/

Street Gaussians Method Introduction

Given a sequence of images captured from a moving vehicle in an urban street scene, the goal of this paper is to develop a model that can generate photorealistic images for any input time step and any viewpoint. To achieve this, a new scene representation named Street Gaussians is proposed, designed specifically for dynamic street scenes. As shown in Figure 2 of the paper, the dynamic urban street scene is represented as a set of point clouds, each corresponding to either the static background or a moving vehicle. The explicit point-based representation allows simple composition of the individual models, enabling real-time rendering as well as foreground object decomposition for editing applications. The proposed scene representation can be trained efficiently using only RGB images together with tracked vehicle poses from off-the-shelf trackers, refined by the paper's tracked vehicle pose optimization strategy.
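The composition step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`yaw_rotation`, `to_world`, `compose_scene`), the yaw-plus-translation pose format, and the sample data are all assumptions made for illustration, and only point centers are transformed here (a full system would also rotate each Gaussian's orientation).

```python
import math

def yaw_rotation(yaw):
    """3x3 rotation about the vertical (z) axis, as nested lists."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def to_world(point, rotation, translation):
    """Apply x_world = R @ x_local + t to a single 3D point."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )

def compose_scene(background, objects, poses_at_t):
    """Merge background points with posed object points into one list.

    background : list of (x, y, z), already in world coordinates
    objects    : dict object_id -> list of (x, y, z) in the object's local frame
    poses_at_t : dict object_id -> (yaw, (tx, ty, tz)), tracked pose at this frame
                 (hypothetical format, assumed for this sketch)
    """
    scene = list(background)
    for obj_id, local_points in objects.items():
        if obj_id not in poses_at_t:  # object is not tracked at this frame
            continue
        yaw, translation = poses_at_t[obj_id]
        rotation = yaw_rotation(yaw)
        scene.extend(to_world(p, rotation, translation) for p in local_points)
    return scene

# Toy data: two background points and one vehicle with two local points.
background = [(0.0, 0.0, 0.0), (5.0, 1.0, 0.0)]
objects = {"car_0": [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]}
poses_at_t = {"car_0": (math.pi / 2, (10.0, 2.0, 0.0))}
scene = compose_scene(background, objects, poses_at_t)
```

Because every model is an explicit point set, editing operations such as removing or moving a vehicle reduce, in this sketch, to dropping or changing an entry in `poses_at_t` before composing.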

Overview of Street Gaussians. As shown below, a dynamic urban street scene is represented as a set of point-based background and foreground objects with optimizable tracked vehicle poses. Each point is assigned a 3D Gaussian, consisting of a position, an opacity, and a covariance built from a rotation and a scale, to represent the geometry. To represent appearance, each background point is assigned a spherical harmonics model, while each foreground point is associated with a dynamic spherical harmonics model. The explicit point-based representation allows simple composition of the separate models, which enables real-time rendering of high-quality images and semantic maps (optional, available if 2D semantic information is provided during training), as well as decomposition of foreground objects for editing applications.

[Figure: overview of the Street Gaussians representation]
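To make the "covariance built from a rotation and a scale" concrete, here is a minimal sketch of the factorization Sigma = R S S^T R^T used in the original 3D Gaussian Splatting work. That Street Gaussians uses exactly this parameterization is an assumption here, and the function names and sample values are illustrative only.

```python
import math

def quat_to_rotation(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)  # normalize defensively
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def covariance(q, scale):
    """Build Sigma = R S S^T R^T from a rotation quaternion and per-axis scales."""
    R = quat_to_rotation(q)
    # M = R @ diag(scale): column j of R multiplied by scale[j]
    M = [[R[i][j] * scale[j] for j in range(3)] for i in range(3)]
    # Sigma = M @ M^T, symmetric and positive semi-definite by construction
    return [
        [sum(M[i][k] * M[j][k] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]

# Identity rotation: the covariance is diagonal with squared scales.
sigma_identity = covariance((1.0, 0.0, 0.0, 0.0), (2.0, 1.0, 0.5))

# A 90-degree rotation about z swaps the x and y extents.
half = math.sqrt(0.5)
sigma_rotated = covariance((half, 0.0, 0.0, half), (2.0, 1.0, 0.5))
```

Optimizing a rotation and a scale separately, rather than the covariance entries directly, keeps the matrix valid (symmetric and positive semi-definite) at every optimization step.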

Comparison of Experimental Results

We conducted experiments on the Waymo Open Dataset and the KITTI benchmark. On the Waymo Open Dataset, six recorded sequences were selected that contain a large number of moving objects, significant ego motion, and complex lighting conditions. Each sequence is approximately 100 frames long; every 10th image in a sequence is held out as a test frame and the remaining images are used for training. Because the baseline methods incur a high memory cost when trained on high-resolution images, the input images were downscaled to 1066×1600. On KITTI and Virtual KITTI 2, we followed the settings of MARS and evaluated with different train/test split settings. On Waymo we use bounding boxes generated by a detector and tracker, and on KITTI we use the officially provided object trajectories.
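The Waymo hold-out scheme described above can be sketched in a few lines. The helper name `split_frames` is hypothetical, and starting the held-out frames at index 0 is an assumption, since the text only states that every 10th image is used for testing.

```python
def split_frames(num_frames, test_every=10):
    """Return (train_indices, test_indices), holding out every `test_every`-th frame."""
    test = [i for i in range(num_frames) if i % test_every == 0]
    train = [i for i in range(num_frames) if i % test_every != 0]
    return train, test

# A sequence of roughly 100 frames, as in the Waymo experiments.
train_ids, test_ids = split_frames(100)
```

With 100 frames this yields 10 test frames and 90 training frames, so the model is evaluated on views that are interleaved with, but never included in, the training set.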


We compare our method with three recent methods.

(1) NSG represents the background as a multi-plane image and uses latent codes learned for each object and shared decoders to model moving objects.

(2) MARS builds the scene graph based on Nerfstudio.

(3) 3D Gaussians models the scene with a set of anisotropic Gaussians.

Both NSG and MARS are trained and evaluated using ground-truth (GT) boxes; different versions of their implementations were tried, and the best result for each sequence is reported. For a fair comparison, we also replace the SfM point cloud in 3D Gaussians with the same input used by our method. See the supplementary material for details.


Original link: https://mp.weixin.qq.com/s/oikZWcR47otm7xfU90JH4g


Statement

This article is reproduced from 51CTO.COM. If there is any infringement, please contact admin@php.cn to have it removed.
