CVPR 2024 high-scoring paper: New generative editing framework GenN2N, unifying NeRF conversion tasks


The AIxiv column is where our website publishes academic and technical content. Over the past few years, the column has received more than 2,000 pieces of content covering top laboratories at major universities and companies around the world, helping to promote academic exchange and dissemination. If you have excellent work that you want to share, please feel free to contribute or contact us for coverage. Submission email addresses: liyazhou@jiqizhixin.com; zhaoyunfeng@jiqizhixin.com.


Researchers from Hong Kong University of Science and Technology and Tsinghua University have proposed GenN2N, a unified generative NeRF-to-NeRF conversion framework. It applies to a range of NeRF conversion tasks, including text-driven NeRF editing, colorization, super-resolution, and inpainting, and performs strongly on all of them.


  • Paper address: https://arxiv.org/abs/2404.02788
  • Paper homepage: https://xiangyueliu.github.io/GenN2N/
  • Github address: https://github.com/Lxiangyue/GenN2N
  • Paper title: GenN2N: Generative NeRF2NeRF Translation

In recent years, Neural Radiance Fields (NeRF) have attracted widespread attention in 3D reconstruction, 3D generation, and novel view synthesis thanks to their compactness, high quality, and versatility. However, once a NeRF scene has been created, these methods usually offer little further control over the resulting geometry and appearance. NeRF editing has therefore recently become a research focus.

Current NeRF editing methods are usually task-specific, each handling only one of text-driven editing, super-resolution, inpainting, or colorization, and each requiring a large amount of task-specific domain knowledge. In 2D image editing, by contrast, the trend is toward universal image-to-image conversion methods; for example, the 2D generative model Stable Diffusion supports many kinds of image editing. We therefore propose universal NeRF editing built on underlying 2D generative models.

The challenge that comes with this is the representation gap between NeRF and 2D images, especially since image editors often produce mutually inconsistent edits for different viewpoints. A recent text-based NeRF editing method, Instruct-NeRF2NeRF, explores this problem. It adopts a "render-edit-aggregate" process that gradually updates the NeRF scene by rendering multi-view images, editing them, and aggregating the edited images back into the NeRF. However, after lengthy optimization for a specific editing request, this approach yields only one specific editing result. If the user is not satisfied, the whole process has to be repeated.

Therefore, we propose GenN2N, a general NeRF-to-NeRF framework suitable for a variety of NeRF editing tasks. Its core idea is to model the multi-solution nature of the editing process generatively, so that it can easily produce a large number of editing results that meet the requirements for users to choose from.

At the core of GenN2N: 1) a 3D VAE-GAN generative framework is introduced, in which the VAE represents the entire editing space and learns the distribution of all possible 3D NeRF edits corresponding to a set of input 2D edited images, while the GAN provides reasonable supervision on different views of the edited NeRF to keep the editing results realistic; 2) contrastive learning decouples editing content from viewpoint, ensuring that the editing content is consistent across viewpoints; 3) at inference time, the user simply samples multiple editing codes at random from the conditional generative model to obtain diverse 3D editing results corresponding to the editing target.

Compared with SOTA methods for various NeRF editing tasks (ICCV2023 Oral, etc.), GenN2N is superior to existing methods in terms of editing quality, diversity, efficiency, etc.

Method introduction

We first perform 2D image editing, and then lift these 2D edits to 3D NeRF to achieve generative NeRF-to-NeRF conversion.


A. Latent Distill

We use the Latent Distill Module as the VAE encoder, learning for each edited image an implicit editing code that controls the generated content during NeRF-to-NeRF conversion. Under the constraint of a KL loss, all editing codes follow a well-behaved normal distribution, which makes sampling easier. To decouple editing content from viewpoint, we carefully designed a contrastive learning scheme that encourages the editing codes of images with the same editing style but different viewpoints to be close to each other, and the editing codes of images with different editing styles but the same viewpoint to be far apart.
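The two constraints on the editing codes can be sketched in a few lines of plain Python. This is only an illustrative sketch: the closed-form KL term is the standard one for a diagonal Gaussian against a standard normal, while the margin-based contrastive form, the function names, and the `margin` parameter are assumptions made here, not the exact losses used in the paper.

```python
import math

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over code dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))

def sq_dist(a, b):
    """Squared Euclidean distance between two editing codes."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def contrastive_loss(anchor, same_edit_other_view, other_edit_same_view, margin=1.0):
    """Hypothetical margin-based contrastive loss on editing codes.

    pull: same editing style, different viewpoint -> codes should coincide.
    push: different editing style, same viewpoint -> codes should be at
          least `margin` apart (hinge, so well-separated pairs cost nothing).
    """
    pull = sq_dist(anchor, same_edit_other_view)
    gap = margin - math.sqrt(sq_dist(anchor, other_edit_same_view))
    push = max(0.0, gap) ** 2
    return pull + push
```

With this form, a code that already matches its same-style partner and sits farther than `margin` from the different-style code contributes zero loss, which is the behaviour the paragraph above describes.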

B. NeRF-to-NeRF Translation (Translated NeRF)

We use NeRF-to-NeRF Translation as the VAE decoder, which takes the editing code as input and modifies the original NeRF into a converted NeRF. We add residual layers between the hidden layers of the original NeRF network. These residual layers take the editing code as input and modulate the hidden-layer neurons, so that the converted NeRF both retains the information of the original NeRF and controls the converted 3D content according to the editing code. NeRF-to-NeRF Translation also serves as the generator in the generative adversarial training. By generating rather than optimizing, we can obtain multiple conversion results at once, significantly improving the efficiency of NeRF conversion and the diversity of the results.
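A minimal sketch of one such modulated hidden layer, in plain Python, may help make the idea concrete. The layer form (a base path plus an additive residual path that reads the hidden features concatenated with the editing code), the ReLU activation, and all names below are assumptions for illustration; the paper's actual architecture may differ.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

def modulated_layer(h, z, W_base, W_res):
    """One hidden layer of a hypothetical 'translated' NeRF MLP.

    h      : hidden features from the previous layer
    z      : editing code
    W_base : weights of the original NeRF layer
    W_res  : weights of the added residual layer, applied to [h, z]
    """
    base = matvec(W_base, h)          # original NeRF path
    residual = matvec(W_res, h + z)   # h + z is list concatenation here
    return relu([b + r for b, r in zip(base, residual)])

# Usage: the same features with two different editing codes.
h = [1.0, 2.0]
W_base = [[1.0, 0.0], [0.0, 1.0]]
W_res = [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
out_a = modulated_layer(h, [0.5], W_base, W_res)
out_b = modulated_layer(h, [-0.5], W_base, W_res)
```

When `W_res` is all zeros the layer reduces to the original NeRF layer, which is one way to see how the converted NeRF can retain the original scene while the editing code steers the change.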

C. Conditional Discriminator

The images rendered from the converted NeRF make up the generation space that the discriminator has to judge. These images differ in both editing style and rendering viewpoint, which makes the generation space very complex. We therefore give the discriminator a condition as additional information. Specifically, when the discriminator evaluates an image rendered by the generator (negative sample) or an edited image from the training data (positive sample), we select an edited image of the same viewpoint from the training data as the condition, which prevents viewpoint factors from interfering with the discriminator when it separates positive from negative samples.
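The pairing of each sample with a same-viewpoint condition can be sketched as follows. The data layout (a dictionary mapping a view identifier to several edited images of that view), the helper names, and the choice to exclude the positive sample itself from the condition candidates are all assumptions made for this sketch; images are represented by placeholder strings.

```python
import random

def pick_condition(edited_by_view, view, exclude_index=None, rng=random):
    """Pick an edited training image of `view` to condition the discriminator on.

    Requires at least one candidate; with `exclude_index` set, the view must
    therefore hold at least two edited images.
    """
    images = edited_by_view[view]
    candidates = [i for i in range(len(images)) if i != exclude_index]
    return images[rng.choice(candidates)]

def discriminator_pairs(edited_by_view, rendered, view, real_index, rng=random):
    """Build (sample, condition) inputs with labels: 1 = positive, 0 = negative."""
    real = edited_by_view[view][real_index]
    cond_real = pick_condition(edited_by_view, view, exclude_index=real_index, rng=rng)
    cond_fake = pick_condition(edited_by_view, view, rng=rng)
    return [((real, cond_real), 1), ((rendered, cond_fake), 0)]

# Usage with placeholder "images".
edited = {"view_0": ["edit_a", "edit_b", "edit_c"]}
pairs = discriminator_pairs(edited, "render_view_0", "view_0", 0, rng=random.Random(0))
```

Because the sample and its condition always share a viewpoint, the only systematic difference left for the discriminator to pick up on is whether the content looks like a plausible edit.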

D. Inference

After GenN2N has been optimized, users can randomly sample editing codes from the normal distribution and feed them into the converted NeRF to generate high-quality, multi-view-consistent edited 3D NeRF scenes.
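Sampling at inference time amounts to drawing codes from a standard normal distribution. The sketch below shows only that step; the function name, the code dimension, and the seed are hypothetical, and feeding each code into the trained converted NeRF is left out because it depends on the actual model.

```python
import random

def sample_edit_codes(num_samples, code_dim, seed=None):
    """Draw `num_samples` editing codes, each with `code_dim` entries from N(0, 1)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(code_dim)]
            for _ in range(num_samples)]

# Usage: each code would condition the converted NeRF to produce one candidate edit.
codes = sample_edit_codes(num_samples=4, code_dim=8, seed=0)
```

Each sampled code corresponds to one candidate 3D edit, so a user can draw several codes and choose among the resulting scenes instead of re-running an optimization.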

Experiments

We conducted extensive experiments on various NeRF-to-NeRF tasks, including text-driven NeRF editing, colorization, super-resolution, and inpainting. The experimental results demonstrate GenN2N's superior editing quality, multi-view consistency, generation diversity, and editing efficiency.

A. Text-based NeRF editing

B. NeRF colorization

C. NeRF super-resolution

D. NeRF inpainting

Comparative experiments

Our method is compared qualitatively and quantitatively with SOTA methods for various specific NeRF tasks (including text-driven editing, colorization, super-resolution, and inpainting). The results show that GenN2N, as a general framework, performs as well as or better than the task-specific SOTA methods, while its editing results are more diverse (the following is a comparison between GenN2N and Instruct-NeRF2NeRF on the text-based NeRF editing task).

A. Text-based NeRF editing

For more details on the experiments and methods, please refer to the paper homepage.

Team introduction

This paper comes from the Tan Ping team at Hong Kong University of Science and Technology, the Tsinghua University 3DVICI Lab, Shanghai Artificial Intelligence Laboratory, and Shanghai Qizhi Research Institute. The authors of the paper are Liu Xiangyue, a student at Hong Kong University of Science and Technology; Xue Han, a student at Tsinghua University; and Luo Kunming, a student at Hong Kong University of Science and Technology. The advisors are Professor Yi Li of Tsinghua University and Tan Ping of Hong Kong University of Science and Technology.


Statement
This article is reproduced from 机器之心 (Jiqizhixin). If there is any infringement, please contact admin@php.cn to have it removed.