
Produced by Peking University: The latest SOTA with texture quality and multi-view consistency, achieving 3D conversion of one image in 2 minutes

WBOY | 2024-01-10 23:09:51

It only takes two minutes to convert pictures into 3D!

It is still the kind with high texture quality and high consistency in multiple viewing angles.


Whatever the subject, the input is a single-view image like this:


Two minutes later, the 3D version is done:

△ Top: Repaint123 (NeRF); bottom: Repaint123 (GS)

The new method is called Repaint123. Its core idea is to combine the powerful image-generation capability of 2D diffusion models with the texture-alignment capability of a repainting strategy, generating high-quality images that stay consistent across multiple viewpoints.

In addition, the work introduces a visibility-aware adaptive repainting strength for overlapping regions.

Repaint123 tackles the main problems of previous methods in one fell swoop: large multi-view deviation, texture degradation, and slow generation.


The project code has not yet been published on GitHub, but more than 100 people have already starred the repository in anticipation:


What does Repaint123 look like?

Previously, image-to-3D methods usually relied on Score Distillation Sampling (SDS). Although its results are impressive, it suffers from several issues: multi-view inconsistency, over-saturation, over-smoothed textures, and slow generation.
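SDS optimizes a 3D representation by pushing its rendered views toward the diffusion model's image distribution. The following is a minimal NumPy sketch of the image-space SDS term; the denoiser, noise schedule, and weighting are illustrative stand-ins (a real system would call a diffusion U-Net such as Stable Diffusion's), not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
alphas_bar = np.linspace(0.9999, 0.01, 1000)  # toy cumulative noise schedule

def toy_denoiser(x_t, t):
    # Stand-in for a diffusion U-Net's noise prediction; a real system
    # would run Stable Diffusion (or Zero123) here.
    return 0.1 * x_t

def sds_gradient(rendered, t):
    """Image-space SDS term: w(t) * (eps_hat - eps).

    In a full system this gradient is back-propagated through a
    differentiable renderer into the 3D parameters.
    """
    a_bar = alphas_bar[t]
    eps = rng.standard_normal(rendered.shape)                    # sampled noise
    x_t = np.sqrt(a_bar) * rendered + np.sqrt(1 - a_bar) * eps   # forward diffuse
    eps_hat = toy_denoiser(x_t, t)                               # predicted noise
    w = 1.0 - a_bar                                              # common weighting
    return w * (eps_hat - eps)

view = rng.standard_normal((64, 64, 3))  # a rendered view of the 3D model
g = sds_gradient(view, t=500)
```

Because the noise `eps` is freshly sampled each step, this gradient is high-variance, which is one root of the over-smoothing and slowness noted above.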

△ From top to bottom: input, Zero123-XL, Magic123, DreamGaussian

To solve these problems, researchers from Peking University, Peng Cheng Laboratory, National University of Singapore, and Wuhan University proposed Repaint123.


In general, Repaint123 has the following contributions:

(1) Repaint123 devises a controllable repainting process for image-to-3D generation, producing high-quality image sequences that remain consistent across viewpoints.

(2) Repaint123 proposes a simple baseline method for single-view 3D generation.

In the coarse-model stage, it uses Zero123 as the 3D prior, combined with the SDS loss, to quickly generate a rough 3D model (in only one minute) by optimizing Gaussian Splatting geometry.

In the fine-model stage, it uses Stable Diffusion as the 2D prior, combined with a mean-squared-error (MSE) loss, to generate a high-quality 3D model by quickly refining the mesh texture (also in only one minute).

(3) Extensive experiments demonstrate the effectiveness of Repaint123: from a single image, it can generate high-quality 3D content that matches 2D generation quality in just two minutes.
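The two-stage pipeline described above can be sketched as follows. Every component here is a toy stand-in (the real system uses Zero123 with SDS, Stable Diffusion with ControlNet, and a Gaussian Splatting renderer); only the control flow and the cheap MSE refinement in stage two reflect the description:

```python
import numpy as np

rng = np.random.default_rng(0)

def coarse_stage(image):
    """Stage 1 (~1 min in the paper): optimize coarse Gaussian Splatting
    geometry under an SDS loss with Zero123 as the 3D prior (stubbed)."""
    return {"geometry": "gaussians", "texture": np.zeros((32, 32, 3))}

def repaint_views(model, n_views=8):
    """Generate a multi-view-consistent image sequence by progressive
    repainting (Stable Diffusion + ControlNet in the paper; stubbed)."""
    return [rng.random((32, 32, 3)) for _ in range(n_views)]

def fine_stage(model, views, lr=0.5, steps=50):
    """Stage 2 (~1 min in the paper): refine the texture with a plain
    MSE loss against the repainted views -- no score distillation."""
    target = np.mean(views, axis=0)  # toy: every view sees every texel
    for _ in range(steps):
        grad = model["texture"] - target          # d(MSE)/d(texture)
        model["texture"] = model["texture"] - lr * grad
    return model

image = rng.random((32, 32, 3))   # the single input view
model = coarse_stage(image)
views = repaint_views(model)
model = fine_stage(model, views)
```

The design point the sketch illustrates: because the repainted views are already mutually consistent, stage two can use a deterministic pixel-wise loss instead of a slow, high-variance distillation loss, which is where the speedup comes from.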

△ Fast single-view 3D generation with 3D consistency and high quality

Let’s look at the specific methods.

Repaint123 focuses on optimizing the mesh refinement stage, and its main improvement directions cover two aspects: generating high-quality image sequences with multi-view consistency and achieving fast and high-quality 3D reconstruction.

1. Generating a high-quality image sequence with multi-view consistency

Generating a high-quality image sequence with multi-view consistency is divided into the following three parts:

△ The multi-view consistent image generation process

DDIM inversion

To retain the consistent low-frequency 3D texture information generated in the coarse-model stage, the authors use DDIM inversion to invert each image into a deterministic latent, laying the foundation for the subsequent denoising process to generate faithful and consistent images.
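A minimal NumPy sketch of deterministic DDIM inversion and its reverse. For illustration the noise prediction is held constant so the round trip is exact; a real pipeline would query Stable Diffusion's U-Net at every step:

```python
import numpy as np

rng = np.random.default_rng(0)
abar = np.linspace(0.9999, 0.05, 50)  # toy cumulative-alpha schedule

# Stand-in for the model's noise prediction. Made input-independent so
# the inversion below is exactly reversible (an assumption for this demo).
eps_hat = rng.standard_normal((8, 8, 4)) * 0.1

def ddim_invert(x0):
    """Map a clean latent to its deterministic noisy latent."""
    x = x0
    for t in range(1, len(abar)):
        x0_pred = (x - np.sqrt(1 - abar[t - 1]) * eps_hat) / np.sqrt(abar[t - 1])
        x = np.sqrt(abar[t]) * x0_pred + np.sqrt(1 - abar[t]) * eps_hat
    return x

def ddim_denoise(xT):
    """Deterministically denoise back along the same trajectory."""
    x = xT
    for t in range(len(abar) - 1, 0, -1):
        x0_pred = (x - np.sqrt(1 - abar[t]) * eps_hat) / np.sqrt(abar[t])
        x = np.sqrt(abar[t - 1]) * x0_pred + np.sqrt(1 - abar[t - 1]) * eps_hat
    return x

latent = rng.random((8, 8, 4))
roundtrip = ddim_denoise(ddim_invert(latent))
```

Determinism is the point: unlike adding random noise, inversion yields a latent from which denoising can faithfully reproduce the coarse-stage content.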

Controllable denoising

To control geometric consistency and long-range texture consistency in the denoising stage, the authors introduce ControlNet, using the depth map rendered from the coarse model as a geometric prior while injecting the attention features of the reference image for texture transfer.

In addition, to apply classifier-free guidance for better image quality, the method uses CLIP to encode the reference image into an image prompt that guides the denoising network.
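The guidance step itself is standard classifier-free guidance, with the conditional branch driven by the CLIP image embedding of the reference. The arrays below are placeholders for real U-Net outputs, and the guidance scale is an illustrative choice:

```python
import numpy as np

def cfg_noise(eps_uncond, eps_cond, scale):
    """Classifier-free guidance: extrapolate from the unconditional noise
    prediction toward the image-conditioned one by `scale`."""
    return eps_uncond + scale * (eps_cond - eps_uncond)

# Placeholder predictions; in the real pipeline both come from the same
# U-Net, run with and without the CLIP image prompt as conditioning.
rng = np.random.default_rng(0)
eps_uncond = rng.standard_normal((8, 8, 4))
eps_cond = rng.standard_normal((8, 8, 4))
guided = cfg_noise(eps_uncond, eps_cond, scale=7.5)
```

At `scale=1` the formula reduces to the conditional prediction; larger scales trade diversity for adherence to the reference image.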

Repainting

To ensure that overlapping regions of adjacent images in the sequence are aligned at the pixel level, the authors use a progressive local repainting strategy for occluded and overlapping portions.

While keeping overlapping areas unchanged, it generates harmonious adjacent regions and gradually extends from the reference view out to 360°.

However, as shown in the figure below, the authors found that the overlapping regions also need refinement: an area previously seen only at an oblique angle covers more pixels when viewed head-on, so more high-frequency detail must be added.

To this end, the repainting strength is set to 1 − cos θ*, where θ* is the maximum of the angles θ between all previous camera directions and the normal vector of the viewed surface, so that overlapping regions are repainted adaptively.

△ Relationship between camera angle and repainting strength

To choose a repainting strength that improves quality while preserving fidelity, the authors draw on the projection theorem and ideas from image super-resolution, proposing this simple and direct visibility-aware repainting strategy for refining overlapping regions.
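The adaptive strength above can be sketched in a few lines. The per-texel bookkeeping of θ* is a hypothetical implementation detail; the aggregation follows the article's definition of θ* as the maximum angle between previous viewing directions and the surface normal:

```python
import numpy as np

def repaint_strength(theta_star_deg):
    """Repainting strength 1 - cos(theta*): regions previously captured
    at large angles off the surface normal are repainted more strongly."""
    return 1.0 - np.cos(np.radians(theta_star_deg))

def update_theta_star(theta_star_deg, normals, view_dir):
    """Hypothetical per-texel tracking of theta* across cameras, taking
    the maximum angle as defined in the article."""
    cos_angle = np.clip(normals @ view_dir, -1.0, 1.0)
    theta = np.degrees(np.arccos(cos_angle))
    return np.maximum(theta_star_deg, theta)

# One texel facing +z, seen head-on and then from 60 degrees off-normal
normals = np.array([[0.0, 0.0, 1.0]])
theta_star = np.zeros(1)
theta_star = update_theta_star(theta_star, normals, np.array([0.0, 0.0, 1.0]))
theta_star = update_theta_star(
    theta_star, normals, np.array([np.sin(np.pi / 3), 0.0, np.cos(np.pi / 3)]))
strength = repaint_strength(theta_star)  # 1 - cos(60 deg) = 0.5
```

A head-on view (θ* = 0) gives strength 0, leaving well-observed texture untouched, while grazing angles push the strength toward 1.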

2. Fast and high-quality 3D reconstruction

As shown in the figure below, the authors adopt a two-stage approach for fast and high-quality 3D reconstruction.

△ The two-stage single-view 3D generation framework of Repaint123

First, they utilize Gaussian Splatting representation to quickly generate reasonable geometric structures and rough textures.

At the same time, with the help of the previously generated multi-view consistent high-quality image sequence, the author is able to use a simple mean square error (MSE) loss for fast 3D texture reconstruction.

State of the art in consistency, quality, and speed

Researchers compared multiple approaches for single-view generation tasks.

△ Visual comparison of single-view 3D generation

On the RealFusion15 and Test-alpha datasets, Repaint123 achieves state-of-the-art results in all three respects: consistency, quality, and speed.


The authors also ran ablation experiments on the effectiveness of each module and on the viewing-angle rotation increment:


They also found that performance peaks at a viewing-angle interval of 60 degrees; however, since an overly large interval reduces the overlapping area and increases the risk of multi-face artifacts, 40 degrees can be used as the optimal interval.


Paper address: https://arxiv.org/pdf/2312.13271.pdf
Code address: https://pku-yuangroup.github.io/repaint123/
Project address: https://pku-yuangroup.github.io/repaint123/


Statement: This article is reproduced from 51cto.com.