CVPR best paper candidate | New breakthrough in NeRF, using heuristic-guided segmentation to remove transient interference without additional prior knowledge

The AIxiv column is a column where this site publishes academic and technical content. In the past few years, the AIxiv column of this site has received more than 2,000 reports, covering top laboratories from major universities and companies around the world, effectively promoting academic exchanges and dissemination. If you have excellent work that you want to share, please feel free to contribute or contact us for reporting. Submission email: liyazhou@jiqizhixin.com; zhaoyunfeng@jiqizhixin.com

The first author of the paper is Chen Jiahao, a second-year master's student in the School of Computer Science, Sun Yat-sen University. His research focuses on neural rendering and three-dimensional reconstruction, and his supervisor is Professor Li Guanbin. This paper is his first work. The corresponding author of the paper is Professor Li Guanbin from the School of Computer Science and Human-Machine-Object Intelligent Integration Laboratory of Sun Yat-sen University, a doctoral supervisor and winner of the National Outstanding Youth Fund. The team's main research areas are visual perception, scene modeling, understanding and generation. To date, he has published more than 150 CCF Category A/CAS Area 1 papers, which have been cited more than 12,000 times on Google Scholar. He has won the Wu Wenjun Artificial Intelligence Outstanding Youth Award and other honors.

Since they were proposed, Neural Radiance Fields (NeRF) have received great attention for their excellent performance in novel view synthesis and three-dimensional reconstruction.

Although much work tries to improve the rendering quality or running speed of NeRF, one practical problem is rarely mentioned: if unexpected transient distractors appear in the scene to be modeled, how can we eliminate their impact on NeRF?

In this article, researchers from Sun Yat-sen University, Cardiff University, University of Pennsylvania and Simou Technology conducted an in-depth study of this problem and proposed a novel paradigm to solve it.

Building on an analysis of the strengths and weaknesses of existing methods and a new way of applying existing techniques, the method accurately distinguishes static and transient elements in a variety of scenes and improves the rendering quality of NeRF. The paper has also been shortlisted as a CVPR 2024 Best Paper Candidate.


  • Paper link: https://arxiv.org/abs/2403.17537
  • Project link: https://www.sysu-hcp.net/projects/cv/132.html

Let us understand this work together.

Background introduction

Novel view synthesis is an important task in computer vision and graphics. Given multi-view images and their camera poses, the model must generate the image corresponding to a target pose. NeRF has achieved important breakthroughs on this task, but its effectiveness depends on the assumption of a static scene.

Specifically, NeRF requires that the scene to be modeled remains stationary during capture and that the content of the multi-view images is consistent. In practice, this requirement is hard to meet. For example, when shooting outdoors, vehicles or passers-by that are not part of the scene may move through the frame, and when shooting indoors, an object or shadow may inadvertently block the lens. We call such elements, which do not belong to the scene and exhibit motion or inconsistency, transient distractors. If they are not eliminated, they introduce artifacts into NeRF's rendering results.

Transient distractors (yellow boxes) can cause a large number of artifacts.

Current methods for handling transient distractors can be roughly divided into two types.
The first type uses existing segmentation models, such as semantic segmentation models, to explicitly obtain masks of the distractors, and then masks out the corresponding pixels when training NeRF. Although such methods can produce accurate segmentation results, they are not general: prior knowledge about the distractors (such as object category or an initial mask) must be available in advance, and the model must be able to identify these distractors.
The second type, in contrast, uses heuristic algorithms to handle transient distractors implicitly while training NeRF and requires no prior knowledge. Although such methods are more general, they cannot accurately separate transient distractors from static scene elements because of their design complexity and the highly ill-posed nature of the problem. For example, since the color and texture of a transient pixel are inconsistent across viewpoints, the color residual between the predicted value and the true value of this pixel is often larger than that of a static pixel when training NeRF. However, high-frequency static details in the scene also have large residuals because they are difficult to fit. Therefore, methods that remove transient distractors by thresholding the residual can easily lose high-frequency static details.


Comparison between existing methods and the Heuristics-Guided Segmentation (HuGS) proposed in this paper. When a static scene is disturbed by transient distractors, (a) segmentation-based methods rely on prior knowledge and suffer from artifacts because they cannot identify unexpected transient objects (such as the pizza); (b) heuristic-based methods are more general but not accurate enough (e.g., the high-frequency static tablecloth texture is lost); (c) HuGS combines their advantages and accurately separates transient distractors from static scene elements, thereby significantly improving the results of NeRF.

Overview of methods
Methods based on segmentation models are accurate but not general, while methods based on heuristic algorithms are general but inaccurate. So, can the two be combined so that they complement each other, giving a method that is both accurate and general?
To this end, the authors propose a novel paradigm called Heuristics-Guided Segmentation (HuGS), motivated by the idea of "horses for courses". By cleverly combining hand-designed heuristics with a prompt-driven segmentation model, HuGS can accurately distinguish transient distractors from static elements in a scene without additional prior knowledge.
Specifically, HuGS first uses heuristic algorithms to roughly distinguish static and transient elements in the multi-view images and output coarse prompts, and then uses these coarse prompts to guide the segmentation model to generate more accurate segmentation masks. When training NeRF, these masks are used to mask out transient pixels and eliminate the impact of transient distractors on NeRF.

HuGS design ideas.
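The last step above, masking out transient pixels during NeRF training, can be illustrated with a minimal sketch. It assumes a plain mean squared color error restricted to pixels flagged as static; the function name `masked_mse` and the flat pixel layout are illustrative choices, not taken from the paper or its code.

```python
from typing import Sequence, Tuple

Color = Tuple[float, float, float]  # RGB values, assumed to lie in [0, 1]


def masked_mse(
    predicted: Sequence[Color],
    target: Sequence[Color],
    is_static: Sequence[bool],
) -> float:
    """Mean squared color error over static pixels only.

    Pixels whose mask entry is False (transient) contribute nothing,
    so they cannot pull the radiance field toward the distractor.
    """
    total = 0.0
    channels = 0
    for pred, tgt, keep in zip(predicted, target, is_static):
        if not keep:
            continue  # transient pixel: excluded from the loss
        total += sum((p - t) ** 2 for p, t in zip(pred, tgt))
        channels += len(pred)
    # With no static pixels the loss is defined as zero (an assumption).
    return total / channels if channels else 0.0
```

In a real training loop the same idea would typically be applied to batches of sampled rays, with the mask looked up at each ray's pixel.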


In terms of implementation, the authors chose the Segment Anything Model (SAM) as the segmentation model of HuGS. SAM is currently the most advanced prompt-driven segmentation model; it accepts different types of prompts, such as points, boxes, and masks, and outputs the corresponding instance segmentation masks.
As for the heuristic algorithm, the authors propose a combined heuristic after in-depth analysis: a heuristic based on Structure-from-Motion (SfM) is used to capture the high-frequency static details of the scene, while a heuristic based on color residuals is used to capture low-frequency static details. The coarse static masks output by the two heuristics differ from each other, and their union is used to guide SAM toward a more accurate static mask. By seamlessly combining these two heuristics, HuGS can robustly identify various types of static elements across varying texture details.
HuGS flowchart. (a) Given unordered multi-view images of a static scene with transient distractors, HuGS first obtains two kinds of heuristic information. (b) The SfM-based heuristic uses SfM to distinguish static feature points from transient feature points, and then uses the sparse static feature points as prompts to guide SAM to generate dense static masks. (c) The color residual-based heuristic relies on a partially trained NeRF (i.e., one trained for only a few thousand iterations). The color residuals between its predicted images and the real images are used to generate another set of static masks. (d) The combination of the two different masks finally guides SAM to generate (e) an accurate static mask for each image.
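As a rough illustration of steps (d) and (e), the sketch below takes the union of two coarse boolean masks and passes it to a refinement step. The `refine` callable is a hypothetical stand-in for prompting a segmentation model such as SAM with the coarse mask; it is not SAM's actual API, and the function names are assumptions made for this example.

```python
from typing import Callable, List

Mask = List[List[bool]]  # True marks a pixel considered static


def union_masks(first: Mask, second: Mask) -> Mask:
    """Pixel-wise OR of two masks that share the same shape."""
    if len(first) != len(second) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(first, second)
    ):
        raise ValueError("masks must have the same shape")
    return [
        [a or b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(first, second)
    ]


def combined_static_mask(
    sfm_mask: Mask,
    residual_mask: Mask,
    refine: Callable[[Mask], Mask],
) -> Mask:
    """Merge the two heuristic masks, then let `refine` sharpen the result.

    `refine` is hypothetical: in practice it would wrap a promptable
    segmentation model that turns the coarse mask into a precise one.
    """
    coarse = union_masks(sfm_mask, residual_mask)
    return refine(coarse)
```

Keeping the refinement step as a parameter makes the merge logic testable without loading any segmentation model.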

SfM-based heuristic algorithm

SfM is a technique that reconstructs three-dimensional structure from two-dimensional images. After extracting 2D features from the images, SfM matches the features, verifies the matches geometrically, and reconstructs a sparse 3D point cloud. SfM is often used to estimate camera poses for NeRF, and the authors found that it can also be used to distinguish the static and transient elements of a scene. Define the number of matches of a 2D feature point as the number of other 2D feature points that correspond to the same 3D point in the point cloud. Then 2D feature points from static regions have more matches than feature points from transient regions.

Based on this finding, we can set a threshold on the number of matches to filter out the static feature points, and then use SAM to convert the static feature points into static masks. To verify this finding, the authors collected statistics on the Kubric dataset. As shown in the figure below, the number of feature point matches differs significantly between image regions. Another visualization shows that a reasonable threshold removes transient feature points while retaining static feature points.
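The match-count filter described above can be sketched as follows. The sketch assumes the SfM output has already been converted into tracks, a mapping from each 3D point to the 2D features that observe it; this data layout, the function names, and the use of an absolute (not normalized) threshold are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Observation = Tuple[int, int]  # (image_id, feature_id)


def match_counts(tracks: Dict[int, List[Observation]]) -> Dict[Observation, int]:
    """Number of other 2D features that share each feature's 3D point."""
    counts: Dict[Observation, int] = {}
    for observations in tracks.values():
        for obs in observations:
            counts[obs] = len(observations) - 1
    return counts


def static_features(
    tracks: Dict[int, List[Observation]], min_matches: int
) -> Dict[int, List[int]]:
    """Per image, the feature ids whose match count reaches the threshold."""
    selected: Dict[int, List[int]] = defaultdict(list)
    for (image_id, feature_id), count in match_counts(tracks).items():
        if count >= min_matches:
            selected[image_id].append(feature_id)
    return {image_id: sorted(ids) for image_id, ids in selected.items()}
```

The surviving feature locations would then serve as point prompts for the segmentation model; features that never appear in a track have zero matches and are dropped implicitly.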


The left picture is a histogram of the number of matches of feature points from different image regions. The number of matches of feature points in static regions is evenly distributed over the interval [0,200], while that of feature points in transient regions is close to 0 and concentrated in the interval [0,10]. The right picture plots the density of the feature points remaining after filtering in different image regions as the threshold changes. The remaining feature point density of the entire image and of the static regions decreases linearly as the threshold increases, while that of the transient regions decreases exponentially and is almost 0 once the threshold exceeds 0.2.

Visualized distribution of the remaining feature points in two images from different viewpoints as the threshold increases. The remaining feature points located in the transient regions are gradually removed, while most of the feature points in the static regions are retained.

Color residual-based heuristic algorithm

While the SfM-based heuristic performs well in most scenes, it cannot capture smooth static textures well, because smooth textures lack salient features and are hard for SfM's feature extraction algorithm to detect.

To identify low-frequency textures, the authors introduce a heuristic based on color residuals: first partially train NeRF on the original multi-view images (that is, for only a few thousand iterations) to obtain an underfitted model, and then compute the color residual between each rendered image and its target image. As mentioned in the background introduction, the color residuals of low-frequency static texture regions are smaller than those of other types of regions, so a threshold on the color residual yields a coarse mask of the low-frequency static textures. The mask obtained from the color residuals complements the mask obtained from SfM, and together they form a complete result.
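A minimal sketch of the residual thresholding step is given below. It assumes the residual of a pixel is the mean absolute difference over its color channels and that pixels at or below a fixed threshold are kept as static candidates; the exact residual definition and threshold used by the authors are not stated here, so both are assumptions.

```python
from typing import List, Tuple

Color = Tuple[float, float, float]  # RGB values, assumed to lie in [0, 1]
Image = List[List[Color]]


def residual_map(rendered: Image, target: Image) -> List[List[float]]:
    """Per-pixel mean absolute color difference between two images."""
    return [
        [
            sum(abs(r - t) for r, t in zip(r_px, t_px)) / len(r_px)
            for r_px, t_px in zip(r_row, t_row)
        ]
        for r_row, t_row in zip(rendered, target)
    ]


def low_residual_mask(
    rendered: Image, target: Image, max_residual: float
) -> List[List[bool]]:
    """True where the partially trained model already fits the pixel well."""
    return [
        [residual <= max_residual for residual in row]
        for row in residual_map(rendered, target)
    ]
```

Here `rendered` would come from the partially trained NeRF and `target` from the captured photo at the same pose. Pixels with large residuals are either transient or high-frequency static detail, which is why this mask alone is not sufficient.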


A combination of the two heuristic algorithms. (a) is the input target image, and (d) is the NeRF rendering result after only five thousand iterations. The static mask (b) produced by the SfM-based heuristic captures high-frequency static details (such as the box texture) but misses smooth static parts (such as the white chair back). The static mask (e) derived from the color residual-based heuristic, and the segmentation mask (f) obtained by using it alone to guide SAM, show the opposite behavior. Their union (c) excludes the transient distractors (i.e., the pink balloons) while covering all static elements.

Experimental results

Visualization results

Shown here are the visualized segmentation process of HuGS in two different real scenes, and a comparison of the rendering results of the baseline model Mip-NeRF 360 before and after applying the static masks. With the help of the combined heuristics and SAM, HuGS generates accurate static masks, and after applying them Mip-NeRF 360 eliminates a large number of artifacts, with significantly improved rendering quality for both RGB and depth maps.


Qualitative/quantitative rendering result comparison

Shown here are the experimental results of the proposed method on three datasets and two baseline models, together with a comparison with existing methods. Existing methods either fail to eliminate the artifacts caused by transient distractors or erase too much static texture detail. In contrast, the proposed method better preserves static details while effectively eliminating artifacts.


Comparison of qualitative/quantitative segmentation results

The authors also compared HuGS with existing segmentation algorithms on the Kubric dataset. The results show that even when prior knowledge is provided, existing segmentation models such as semantic segmentation and video segmentation models still perform poorly, because none of them is designed for this task. Although existing heuristic-based methods can roughly locate transient distractors, they cannot produce precise segmentation results. In contrast, by combining heuristic algorithms with a segmentation model, HuGS accurately separates transient distractors from static scene elements without additional prior knowledge.


Ablation experiment results

The authors also verified the contribution of each component of HuGS by removing different components. The results show that the model (b) lacking the SfM-based heuristic does not reconstruct the low-frequency static texture in the blue box well, while the models (c) and (d) lacking the color residual-based heuristic lose the high-frequency static details in the yellow box. In comparison, the full method (f) gives the best numerical metrics and visual results.


Full text summary

The paper proposes a novel heuristics-guided segmentation paradigm that effectively addresses the transient distractor problem commonly encountered when training NeRF on real-world captures. By strategically combining the complementary strengths of hand-designed heuristics and state-of-the-art segmentation models, the method achieves highly accurate segmentation of transient distractors in diverse scenes without any prior knowledge. Through carefully designed heuristics, the method robustly captures both high- and low-frequency static scene elements. Extensive experiments demonstrate its effectiveness.
