UniOcc: Unifying vision-centric occupancy prediction with geometric and semantic rendering!

Original title: UniOcc: Unifying Vision-Centric 3D Occupancy Prediction with Geometric and Semantic Rendering

Please click the following link to view the paper: https://arxiv.org/pdf/2306.09117.pdf


Paper idea:

In this technical report, we present UniOcc, our solution for the vision-centric 3D occupancy prediction track of the nuScenes Open Dataset Challenge at CVPR 2023. Existing occupancy prediction methods mainly use 3D occupancy labels to optimize the features projected into the 3D volumetric space. However, generating these labels is complex and expensive (it relies on 3D semantic annotation), and the labels are limited by voxel resolution, so they cannot provide fine-grained spatial semantics. To address this limitation, we propose a new unified occupancy (UniOcc) prediction method that explicitly imposes spatial geometric constraints and supplements fine-grained semantic supervision through volume ray rendering. Our method significantly improves model performance and shows good potential for reducing manual annotation costs. Given how laborious it is to annotate 3D occupancy, we further propose the Depth-aware Teacher-Student (DTS) framework to improve prediction accuracy using unlabeled data. Our solution achieved 51.27% mIoU on the official leaderboard with a single model, ranking third in this challenge.
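The mIoU figure quoted above is a mean intersection-over-union across semantic classes. The following is a minimal sketch of that metric on flattened voxel labels, not the challenge's official evaluation code. The function name `mean_iou`, the optional `mask` argument, and the choice to skip classes that appear in neither prediction nor ground truth are all assumptions made for illustration.

```python
def mean_iou(pred, target, num_classes, mask=None):
    """Mean IoU over classes for flat lists of integer voxel labels.

    mask (hypothetical): optional per-voxel flags; voxels with a falsy
    flag are ignored, e.g. to score only voxels visible to the cameras.
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for i, (p, t) in enumerate(zip(pred, target)):
        if mask is not None and not mask[i]:
            continue
        if p == t:
            inter[p] += 1
            union[p] += 1
        else:
            # A mismatch enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    # Assumption: classes absent from both pred and target are skipped.
    ious = [inter[c] / union[c] for c in range(num_classes) if union[c] > 0]
    return sum(ious) / len(ious) if ious else 0.0


pred = [0, 0, 1, 1]
target = [0, 1, 1, 1]
score = mean_iou(pred, target, num_classes=2)
```

In this toy case class 0 has IoU 1/2 and class 1 has IoU 2/3, so `score` is their mean, 7/12.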

Network Design:

As part of this challenge, this paper proposes UniOcc, a general solution that uses volume rendering to unify 2D and 3D representation supervision and thereby improve multi-camera occupancy prediction models. The paper does not design a new model architecture; instead, it focuses on enhancing existing models [3, 18, 20] in a versatile, plug-and-play manner.

By upgrading the occupancy representation to a NeRF-style representation [1, 15, 21], this paper uses volume rendering to generate 2D semantic and depth maps, which enables fine-grained supervision at the 2D pixel level. Sampling the 3D voxels along camera rays yields the rendered semantics and depth of each 2D pixel. By explicitly integrating geometric occlusion relationships and semantic consistency constraints, the method gives the model explicit guidance and ensures that its predictions comply with these constraints. It is worth mentioning that UniOcc has the potential to reduce the dependence on expensive 3D semantic annotation. In the absence of 3D occupancy labels, models trained using only the volume rendering supervision perform even better than models trained using 3D label supervision, which suggests that scene representations can be learned directly from affordable 2D segmentation labels. In addition, advanced techniques such as SAM [6] and [14, 19] can further reduce the cost of 2D segmentation annotation.
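The ray-sampling step can be illustrated with the standard NeRF-style compositing rule, in which each sample along a ray contributes in proportion to its opacity and to the fraction of the ray not yet absorbed. The code below is a minimal pure-Python sketch for a single ray, assuming per-sample densities and class scores have already been read from the voxel grid. The function name `render_ray` and its arguments are hypothetical and are not taken from the paper's implementation.

```python
import math


def render_ray(densities, semantics, depths):
    """Composite depth and class scores along one camera ray.

    densities: non-negative density at each of N samples
    semantics: per-sample class scores, N lists of length C
    depths:    increasing distances of the samples from the camera
    """
    n = len(densities)
    num_classes = len(semantics[0])
    transmittance = 1.0  # fraction of the ray that reaches the current sample
    depth = 0.0
    scores = [0.0] * num_classes
    weights = []
    for i in range(n):
        # Segment length for sample i; the last sample reuses the previous spacing.
        if i + 1 < n:
            delta = depths[i + 1] - depths[i]
        else:
            delta = depths[i] - depths[i - 1] if n > 1 else 1.0
        alpha = 1.0 - math.exp(-densities[i] * delta)  # opacity of this segment
        w = transmittance * alpha                      # rendering weight
        weights.append(w)
        depth += w * depths[i]
        for c in range(num_classes):
            scores[c] += w * semantics[i][c]
        transmittance *= 1.0 - alpha                   # occlusion of later samples
    return depth, scores, weights


# Toy ray: two empty samples, then a dense surface of class 1 starting at depth 3.
depth, scores, weights = render_ray(
    densities=[0.0, 0.0, 50.0, 50.0],
    semantics=[[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]],
    depths=[1.0, 2.0, 3.0, 4.0],
)
```

In the toy ray, almost all of the weight falls on the first dense sample, so the rendered depth is close to 3 and class 1 dominates the rendered scores. Because these outputs are weighted sums of the voxel predictions, a 2D depth or segmentation loss on them can be backpropagated to the 3D volume, which is the kind of pixel-level supervision the paragraph above describes.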

This paper also introduces the Depth-aware Teacher-Student (DTS) framework, a self-supervised training method. Unlike the classic Mean Teacher, DTS enhances the depth prediction of the teacher model, achieving stable and effective training while making use of unlabeled data. Furthermore, the paper applies several simple yet effective techniques to improve model performance: using visible masks in training, using a stronger pre-trained backbone network, increasing the voxel resolution, and applying test-time augmentation (TTA).
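For context, the classic Mean Teacher that DTS is contrasted with keeps the teacher's weights as an exponential moving average (EMA) of the student's weights. The sketch below shows only that baseline update, not the depth-aware enhancement, whose details are not given in this summary. The name `ema_update`, the momentum value, and the use of plain floats in a dict as a stand-in for model tensors are illustrative assumptions.

```python
def ema_update(teacher, student, momentum=0.99):
    """Update the teacher in place as an EMA of the student.

    teacher, student: dicts mapping parameter names to floats
    (a stand-in for real model tensors in this sketch).
    """
    for name, s_value in student.items():
        teacher[name] = momentum * teacher[name] + (1.0 - momentum) * s_value
    return teacher


teacher = {"w": 1.0, "b": 0.0}
student = {"w": 0.0, "b": 1.0}
ema_update(teacher, student, momentum=0.9)
```

After this single update with momentum 0.9, `teacher["w"]` has moved from 1.0 to 0.9 and `teacher["b"]` from 0.0 to 0.1. In a teacher-student setup, the slowly changing teacher would then produce pseudo-labels on unlabeled data for the student to learn from.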

Figure 1. Overview of the UniOcc framework.

Figure 2. Depth-aware Teacher-Student framework.

Experimental results:

[Figure: experimental results]

Citation:

Pan, M., Liu, L., Liu, J., Huang, P., Wang, L., Zhang, S., Xu, S., Lai, Z., Yang, K. (2023). UniOcc: Unifying Vision-Centric 3D Occupancy Prediction with Geometric and Semantic Rendering. arXiv, abs/2306.09117

Original link: https://mp.weixin.qq.com/s/iLPHMtLzc5z0f4bg_W1vIg


Statement
This article is reproduced from 51CTO.COM. If there is any infringement, please contact admin@php.cn to have it removed.


