Apple develops 'AI architect' GAUDI: generates ultra-realistic 3D scenes based on text!

WBOY

May 02, 2023 pm 03:46 PM

New text-to-image models are released every few weeks, and each produces striking results that continue to impress. The field is advancing at breakneck speed. However, AI systems such as OpenAI's DALL-E 2 or Google's Imagen can only generate two-dimensional images. Turning text into a three-dimensional scene would make the visual experience far richer. Now an AI team at Apple has introduced GAUDI, a new neural architecture for 3D scene generation.

It can capture the distribution of complex, realistic 3D scenes, render them immersively from a moving camera, and create 3D scenes from text prompts. The model is named after Antoni Gaudí, the famous Spanish architect.

Paper address: https://arxiv.org/pdf/2207.13751.pdf

3D rendering based on NeRFs

Neural rendering combines computer graphics with artificial intelligence and has produced many systems that generate 3D models from 2D images. For example, Nvidia's recently developed 3D MoMa can create a 3D model from fewer than 100 photos within an hour. Google also relies on Neural Radiance Fields (NeRFs) to combine 2D satellite and Street View images into 3D scenes in Google Maps to achieve immersive views. Google's HumanNeRF can also render 3D human bodies from videos.

Currently, NeRFs are mainly used as a neural storage medium for 3D models and 3D scenes, which can be rendered from different camera perspectives. NeRFs are also already starting to be used in virtual reality experiences.

So, can NeRFs, with their powerful ability to render realistic images from different camera angles, be used in generative AI? Some research teams have already tried to generate 3D content this way. For example, Google introduced the AI system Dream Fields in 2021. It combines NeRF's ability to generate 3D views with the ability of OpenAI's CLIP to evaluate image content, and can thereby generate a NeRF that matches a text description.

Caption: Google Dream Fields

However, Google's Dream Fields can only generate 3D views of a single object, and extending it to completely unconstrained 3D scenes is difficult. The biggest difficulty is that camera positions are heavily restricted. For a single object, every reasonable camera position can be mapped onto a dome around it, but in a 3D scene the camera's position is limited by obstacles such as objects and walls. If these constraints are not considered during scene generation, it is difficult to generate a usable 3D scene.
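The contrast between the two settings can be made concrete with a minimal sketch. It is not taken from Dream Fields or GAUDI: the function names `sample_dome_cameras` and `is_free`, the radius, and the box-shaped obstacles are all illustrative assumptions. For a single object, any point on a dome is a usable camera position. In a room, each candidate position must additionally be checked against the scene's obstacles.

```python
import numpy as np

def sample_dome_cameras(n, radius=4.0, seed=0):
    """Sample n camera positions on the upper half of a sphere (a dome),
    each looking at an object placed at the origin. Hypothetical helper."""
    rng = np.random.default_rng(seed)
    azimuth = rng.uniform(0.0, 2.0 * np.pi, size=n)
    # Drawing the height z uniformly gives uniform coverage of the dome's area.
    z = rng.uniform(0.0, 1.0, size=n)
    r_xy = np.sqrt(1.0 - z**2)
    positions = radius * np.stack(
        [r_xy * np.cos(azimuth), r_xy * np.sin(azimuth), z], axis=1
    )
    # Viewing direction: unit vector from the camera toward the origin.
    directions = -positions / np.linalg.norm(positions, axis=1, keepdims=True)
    return positions, directions

def is_free(position, boxes):
    """Return True if position lies outside every axis-aligned box obstacle.
    Each box is a (min_corner, max_corner) pair. Hypothetical stand-in for
    the walls and furniture of a real scene."""
    p = np.asarray(position, dtype=float)
    return all(not (np.all(p >= lo) and np.all(p <= hi)) for lo, hi in boxes)

positions, directions = sample_dome_cameras(8)

# A single box obstacle around the origin, e.g. a table in the middle of a room.
obstacles = [(np.array([-1.0, -1.0, -1.0]), np.array([1.0, 1.0, 1.0]))]
valid = [is_free(p, obstacles) for p in positions]
```

In the single-object case no validity check is needed, which is why a fixed dome works. In a scene, the set of valid poses depends on the scene itself, so it cannot be fixed in advance.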

3D rendering expert GAUDI

To address this problem of restricted camera positions, Apple's GAUDI model uses three specialized networks. The first is a camera pose decoder, which separates the camera pose from the 3D geometry and appearance of the scene. It predicts possible camera positions and ensures that each output is a valid position within the 3D scene's layout.

Caption: Decoder model architecture

The second is a scene decoder, which predicts a tri-plane representation of the scene that serves as a 3D canvas.

The third, a radiance field decoder, then uses the volume rendering equation to draw the subsequent images on this canvas.
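The volume rendering step can be illustrated with the standard discrete form used by NeRF-style renderers. This is a minimal sketch, not GAUDI's implementation: the function name `render_ray` and the toy sample values are assumptions, and in a real system the densities and colors would come from the radiance field decoder rather than being hard-coded.

```python
import numpy as np

def render_ray(densities, colors, deltas):
    """Composite samples along one camera ray into a single pixel color.

    densities: (N,) non-negative volume density at each sample
    colors:    (N, 3) RGB color at each sample
    deltas:    (N,) distance between consecutive samples
    """
    # Opacity of each segment: how much light it absorbs.
    alphas = 1.0 - np.exp(-densities * deltas)
    # Transmittance: fraction of light that survives all earlier segments.
    transmittance = np.concatenate([[1.0], np.cumprod(1.0 - alphas)[:-1]])
    weights = transmittance * alphas
    rgb = (weights[:, None] * colors).sum(axis=0)
    return rgb, weights

# Toy ray: two empty samples, then a dense red surface.
densities = np.array([0.0, 0.0, 50.0, 50.0])
colors = np.array([[0.0, 0.0, 1.0],
                   [0.0, 0.0, 1.0],
                   [1.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0]])
deltas = np.full(4, 0.5)
rgb, weights = render_ray(densities, colors, deltas)
```

Empty space contributes nothing because its opacity is zero, so the pixel takes the color of the first dense region the ray hits.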

GAUDI’s 3D generation consists of two stages:

The first is the optimization of latents and network parameters: learning latent representations that encode the 3D radiance fields and corresponding camera poses of thousands of trajectories. Unlike the single-object case, the set of valid camera poses varies from scene to scene, so the valid camera poses must be encoded for each scene.

The second is to use a diffusion model to learn a generative model over these latent representations, so that it performs well in both conditional and unconditional inference tasks. The former generates 3D scenes based on text or image prompts, while the latter generates 3D scenes based on camera trajectories.
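The second stage can be sketched with the forward noising step of a standard DDPM-style diffusion model applied to a latent vector. This is only an illustration under stated assumptions: the linear noise schedule, the step count, the 16-dimensional latent, and the names `make_alpha_bar` and `add_noise` are not taken from the article, and the denoising network itself is omitted.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for an assumed linear noise schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(latent, t, alpha_bar, rng):
    """Forward diffusion: blend a clean latent with Gaussian noise at step t."""
    noise = rng.standard_normal(latent.shape)
    noisy = np.sqrt(alpha_bar[t]) * latent + np.sqrt(1.0 - alpha_bar[t]) * noise
    return noisy, noise

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()

# Hypothetical stand-in for a latent learned in stage one
# (encoding a scene's radiance field and camera poses).
latent = rng.standard_normal(16)
noisy, noise = add_noise(latent, t=999, alpha_bar=alpha_bar, rng=rng)
```

During training, a denoising network would learn to predict `noise` from `noisy` and `t`. For conditional generation it would also receive an embedding of the text or image prompt, and for unconditional generation it would receive no prompt.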

For 3D indoor scenes, GAUDI can generate new camera movements. In the examples below, the text description contains information about both the scene and the navigation path. Here the research team adopted a pre-trained RoBERTa-based text encoder and used its intermediate representation to condition the diffusion model. The generated results are as follows:

Text prompt: Enter the kitchen

Text prompt: Go upstairs

Text prompt: Go through the corridor

In addition, using a pre-trained ResNet-18 as the image encoder, GAUDI is able to sample the radiance field for a given image observed from random viewpoints, thereby creating 3D scenes from image prompts.

Image prompt:

Generated 3D scene:

Image prompt:

Generated 3D scene:

The researchers' experiments on four different datasets, including the indoor scanning dataset ARKitScenes, show that GAUDI can reconstruct learned views and match the quality of existing methods. Even in the large-scale task of producing 3D scenes from hundreds of thousands of images of thousands of indoor scenes, GAUDI did not suffer from mode collapse or orientation problems.

The emergence of GAUDI will not only have an impact on many computer vision tasks. Its 3D scene generation capabilities will also benefit model-based reinforcement learning and planning, SLAM, 3D content production, and other research fields.

At present, the quality of the video generated by GAUDI is not high, and many artifacts are visible. However, this system may be a good start and foundation for Apple's ongoing work on AI systems for rendering 3D objects and scenes. GAUDI will reportedly also be applied to Apple's XR headsets to generate digital locations, which is something to look forward to.
