


AIxiv is the column through which this site publishes academic and technical content. Over the past few years, the AIxiv column has received more than 2,000 reports covering top laboratories at major universities and companies around the world, effectively promoting academic exchange and dissemination. If you have excellent work to share, please feel free to submit a contribution or contact us for coverage. Submission email: liyazhou@jiqizhixin.com; zhaoyunfeng@jiqizhixin.com.
Efficient, high-quality reconstruction of dynamic three-dimensional physical phenomena such as smoke is an important problem in scientific research, with broad application prospects in fields such as aerodynamic design verification and three-dimensional meteorological observation. By reconstructing three-dimensional density sequences that change over time, scientists can better understand and verify complex physical phenomena in the real world.
Figure 1 illustrates the importance of observing dynamic three-dimensional physical phenomena for scientific research. The picture shows NFAC, the world's largest wind tunnel, conducting aerodynamic experiments on a full-scale commercial truck [1].
However, rapidly acquiring and reconstructing dynamic three-dimensional density fields with high quality in the real world is quite difficult. First, three-dimensional information is hard to measure directly with common two-dimensional image sensors such as cameras. In addition, rapidly changing dynamic phenomena place high demands on physical acquisition: the complete sampling of a single three-dimensional density field must be finished within a very short time window, otherwise the density field itself will have changed. The fundamental challenge is bridging the information gap between the measurement samples and the reconstructed dynamic three-dimensional density field.
Current mainstream research compensates for the lack of information in measurement samples with prior knowledge; the computational cost is high, and reconstruction quality degrades when the prior conditions are not met. Departing from this mainstream approach, the research team at the State Key Laboratory of CAD & CG at Zhejiang University believes the key to solving the problem lies in increasing the information content of each measurement sample.
The research team uses AI not only to optimize the reconstruction algorithm, but also to help design the physical acquisition itself, achieving fully automatic joint optimization of software and hardware driven by the same objective, which essentially increases the amount of information about the target carried by each measurement sample. By simulating real-world optical phenomena, the AI decides how to project structured light, how to capture the corresponding images, and how to reconstruct a dynamic three-dimensional density field from the samples. In the end, using only a lightweight hardware prototype containing a single projector and a small number of cameras (1 or 3), the team reduced the number of structured light patterns needed to model a single three-dimensional density field (spatial resolution 128×128×128) to 6, achieving efficient acquisition at 40 three-dimensional density fields per second.
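As a quick sanity check, the stated acquisition rate is consistent with the 240 fps projector described in the hardware section below: cycling through 6 structured light patterns means one complete measurement set every 6 frames.

```python
# Sanity check on the numbers above: with a 240 fps projector cycling
# through 6 structured light patterns, one full measurement set
# (hence one density field) is completed every 6 projected frames.
projector_fps = 240
patterns_per_volume = 6
volumes_per_second = projector_fps / patterns_per_volume
print(volumes_per_second)  # 40.0
```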
In the reconstruction algorithm, the team proposes a novel lightweight one-dimensional decoder that takes the local incident light as part of its input and shares decoder parameters across the measurements captured by different cameras, significantly reducing network complexity and increasing computation speed. To fuse the decoding results of different cameras, a structurally simple 3D U-Net fusion network is designed. Reconstructing a single three-dimensional density field takes only 9.2 milliseconds, 2-3 orders of magnitude faster than SOTA research work, achieving real-time, high-quality reconstruction of three-dimensional density fields. The related paper, "Real-time Acquisition and Reconstruction of Dynamic Volumes with Neural Structured Illumination", has been accepted by CVPR 2024, a top international academic conference in computer vision.
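A quick check, using only the numbers quoted above, shows why 9.2 ms per volume suffices for real-time operation at the 40 volumes-per-second acquisition rate:

```python
# At 40 volumes per second, each volume must be reconstructed within
# 1000 / 40 = 25 ms; the reported 9.2 ms per volume fits this budget.
acquisition_rate = 40                    # volumes per second
budget_ms = 1000 / acquisition_rate      # per-volume time budget in ms
recon_ms = 9.2                           # reported reconstruction time
print(budget_ms, recon_ms < budget_ms)   # 25.0 True
```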
Paper link: https://svbrdf.github.io/publications/realtimedynamic/realtimedynamic.pdf
Research homepage: https://svbrdf.github.io/publications/realtimedynamic/project.html
Related work
Existing methods can be divided into the following two categories according to whether the lighting is controlled during acquisition.
The first category, based on uncontrolled lighting, requires no special light source and does not control the illumination during acquisition, so its requirements on acquisition conditions are looser [2,3]. Since a single-view camera captures only a two-dimensional projection of a three-dimensional structure, it is difficult to distinguish different three-dimensional structures with high quality. One remedy is to increase the number of sampled viewpoints, for example with dense camera arrays or light-field cameras, which incurs high hardware costs. Another is to keep sampling the view domain sparsely and fill the information gap with various kinds of priors, such as heuristic priors, physical rules, or knowledge learned from existing data. Once the prior conditions are not met in practice, the reconstruction quality of such methods deteriorates; moreover, their computational overhead is too high to support real-time reconstruction.
The second category uses controllable lighting, actively controlling the illumination conditions during acquisition [4,5]. Such work encodes the illumination to probe the physical world more actively and relies less on priors, yielding higher reconstruction quality. Depending on whether a single light source or multiple sources are used simultaneously, it can be further divided into scanning methods and illumination-multiplexing methods. For dynamic objects, the former must either achieve high scanning speed with expensive hardware or sacrifice the completeness of the results to lighten the acquisition burden. The latter significantly improves acquisition efficiency by programming multiple light sources simultaneously. However, for fast, high-quality real-time capture of density fields, the sampling efficiency of existing methods is still insufficient [5].
The Zhejiang University team's work falls into the second category. Unlike most existing work, it uses artificial intelligence to jointly optimize physical acquisition (i.e., neural structured illumination) and computational reconstruction, thereby achieving efficient, high-quality modeling of dynamic three-dimensional density fields.
Hardware prototype
The research team built a simple hardware prototype composed of a single commercial projector (BenQ X3000: 1920×1080 resolution, 240 fps) and three industrial cameras (Basler acA1440-220umQGR: 1440×1080 resolution, 240 fps), as shown in Figure 3. The projector cyclically projects six pre-trained structured light patterns while the three cameras shoot simultaneously, and the dynamic three-dimensional density field is reconstructed from the captured images. The angles of the four devices relative to the acquisition target are the optimal arrangement selected from different simulation experiments.
Figure 3: Acquisition hardware prototype. (a) Real shot of the hardware prototype, with three white tags on the stage used to synchronize the camera and projector. (b) Schematic diagram of the geometric relationship between camera, projector and subject (top view).
Software processing
The R&D team designed a deep neural network composed of an encoder, decoders and an aggregation module. The weights of the encoder directly correspond to the structured-light intensity distributions used during acquisition. The decoder takes the measurements at a single pixel as input, predicts a one-dimensional density distribution, and resamples it into the three-dimensional density field. The aggregation module merges the three-dimensional density fields predicted by the decoder for each camera into the final result. By using trainable structured light and a lightweight one-dimensional decoder, the network can more easily learn the essential relationship between structured light patterns, two-dimensional photos and three-dimensional density fields, and is less likely to overfit the training data. Figure 4 below shows the overall pipeline, and Figure 5 shows the relevant network structure.
Figure 4: The global acquisition and reconstruction pipeline (a), the resampling from structured light pattern to one-dimensional local incident light (b), and the resampling of the predicted one-dimensional density distribution back into the three-dimensional density field (c). The study starts with a simulated/real three-dimensional density field, onto which the pre-optimized structured light patterns (i.e., the encoder weights) are projected. For each valid pixel in each camera view, all of its measurements together with the resampled local incident light are fed to the decoder to predict the one-dimensional density distribution along the corresponding camera ray. All density distributions from one camera are then gathered and resampled into a single three-dimensional density field. In the multi-camera case, the predicted density fields of the cameras are fused to obtain the final result.
Figure 5: Architecture of the three main components of the network: encoder, decoder and aggregation module.
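To make the per-pixel decoding step concrete, below is a minimal sketch of how a lightweight one-dimensional decoder could map the 6 structured-light measurements at a pixel, plus the resampled local incident light along its camera ray, to a one-dimensional density distribution. This is not the paper's actual architecture or trained weights: the layer sizes, the ReLU non-linearity and the random parameters are all illustrative assumptions.

```python
import numpy as np

# Hypothetical dimensions: K measurements (one per structured light
# pattern), N samples along the camera ray, H hidden units.
rng = np.random.default_rng(0)
K, N, H = 6, 128, 64

# Random stand-in weights for a tiny two-layer MLP (illustrative only).
W1 = rng.standard_normal((K + N, H)) * 0.1
W2 = rng.standard_normal((H, N)) * 0.1

def decode_pixel(measurements, local_light):
    """Predict a 1D density distribution along one camera ray from the
    pixel's measurements and the local incident light on that ray."""
    x = np.concatenate([measurements, local_light])  # (K + N,)
    h = np.maximum(x @ W1, 0.0)                      # ReLU hidden layer
    return np.maximum(h @ W2, 0.0)                   # non-negative densities

m = rng.random(K)       # 6 pixel intensities, one per projected pattern
light = rng.random(N)   # resampled local incident light along the ray
density = decode_pixel(m, light)
print(density.shape)    # (128,)
```

Running such a decoder independently per pixel, with parameters shared across pixels and cameras, is what keeps the network lightweight; the per-camera 1D outputs are then resampled into a volume and fused, per the pipeline in Figure 4.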
Result display
Figure 6 shows selected reconstruction results of this method on four different dynamic scenes. To generate dynamic water mist, the researchers added dry ice to bottles containing liquid water, controlled the flow with valves, and guided the mist to the acquisition setup through rubber tubes.
Figure 6: Reconstruction results for different dynamic scenes. Each row visualizes selected reconstructed frames of one water-mist sequence. The number of mist sources in the scenes, from top to bottom, is 1, 1, 3 and 2 respectively. As indicated by the orange marks at the upper left, A, B and C correspond to the images captured by the three input cameras, and D is a real reference photo from a viewpoint similar to that of the rendered reconstruction. The timestamp is shown in the lower left corner. For full dynamic reconstruction results, please see the paper's video.
To verify the correctness and quality of this research, the team compared the method with related SOTA methods on real static objects (Figure 7). Figure 7 also compares reconstruction quality under different numbers of cameras. All reconstruction results are rendered from the same novel view not used during acquisition and are evaluated quantitatively with three metrics. As Figure 7 shows, thanks to the optimized acquisition efficiency, the reconstruction quality of this method surpasses the SOTA methods.
Figure 7: Comparison of different techniques on real static objects. From left to right: the light-slicing method [4], this method (three cameras), this method (two cameras), this method (single camera), hand-designed structured light with a single camera [5], and the SOTA methods PINF [3] and GlobalTrans [2]. Taking the light-slicing result as the benchmark, the quantitative errors of all other results are listed in the lower right corner of the corresponding images, evaluated with three metrics: SSIM/PSNR/RMSE (×0.01). All reconstructed density fields are rendered from non-input views; #v denotes the number of views acquired and #p the number of structured light patterns used.
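For reference, two of the metrics quoted in the figures (PSNR and RMSE) can be computed as below. This is a generic illustration on synthetic data, not the paper's evaluation code; the peak value of 1.0 used for PSNR is an assumption about how the density fields are normalized.

```python
import numpy as np

# Synthetic stand-in data: a small "ground-truth" density field and a
# prediction perturbed with small Gaussian noise.
rng = np.random.default_rng(1)
gt = rng.random((16, 16, 16))
pred = gt + rng.normal(0.0, 0.01, gt.shape)

# RMSE over all voxels; PSNR assumes a peak signal value of 1.0.
rmse = np.sqrt(np.mean((pred - gt) ** 2))
psnr = 20 * np.log10(1.0 / rmse)
print(f"RMSE={rmse:.4f}, PSNR={psnr:.1f} dB")
```

Note that the figures report RMSE scaled by ×0.01, so a printed value of 0.0100 would appear as 1.00 there.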
The research team also quantitatively compared the reconstruction quality of different methods on dynamic simulation data. Figure 8 shows the reconstruction quality comparison of simulated smoke sequences. For detailed frame-by-frame reconstruction results, please see the paper video.
Figure 8: Comparison of different methods on simulated smoke sequences. From left to right: the ground truth, the reconstruction results of this method, PINF [3] and GlobalTrans [2]. The renderings from the input view and a novel view are shown in the first and second rows respectively. The quantitative errors SSIM/PSNR/RMSE (×0.01) are shown in the lower right corner of the corresponding images. For the average error over the entire reconstructed sequence, please refer to the paper's supplementary material, and see the paper's video for the dynamic reconstruction results of the full sequences.
Future Outlook
The research team plans to apply this method to more advanced acquisition equipment (such as light-field projectors [6]) for dynamic acquisition and reconstruction. The team also hopes to further reduce the number of structured light patterns and cameras needed by capturing richer optical information (such as polarization state). In addition, combining this method with neural representations (such as NeRF) is another direction of interest. Finally, letting AI participate more actively in the design of physical acquisition and computational reconstruction, rather than being limited to software post-processing, may provide new ideas for further improving physical sensing capability and ultimately achieving efficient, high-quality modeling of various complex physical phenomena.
References:
[1]. Inside the World's Largest Wind Tunnel. https://youtu.be/ubyxYHFv2qw?si=KK994cXtARP3Atwn
[2]. Erik Franz, Barbara Solenthaler, and Nils Thuerey. Global transport for fluid reconstruction with learned self-supervision. In CVPR, pages 1632–1642, 2021.
[3]. Mengyu Chu, Lingjie Liu, Quan Zheng, Erik Franz, Hans-Peter Seidel, Christian Theobalt, and Rhaleb Zayer. Physics informed neural fields for smoke reconstruction with sparse data. ACM Transactions on Graphics, 41(4):1–14, 2022.
[4]. Tim Hawkins, Per Einarsson, and Paul Debevec. Acquisition of time-varying participating media. ACM Transactions on Graphics, 24(3):812–815, 2005.
[5]. Jinwei Gu, Shree K. Nayar, Eitan Grinspun, Peter N. Belhumeur, and Ravi Ramamoorthi. Compressive structured light for recovering inhomogeneous participating media. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(3), 2013.
[6]. Xianmin Xu, Yuxin Lin, Haoyang Zhou, Chong Zeng, Yaxin Yu, Kun Zhou, and Hongzhi Wu. A unified spatial-angular structured light for single-view acquisition of shape and reflectance. In CVPR, pages 206–215, 2023.
The above is the detailed content of CVPR 2024 | With the help of neural structured light, Zhejiang University realizes real-time acquisition and reconstruction of dynamic three-dimensional phenomena.
