In a recent 2024 TED talk, Fei-Fei Li explained the concept of Spatial Intelligence in detail. She expressed great enthusiasm for the rapid development of computer vision over the past few years, and has founded a start-up company devoted to it.
In the talk she mentioned a research result from the Stanford team: BEHAVIOR, a behavior-and-action dataset the team created to teach computers and robots how to act in a three-dimensional world. BEHAVIOR is a large dataset containing human behaviors and actions in a wide range of scenarios, built so that computers and robots can better understand and imitate human behavior. By analyzing the large amount of data in BEHAVIOR, researchers can study how humans act in these settings.
Now, Jiajun Wu has led the team in publishing a follow-up study, the "BEHAVIOR Vision Suite (BVS)". The paper was also selected as a CVPR 2024 Highlight.
In computer vision, systematically evaluating and understanding model performance under different conditions requires controllable data and comprehensive, customizable labels. Real-world visual datasets often struggle to meet these needs. Synthetic data offers a promising alternative, but existing options still fall short in asset and rendering quality, data diversity, and the realism of physical properties.
To address these problems, the research team developed the BEHAVIOR Vision Suite (BVS).
BVS is a set of tools and assets designed for the systematic evaluation of computer vision models. Built on the newly developed embodied-AI benchmark BEHAVIOR-1K, BVS exposes adjustable parameters at the scene level (e.g., lighting, object placement), the object level (e.g., joint configuration, attributes), and the camera level (e.g., field of view, focal length). Researchers can tune these parameters during data collection for precise experimental control.
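To make the three levels of control concrete, here is a minimal sketch of how such a parameter specification could be organized. All class and field names (`SceneParams`, `ObjectParams`, `CameraParams`, `GenerationSpec`) are invented for illustration; they are not BVS's actual API.

```python
from dataclasses import dataclass, field

# Hypothetical parameter groups mirroring BVS's three levels of control.
# These names are illustrative only, not BVS's real interface.

@dataclass
class SceneParams:                        # scene level
    lighting_intensity: float = 1.0       # relative brightness
    object_placement_seed: int = 0        # seed for randomized layout

@dataclass
class ObjectParams:                       # object level
    joint_angle_deg: float = 0.0          # e.g., how far a cabinet door is open
    scale: float = 1.0                    # object size multiplier

@dataclass
class CameraParams:                       # camera level
    fov_deg: float = 60.0                 # field of view
    focal_length_mm: float = 35.0

@dataclass
class GenerationSpec:
    scene: SceneParams = field(default_factory=SceneParams)
    obj: ObjectParams = field(default_factory=ObjectParams)
    camera: CameraParams = field(default_factory=CameraParams)

# A "dark indoor scene" configuration of the kind mentioned later in the text:
dark_scene = GenerationSpec(scene=SceneParams(lighting_intensity=0.2))
```

Grouping the knobs this way makes it easy to sweep one level (say, lighting) while holding the others fixed.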
The paper also demonstrates BVS's advantages in model evaluation and training applications, including: parametrically controlled evaluation of vision-model robustness to continuous changes in environmental parameters; systematic evaluation of scene-understanding models using rich visual annotations; and model training for new vision tasks.
BVS consists of two major parts: a data component, and a customizable data generator built on top of it.
## Data component

The data component of BVS builds on the assets of BEHAVIOR-1K: a total of 8,841 3D object models and indoor scenes designed by 51 artists, expanded to 1,000 scene instances. These models and scenes have a realistic appearance and cover rich semantic categories. The research team also provides a script that lets users automatically generate additional augmented scene instances.
Expansion of BEHAVIOR-1K's assets
## Customizable data generator
The customizable data generator lets users conveniently draw on the BVS data component to generate image datasets that meet their needs, such as indoor scenes under low light. BVS gives the generated datasets high semantic diversity while guaranteeing fidelity and physical plausibility. Specifically, users can control five aspects: camera pose, lighting, object properties (such as size), object states (such as open or closed), and spatial relationships between objects.

The researchers demonstrated the data generated by BVS in three application scenarios.

## Parametrically controlled evaluation of robustness to continuous environmental change

By generating data that varies continuously along a chosen dimension, the researchers systematically evaluated how robust vision models are to that change. For example, they generated images of the same scene with gradually increasing object occlusion to measure model performance on partially occluded objects. Evaluating several SOTA models, they found that existing models still perform poorly on data outside common distributions. Because such data is difficult to obtain or label in the real world, these conclusions are hard to draw directly from real image datasets. BVS thus lets researchers evaluate model robustness under exactly the conditions they care about, and develop and improve their models accordingly.

Existing SOTA models still have room for improvement in robustness under changing conditions (such as camera elevation)

Performance of different detection models as five environmental parameters vary continuously

## Evaluating scene-understanding models

Another major feature of the datasets generated by BVS is that they carry multimodal ground-truth labels, such as depth, semantic segmentation, and object bounding boxes.
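The parametric robustness sweep described above can be sketched as the following evaluation loop: vary one environmental parameter continuously, score the model at each step, and inspect the resulting curve. Both functions below are stand-ins invented for illustration; a real pipeline would render images with BVS and run an actual detector.

```python
# Sketch of a parametrically controlled robustness sweep.
# `toy_detector_accuracy` is a made-up stub, not a real model:
# it simply degrades as occlusion grows, standing in for a detector
# evaluated on BVS-rendered images at each occlusion level.

def toy_detector_accuracy(occlusion: float) -> float:
    """Stub: accuracy of a detector that degrades with occlusion."""
    return max(0.0, 1.0 - 1.2 * occlusion)

def robustness_curve(levels):
    """Score the model at each occlusion level and return the curve."""
    return [(occ, toy_detector_accuracy(occ)) for occ in levels]

# Sweep occlusion from 0% to 80% in 20% steps.
curve = robustness_curve([0.0, 0.2, 0.4, 0.6, 0.8])

# Large drops between adjacent levels flag a robustness gap.
drops = [a - b for (_, a), (_, b) in zip(curve, curve[1:])]
```

The same loop applies to any of the five controllable aspects, e.g., sweeping camera elevation instead of occlusion.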
This allows researchers to use BVS-generated data to evaluate prediction models for different tasks on the same images. The research team evaluated SOTA models on four tasks: open-vocabulary detection, open-vocabulary segmentation, depth estimation, and point-cloud reconstruction. They found that the relative ranking of models on BVS data is consistent with their ranking on real-data benchmarks for the corresponding tasks. This indicates that the high-quality data generated by BVS faithfully reflects and represents real-world data, and the researchers hope such datasets can drive progress on multi-task prediction models. The open-source code also includes a script that makes it easy to sample camera trajectories within a scene; the researchers collected many scene-traversal videos to evaluate scene-understanding models.

Holistic scene-understanding dataset

The researchers generated a large number of traversal videos in representative scenes, each containing more than 10 camera trajectories. For each image, BVS produces a variety of labels (e.g., scene graph, segmentation mask, depth map).

The relative ranking of SOTA models on BVS data matches the corresponding real-task benchmarks

## Training models for new vision tasks

BVS data generation is not limited to model evaluation: for tasks where real-world data is difficult to collect or annotate, BVS data can also be used for training. The authors used BVS to generate 12.5k images and trained an object spatial-relationship and state prediction model on them alone. Without training on any real data, the model still achieved an F1 score of 0.839 in real scenes, demonstrating strong sim-to-real transfer.

Example images from the simulated training set and the real test set

The object spatial-relationship and state prediction model trained on BVS-generated data

BVS provides a powerful set of tools and assets, giving computer-vision researchers a new way to generate customized synthetic datasets. By systematically controlling and adjusting the parameters of the data-generation process, researchers can more comprehensively evaluate and improve the performance of computer vision models, laying a solid foundation for future research and applications.
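As a side note on the metric: the F1 score reported for the sim-to-real experiment above is the harmonic mean of precision and recall. A minimal, self-contained computation (the counts below are made-up toy numbers, not the paper's):

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy counts for illustration only (not the paper's data):
# 80 true positives, 10 false positives, 20 false negatives.
score = f1_score(tp=80, fp=10, fn=20)
```

Algebraically this reduces to 2·TP / (2·TP + FP + FN), so the toy example gives 160/190 ≈ 0.842.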
# Summary
The above covers the latest progress in Fei-Fei Li's "Spatial Intelligence" line of work: the new BVS suite from Jiajun Wu's team for evaluating computer vision models.