
More granular background and foreground control, faster editing: BEVControl's two-stage approach

WBOY
2023-09-07 23:21:06

This article introduces a method for accurately generating multi-view street-view images from a BEV sketch layout.


In autonomous driving, image synthesis is widely used to improve the performance of downstream perception tasks.

Improving the performance of perception models through image synthesis is a long-standing research problem in computer vision. In vision-centric autonomous driving systems that rely on multi-view cameras, the problem becomes even more prominent, because some long-tail scenes can hardly ever be collected.

[Figure 1: (a) existing semantic-segmentation-style BEV-to-street-view generation; (b) the proposed sketch-style BEVControl]

As shown in Figure 1(a), existing generation methods [1] feed a semantic-segmentation-style BEV layout into the generation network and output plausible multi-view images. When evaluated only with scene-level metrics, these methods appear capable of synthesizing photorealistic street-view images. Once we zoom in, however, we find that they fail to produce accurate object-level detail. The figure illustrates a common mistake of a state-of-the-art generation algorithm: the generated vehicle faces exactly the opposite direction of its target 3D bounding box. Furthermore, editing a semantic-segmentation-style BEV layout is laborious. We therefore propose a two-stage method called BEVControl that provides finer control over background and foreground geometry, as shown in Figure 1(b). BEVControl accepts sketch-style BEV input, which allows quick and easy editing. In addition, BEVControl decomposes visual consistency into two sub-goals: geometric consistency between the street views and the bird's-eye view, handled by the Controller, and appearance consistency across the street views, handled by the Coordinator.

Paper link:

https://www.php.cn/link/1531beb762df4029513ebf9295e0d34f

Method Framework

BEVControl is a generative network with a UNet structure, built from a series of modules. Each module has two elements: a Controller and a Coordinator.

  • Input: a BEV sketch, multi-view noise images, and a text prompt for easy editing;
  • Output: the generated multi-view images (see the interface sketch below).
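
To make the input/output relationship concrete, here is a minimal sketch of such a conditional generation interface, written as a plain deterministic DDIM loop. Everything in it is an assumption for illustration: the function name `ddim_generate`, the noise schedule, the image resolution, and the denoiser signature are hypothetical and do not come from the paper.

```python
import torch

@torch.no_grad()
def ddim_generate(denoiser, bev_conds, text_emb, num_views=6,
                  img_shape=(3, 224, 400), steps=50):
    """Denoise multi-view Gaussian noise conditioned on per-camera BEV-sketch
    conditions and a text-prompt embedding (hypothetical names and shapes)."""
    alpha_bar = torch.linspace(0.999, 0.001, steps)  # toy cumulative-alpha schedule
    x = torch.randn(num_views, *img_shape)           # start from pure noise, one image per view
    for i in reversed(range(steps)):
        eps = denoiser(x, i, bev_conds, text_emb)    # predicted noise at step i
        x0 = (x - (1 - alpha_bar[i]).sqrt() * eps) / alpha_bar[i].sqrt()
        if i > 0:                                    # deterministic DDIM update
            x = alpha_bar[i - 1].sqrt() * x0 + (1 - alpha_bar[i - 1]).sqrt() * eps
        else:
            x = x0
    return x

# Runnable with a stub denoiser that predicts zero noise.
stub = lambda x, t, conds, text: torch.zeros_like(x)
imgs = ddim_generate(stub, bev_conds=None, text_emb=None, steps=10)
print(imgs.shape)  # torch.Size([6, 3, 224, 400])
```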
Method details

Camera projection: the BEV sketch is projected into per-camera conditions. The input is the BEV sketch; the output is the multi-view foreground and background conditions.
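
The projection step can be illustrated with standard pinhole geometry: 3D elements of the BEV sketch (box corners, lane points) are transformed into each camera's frame and projected onto its image plane. The helper below is a generic, hypothetical sketch (`project_bev_points_to_camera` and its argument layout are assumptions); the paper's exact condition encoding is not reproduced here.

```python
import numpy as np

def project_bev_points_to_camera(points_bev, cam_intrinsic, cam_extrinsic):
    """Project 3D points given in the ego/BEV frame into one camera's image plane.

    points_bev:    (N, 3) xyz points in the ego frame (e.g. 3D box corners, lane points)
    cam_intrinsic: (3, 3) camera intrinsic matrix K
    cam_extrinsic: (4, 4) ego-to-camera transform
    Returns (N, 2) pixel coordinates and a (N,) mask of points in front of the camera.
    """
    n = points_bev.shape[0]
    homo = np.concatenate([points_bev, np.ones((n, 1))], axis=1)   # (N, 4) homogeneous points
    pts_cam = (cam_extrinsic @ homo.T).T[:, :3]                    # (N, 3) in the camera frame
    in_front = pts_cam[:, 2] > 1e-3                                # keep points before the image plane
    pix = (cam_intrinsic @ pts_cam.T).T                            # (N, 3)
    pix = pix[:, :2] / np.clip(pix[:, 2:3], 1e-3, None)            # perspective divide
    return pix, in_front

# Toy example: points already expressed in the camera frame (identity extrinsic).
corners = np.array([[2.0, 0.5, 10.0], [2.0, -1.0, 10.0], [3.0, 0.5, 12.0]])
K = np.array([[1266.0, 0.0, 800.0], [0.0, 1266.0, 450.0], [0.0, 0.0, 1.0]])
uv, valid = project_bev_points_to_camera(corners, K, np.eye(4))
print(uv.shape, valid)  # (3, 2) [ True  True  True]
```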

  • Controller: receives the foreground and background information of the camera-view sketch in a self-attention manner and outputs street-view features that are geometrically consistent with the BEV sketch.

  • Coordinator: uses a novel cross-view and cross-element attention mechanism to achieve cross-view contextual interaction and outputs street-view features with appearance consistency (a minimal attention sketch follows this list).
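
The two elements can be approximated, for illustration only, by two attention layers: one that lets each view's features attend to its own sketch conditions, and one that lets tokens from all views and all condition elements attend to each other. The class below is a hypothetical sketch; its layer names, shapes, and exact attention pattern are assumptions rather than the paper's implementation.

```python
import torch
import torch.nn as nn

class ControllerCoordinatorBlock(nn.Module):
    """Hypothetical sketch of one BEVControl block: Controller then Coordinator."""

    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.controller = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.coordinator = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, view_tokens, cond_tokens):
        # view_tokens: (V, N, D) street-view feature tokens for V cameras
        # cond_tokens: (V, M, D) per-camera foreground/background condition tokens
        # Controller: each view attends to its own camera-view sketch conditions,
        # steering the features toward geometric consistency with the BEV sketch.
        x = view_tokens + self.controller(self.norm1(view_tokens),
                                          cond_tokens, cond_tokens)[0]
        # Coordinator: flatten views and condition elements into one sequence so
        # every token can interact across views and across elements.
        v, n, d = x.shape
        joint = torch.cat([x.reshape(1, v * n, d),
                           cond_tokens.reshape(1, -1, d)], dim=1)
        joint = joint + self.coordinator(self.norm2(joint), joint, joint)[0]
        return joint[:, :v * n].reshape(v, n, d)

# Toy usage: 6 views, 64 image tokens and 16 condition tokens each, 128 channels.
block = ControllerCoordinatorBlock(dim=128)
out = block(torch.randn(6, 64, 128), torch.randn(6, 16, 128))
print(out.shape)  # torch.Size([6, 64, 128])
```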
Proposed evaluation metrics

Recent street-view image generation work evaluates generation quality only with scene-level metrics such as FID and road mIoU.

  • We found that these metrics alone cannot reveal a network's true generative ability, as shown in the figure below: the qualitative and quantitative results include two groups of generated street-view images with similar FID scores but very different fine-grained control over the foreground and background.
  • We therefore propose a set of evaluation metrics that measure the fine-grained control ability of a generative network (one plausible object-level measurement is sketched below).
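
The article does not spell the metrics out, so the snippet below only illustrates the general idea with one plausible object-level measurement: run a pre-trained detector on the generated images, match its detections to the conditioning boxes, and report recall and orientation error. The function name, matching rule, and thresholds are all assumptions, not the paper's metrics.

```python
import numpy as np

def foreground_control_score(gt_boxes, det_boxes, center_thresh=2.0):
    """Match detections to conditioning boxes by center distance; report recall
    and the mean yaw error of matched pairs (an illustrative metric only).

    gt_boxes, det_boxes: (N, 4) / (M, 4) arrays of [x, y, z, yaw] in the ego frame.
    """
    matched, yaw_errs = 0, []
    used = np.zeros(len(det_boxes), dtype=bool)
    for gt in gt_boxes:
        d = np.linalg.norm(det_boxes[:, :3] - gt[:3], axis=1)
        d[used] = np.inf                       # greedy one-to-one matching
        j = int(np.argmin(d)) if len(det_boxes) else -1
        if j >= 0 and d[j] < center_thresh:
            used[j] = True
            matched += 1
            diff = (det_boxes[j, 3] - gt[3] + np.pi) % (2 * np.pi) - np.pi
            yaw_errs.append(abs(diff))         # yaw difference wrapped into [-pi, pi]
    recall = matched / max(len(gt_boxes), 1)
    mean_yaw_err = float(np.mean(yaw_errs)) if yaw_errs else float("nan")
    return recall, mean_yaw_err

# Toy example: two conditioning boxes; the first detection is flipped by ~180 degrees.
gt = np.array([[10.0, 2.0, 0.5, 0.0], [20.0, -3.0, 0.5, 1.57]])
det = np.array([[10.2, 2.1, 0.5, 3.10], [19.5, -3.2, 0.4, 1.60]])
print(foreground_control_score(gt, det))  # recall 1.0, mean yaw error ~1.57 rad
```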

[Figure: generated examples with similar FID scores but different fine-grained control]

Quantitative results

  • Comparison of BEVControl with state-of-the-art methods on the proposed evaluation metrics.

  • Applying BEVControl for data augmentation to improve object detection (a minimal mixing sketch follows).
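
One common way to use such a generator for augmentation, sketched below purely as an assumption, is to render extra multi-view samples from edited or rare BEV layouts and mix them into the detector's real training set; the mixing ratio and sampling scheme here are illustrative, not the paper's recipe.

```python
import random

def build_augmented_index(real_samples, generated_samples, gen_ratio=0.25, seed=0):
    """Mix generated multi-view samples (paired with their BEV layouts as labels)
    into the real training set. Hypothetical helper for illustration only."""
    rng = random.Random(seed)
    n_gen = int(len(real_samples) * gen_ratio)
    mixed = list(real_samples) + rng.sample(list(generated_samples),
                                            k=min(n_gen, len(generated_samples)))
    rng.shuffle(mixed)
    return mixed

# Toy usage with identifiers standing in for (images, labels) pairs.
real = [f"real_{i}" for i in range(8)]
gen = [f"gen_{i}" for i in range(4)]
print(build_augmented_index(real, gen))
```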

Qualitative results

  • Comparison of BEVControl and state-of-the-art methods on the NuScenes validation set.

[Qualitative comparison figures on the NuScenes validation set]

Demo results

[Demo figures]

Reference

[1] Swerdlow A, Xu R, Zhou B. Generating street view images from a bird's-eye view layout. arXiv preprint arXiv:2301.04634, 2023.


Source: jiqizhixin.com