CVPR'24 | LightDiff: Diffusion model in low-light scenes, directly lighting up the night!
Original title: Light the Night: A Multi-Condition Diffusion Framework for Unpaired Low-Light Enhancement in Autonomous Driving
Paper link: https://arxiv.org/pdf/2404.04804.pdf
Author affiliation: Cleveland State University, University of Texas at Austin, A*STAR, New York University, University of California, Los Angeles
Vision-centric perception systems for autonomous driving have recently received considerable attention for their cost-effectiveness and scalability, especially compared with LiDAR-based systems. However, these systems often struggle in low-light conditions, potentially affecting their performance and safety. To solve this problem, the paper introduces LightDiff, a domain-tailored framework designed to improve low-light image quality in autonomous driving applications. Specifically, the paper adopts a multi-condition controlled diffusion model. LightDiff eliminates the need for manually collected paired data and instead exploits a dynamic data degradation process. It incorporates a novel multi-condition adapter that adaptively controls the input weights of different modalities, including depth maps, RGB images, and text captions, to brighten dark scenes while maintaining content consistency. Furthermore, to align the enhanced images with the knowledge of the detection model, LightDiff uses perception-specific scores as rewards to guide the diffusion training process through reinforcement learning. Extensive experiments on the nuScenes dataset show that LightDiff can significantly improve the performance of multiple state-of-the-art 3D detectors in nighttime conditions while achieving high visual quality scores, highlighting its potential for ensuring autonomous driving safety.
This paper proposes the Lighting Diffusion (LightDiff) model to enhance low-light camera images in autonomous driving, reducing the need for large-scale nighttime data collection while preserving daytime performance.
This paper integrates multiple input modalities, including depth maps and image captions, and proposes a multi-condition adapter to ensure semantic integrity in image transformation while maintaining high visual quality. It adopts a practical process that generates day-night image pairs from daytime data for efficient model training.
This paper introduces a fine-tuning mechanism using reinforcement learning that incorporates perception-tailored domain knowledge (trustworthy LiDAR and consistency of statistical distributions), so that the diffusion process benefits both human visual perception and the perception model.
Extensive experiments on the nuScenes dataset show that LightDiff significantly improves the performance of 3D vehicle detection at night and outperforms other generative models on multiple visual metrics.
Figure 1. Driving scenarios at night are more deadly than during the day. The fatality rate is much higher at night [4]. This article aims to enhance nighttime images to improve the overall safety of nighttime driving.
As shown in Figure 1, night driving is challenging for humans, and even more so for autonomous vehicles. This challenge was highlighted by a catastrophic incident on March 18, 2018, when a self-driving car from Uber Advanced Technologies Group struck and killed a pedestrian in Arizona [37]. The incident, which was caused by the vehicle's failure to accurately detect a pedestrian in low-light conditions, brought safety issues for autonomous vehicles to the forefront, especially in such demanding environments. As vision-centric autonomous driving systems rely more and more on camera sensors, addressing safety concerns in low-light conditions has become critical to the overall safety of these vehicles.
An intuitive solution is to collect large amounts of nighttime driving data. However, this approach is not only labor-intensive and costly, but may also harm the performance of the daytime model because of the difference in image distribution between nighttime and daytime. To address these challenges, this paper proposes the Lighting Diffusion (LightDiff) model, a novel approach that eliminates the need for manual data collection and maintains daytime model performance.
The goal of LightDiff is to enhance low-light camera images and improve the performance of perception models. Using a dynamic low-light degradation process, LightDiff generates synthetic day-night image pairs for training from existing daytime data. The paper then adopts Stable Diffusion [44] for its ability to produce high-quality visual results that effectively transform nighttime scenes into their daytime equivalents. However, maintaining semantic consistency is crucial in autonomous driving, and this is a challenge for the original Stable Diffusion model. To overcome it, LightDiff combines multiple input modalities, such as estimated depth maps and camera image captions, with a multi-condition adapter. This adapter determines the weight of each input modality, preserving the semantic integrity of the transformed image while maintaining high visual quality. To guide the diffusion process toward results that are brighter not only for human vision but also for perception models, the paper further fine-tunes LightDiff with reinforcement learning, adding perception-tailored domain knowledge into the loop. Extensive experiments on the autonomous driving dataset nuScenes [7] demonstrate that LightDiff significantly improves the average precision (AP) of nighttime 3D vehicle detection for two state-of-the-art models, BEVDepth [32] and BEVStereo [31], by 4.2% and 4.6%, respectively.
Figure 2. The architecture of the Lighting Diffusion model (LightDiff). During the training phase, a training data generation process produces tri-modal data without any manual collection of paired data. LightDiff uses a multi-condition adapter to dynamically weight multiple conditions, combined with LiDAR and Distribution Reward Modeling (LDRM), allowing for perception-oriented control.
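The dynamic weighting of conditions can be illustrated with a small sketch. This article does not describe the internal architecture of the multi-condition adapter, so the code below is only a generic softmax-gated fusion written under that assumption: the function `fuse_conditions`, the per-modality `gate_vectors`, and the toy feature values are all hypothetical and are not taken from the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_conditions(features, gate_vectors):
    """Weight each modality's feature vector and sum the results.

    features:     dict name -> feature vector (lists of equal length)
    gate_vectors: dict name -> gating vector (hypothetical learned parameters)
    Returns the fused vector and the per-modality weights.
    """
    names = sorted(features)
    # One scalar score per modality: dot product of feature and gate.
    scores = [sum(f * g for f, g in zip(features[n], gate_vectors[n]))
              for n in names]
    weights = dict(zip(names, softmax(scores)))
    dim = len(features[names[0]])
    fused = [sum(weights[n] * features[n][i] for n in names)
             for i in range(dim)]
    return fused, weights

# Toy inputs standing in for depth, image, and text-caption features.
features = {"depth": [1.0, 0.0], "image": [0.0, 1.0], "text": [0.5, 0.5]}
gates = {"depth": [2.0, 0.0], "image": [0.0, 1.0], "text": [0.0, 0.0]}
fused, weights = fuse_conditions(features, gates)
```

Because the weights come from a softmax, they are positive and sum to one, so a modality that is less informative for a given input contributes less to the fused condition instead of being dropped entirely.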
Figure 3. The training data generation process. The low-light degradation transformation [9] is applied only during the training phase. The trained depth estimation network is frozen and used in both the training and testing phases of the Lighting Diffusion model.
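The idea of building training pairs from daytime data alone can be sketched as follows. This is a minimal illustration, not the paper's method: the paper applies the low-light degradation transformation of [9], whereas the gamma darkening, exposure gain, and Gaussian noise used here are simplified stand-ins. The function names and all parameter ranges are assumptions chosen for illustration, and the image is a small nested list of intensities to keep the example dependency-free.

```python
import random

def degrade_to_low_light(day_pixels, rng, gamma_range=(2.0, 3.5),
                         gain_range=(0.1, 0.3), noise_std=0.02):
    """Return a synthetic dark version of a daytime image.

    day_pixels: list of rows, each a list of intensities in [0, 1].
    Parameters are sampled on every call, so the same daytime image
    yields a different dark image each time ("dynamic" degradation).
    NOTE: simplified stand-in, not the transformation used in the paper.
    """
    gamma = rng.uniform(*gamma_range)  # gamma > 1 darkens mid-tones
    gain = rng.uniform(*gain_range)    # gain < 1 lowers overall exposure
    dark = []
    for row in day_pixels:
        out_row = []
        for v in row:
            v_dark = gain * (v ** gamma) + rng.gauss(0.0, noise_std)
            out_row.append(min(1.0, max(0.0, v_dark)))  # clamp to [0, 1]
        dark.append(out_row)
    return dark

def make_training_pair(day_pixels, rng):
    """Build one (low-light input, daytime target) pair."""
    return degrade_to_low_light(day_pixels, rng), day_pixels

rng = random.Random(0)  # seeded for reproducibility
day = [[0.2, 0.5, 0.9], [0.4, 0.7, 1.0]]
night, target = make_training_pair(day, rng)
```

The daytime image serves as the training target and its degraded copy as the input, which is why no manually collected paired data is needed.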
Figure 4. Schematic diagram of Recurrent Lighting Inference. It is designed to improve the accuracy of the generated text prompts and depth maps, thereby mitigating the detrimental effects of dark images.
Figure 5. Visual comparison on a sample of nighttime images in the nuScenes validation set.
Figure 6. Visualization of 3D detection results on an example of nighttime images in the nuScenes validation set. BEVDepth [32] is used as the 3D detector, and both the camera front view and the Bird's-Eye-View (BEV) are visualized.
Figure 7. Visual results of LightDiff with and without the Multi-Condition Adapter. The input to ControlNet [55] remains the same in both cases, including the same text prompts and depth maps. The multi-condition adapter enables better color contrast and richer details during enhancement.
Figure 8. Examples of attention maps for different modal inputs.
Figure 9. Schematic diagram of enhanced multi-modal generation through Recurrent Lighting Inference (ReLI). Calling ReLI once improves the accuracy of the text prompts and depth map predictions.
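One way to read Recurrent Lighting Inference is as a feedback loop: captions and depth maps estimated from a dark image are unreliable, so they are re-estimated from a first enhanced result and the enhancement is run again. The sketch below follows that reading, which is an assumption because this article does not spell out the procedure. The callables `enhance`, `estimate_depth`, and `caption` are hypothetical placeholders for the trained models, and the toy stubs exist only to make the control flow visible.

```python
def recurrent_lighting_inference(dark_image, enhance, estimate_depth,
                                 caption, rounds=1):
    """Enhance a dark image, then refine the conditions and enhance again.

    enhance(image, depth, text) -> enhanced image   (hypothetical model)
    estimate_depth(image)       -> depth map        (hypothetical model)
    caption(image)              -> text prompt      (hypothetical model)
    """
    # First pass: conditions come from the dark image and may be poor.
    depth, text = estimate_depth(dark_image), caption(dark_image)
    enhanced = enhance(dark_image, depth, text)
    for _ in range(rounds):
        # Re-estimate the conditions from the brighter intermediate result,
        # then enhance the original dark image with the better conditions.
        depth, text = estimate_depth(enhanced), caption(enhanced)
        enhanced = enhance(dark_image, depth, text)
    return enhanced, depth, text

# Toy stubs that record the call order instead of running real models.
calls = []

def toy_depth(img):
    calls.append("depth")
    return f"depth({img})"

def toy_caption(img):
    calls.append("caption")
    return f"caption({img})"

def toy_enhance(img, depth, text):
    calls.append("enhance")
    return "enhanced"

result, depth, text = recurrent_lighting_inference(
    "dark", toy_enhance, toy_depth, toy_caption)
```

With `rounds=1`, the final depth map and caption are computed from the enhanced image rather than from the dark input, which matches the single ReLI call described in the caption above.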
This article introduces LightDiff, a domain-specific framework for autonomous driving applications, designed to improve image quality in low-light environments and alleviate the challenges faced by vision-centric perception systems. By leveraging a dynamic data degradation process, a multi-condition adapter for different input modalities, and perception-specific score-guided reward modeling with reinforcement learning, LightDiff significantly improves nighttime image quality and 3D vehicle detection performance on the nuScenes dataset. This innovation not only eliminates the need for large amounts of nighttime data, but also ensures semantic integrity in image transformation, demonstrating its potential to improve safety and reliability in autonomous driving scenarios. A limitation is that, in the absence of realistic paired day-night images, it is quite difficult to synthesize dim driving images with car lights, which limits research in this field. Future research could focus on better collection or generation of high-quality training data.
@ARTICLE{2024arXiv240404804L,
author = {{Li}, Jinlong and {Li}, Baolu and {Tu}, Zhengzhong and {Liu}, Xinyu and {Guo}, Qing and {Juefei-Xu}, Felix and {Xu}, Runsheng and {Yu}, Hongkai},
title = "{Light the Night: A Multi-Condition Diffusion Framework for Unpaired Low-Light Enhancement in Autonomous Driving}",
journal = {arXiv e-prints},
keywords = {Computer Science - Computer Vision and Pattern Recognition},
year = 2024,
month = apr,
eid = {arXiv:2404.04804},
pages = {arXiv:2404.04804},
doi = {10.48550/arXiv.2404.04804},
archivePrefix = {arXiv},
eprint = {2404.04804},
primaryClass = {cs.CV},
adsurl = {https://ui.adsabs.harvard.edu/abs/2024arXiv240404804L},
adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}