CVPR'24 | LightDiff: Diffusion model in low-light scenes, directly lighting up the night!

Original title: Light the Night: A Multi-Condition Diffusion Framework for Unpaired Low-Light Enhancement in Autonomous Driving

Paper link: https://arxiv.org/pdf/2404.04804.pdf

Author affiliation: Cleveland State University, University of Texas at Austin, A*STAR, New York University, University of California, Los Angeles


Paper overview:

LightDiff is a technique for improving low-light image quality in vision-centric autonomous driving. Vision-centric perception systems have received considerable attention recently for their cost-effectiveness and scalability. However, these systems often struggle in low-light conditions, which can compromise both their performance and safety. To address this problem, this paper introduces LightDiff, an automated framework designed to enhance low-light image quality for autonomous driving applications. Specifically, the paper adopts a multi-condition controlled diffusion model. LightDiff eliminates the need for manually collected paired data and instead exploits a dynamic data degradation process. It incorporates a novel multi-condition adapter that adaptively weights inputs from different modalities, including depth maps, RGB images, and text captions, to brighten dark scenes while maintaining content consistency. Furthermore, to align the enhanced images with the knowledge of the detection model, LightDiff uses perception-specific scores as rewards to guide the diffusion training process through reinforcement learning. Extensive experiments on the nuScenes dataset show that LightDiff can significantly improve the performance of multiple state-of-the-art 3D detectors in nighttime conditions while achieving high visual quality scores, highlighting its potential for ensuring safety in autonomous driving.

Main contributions:

This paper proposes the Lighting Diffusion (LightDiff) model to enhance low-light camera images in autonomous driving, reducing the need for extensive nighttime data collection while maintaining daytime performance.

This paper integrates multiple input modalities, including depth maps and image captions, and proposes a multi-condition adapter to ensure semantic integrity during image conversion while maintaining high visual quality. The paper also adopts a practical pipeline that generates day-night image pairs from daytime data, enabling efficient model training.

This paper introduces a fine-tuning mechanism based on reinforcement learning, incorporating perception-tailored domain knowledge (LiDAR reliability and statistical distribution consistency) into the loop to ensure that the diffusion process brightens images in a way that benefits both human visual perception and downstream perception models (a hedged sketch of such a reward-weighted training step follows this list).

Extensive experiments on the nuScenes dataset show that LightDiff significantly improves 3D vehicle detection performance at night and outperforms other generative models on multiple perceptual metrics.
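
To make the reinforcement learning contribution above concrete, below is a minimal, hypothetical PyTorch sketch of a reward-weighted diffusion fine-tuning step. The reward combines a perception confidence term with a simple distribution consistency term; the function name detector_conf_fn, the channel-mean statistic, and the softmax weighting are illustrative assumptions, not the paper's exact LDRM formulation.

```python
# Minimal sketch of reward-weighted diffusion fine-tuning (assumptions noted
# in the lead-in; not the authors' exact LDRM implementation).
import torch
import torch.nn.functional as F


def ldrm_reward(enhanced, detector_conf_fn, day_channel_means):
    """Combine a perception confidence score with a crude distribution
    consistency score (closeness of channel means to daytime statistics)."""
    r_perc = detector_conf_fn(enhanced)                 # (B,), higher is better
    mu = enhanced.mean(dim=(2, 3))                      # (B, 3) channel means
    r_dist = -((mu - day_channel_means) ** 2).mean(dim=1)
    return r_perc + 0.1 * r_dist                        # weighted combination


def rl_finetune_step(unet, x_t, t, noise, cond, reward):
    """One training step: standard noise-prediction loss, re-weighted per
    sample so that high-reward enhancements contribute more gradient."""
    pred = unet(x_t, t, cond)
    per_sample = F.mse_loss(pred, noise, reduction="none").mean(dim=(1, 2, 3))
    weights = torch.softmax(reward, dim=0)              # normalize within batch
    return (weights * per_sample).sum()
```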

Network Design:


Figure 1. Driving at night is more deadly than during the day: the fatality rate is much higher at night [4]. This article aims to enhance nighttime images to improve the overall safety of night driving.

As shown in Figure 1, night driving is challenging for humans, and even more so for autonomous vehicles. This challenge was highlighted by a catastrophic incident on March 18, 2018, when a self-driving car from Uber Advanced Technologies Group struck and killed a pedestrian in Arizona [37]. The incident, caused by the vehicle's failure to accurately detect the pedestrian in low-light conditions, brought the safety of autonomous vehicles to the forefront, especially in such demanding environments. As vision-centric autonomous driving systems increasingly rely on camera sensors, addressing safety concerns in low-light conditions has become critical to ensuring the overall safety of these vehicles.

An intuitive solution is to collect large amounts of nighttime driving data. However, this approach is not only labor-intensive and costly, but may also harm daytime model performance due to the difference in image distribution between night and day. To address these challenges, this paper proposes the Lighting Diffusion (LightDiff) model, a novel approach that eliminates the need for manual data collection and maintains daytime model performance.

The goal of LightDiff is to enhance low-light camera images and improve the performance of perception models. Using a dynamic low-light degradation process, LightDiff generates synthetic day-night image pairs for training from existing daytime data. The paper then adopts Stable Diffusion [44] for its ability to produce high-quality visual results, effectively transforming nighttime scenes into daytime-like equivalents. However, maintaining semantic consistency is crucial in autonomous driving, and this was a challenge for the original Stable Diffusion model. To overcome it, LightDiff combines multiple input modalities, such as estimated depth maps and camera image captions, through a multi-condition adapter. This adapter adaptively determines the weight of each input modality, ensuring the semantic integrity of the converted image while maintaining high visual quality. To guide the diffusion process toward images that are not only brighter for human vision but also better for perception models, the paper further fine-tunes LightDiff with reinforcement learning, adding perception-tailored domain knowledge into the loop. Extensive experiments on the autonomous driving dataset nuScenes [7] demonstrate that LightDiff significantly improves nighttime 3D vehicle detection, raising the Average Precision (AP) of two state-of-the-art models, BEVDepth [32] and BEVStereo [31], by 4.2% and 4.6%, respectively.


Figure 2. The architecture of the Lighting Diffusion model (LightDiff). During the training phase, a training data generation pipeline produces trimodal data without any manual collection of paired data. LightDiff uses a multi-condition adapter to dynamically weight multiple conditions, combined with LiDAR and distribution reward modeling (LDRM), enabling perception-oriented control.
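
As a rough illustration of the multi-condition adapter in Figure 2, the PyTorch sketch below projects three condition streams (low-light image, depth map, text embedding) into a common feature space and fuses them with learned softmax weights. The dimensions, projection layers, and gating scheme are assumptions chosen for illustration, not the authors' exact design.

```python
# Illustrative multi-condition adapter: learned per-modality weighting
# (dimensions and gating scheme are assumptions, per the lead-in).
import torch
import torch.nn as nn


class MultiConditionAdapter(nn.Module):
    def __init__(self, dim: int = 320, text_dim: int = 768):
        super().__init__()
        self.proj_img = nn.Conv2d(3, dim, kernel_size=3, padding=1)
        self.proj_depth = nn.Conv2d(1, dim, kernel_size=3, padding=1)
        self.proj_text = nn.Linear(text_dim, dim)  # e.g. a CLIP text embedding
        self.gate = nn.Linear(3 * dim, 3)          # one scalar weight per modality

    def forward(self, img, depth, text_emb):
        f_img = self.proj_img(img)                            # (B, dim, H, W)
        f_depth = self.proj_depth(depth)                      # (B, dim, H, W)
        f_text = self.proj_text(text_emb)[:, :, None, None]   # (B, dim, 1, 1)
        f_text = f_text.expand_as(f_img)                      # broadcast over space
        pooled = torch.cat([f.mean(dim=(2, 3))
                            for f in (f_img, f_depth, f_text)], dim=1)
        w = torch.softmax(self.gate(pooled), dim=1)           # adaptive weights
        fused = (w[:, 0].view(-1, 1, 1, 1) * f_img
                 + w[:, 1].view(-1, 1, 1, 1) * f_depth
                 + w[:, 2].view(-1, 1, 1, 1) * f_text)
        return fused  # control signal fed to the diffusion U-Net
```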


Figure 3. The training data generation pipeline. The low-light degradation transform [9] is applied only during the training phase. The trained depth estimation network is frozen and used in both the training and testing phases of the Lighting Diffusion model.
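
As a rough sketch of this kind of day-to-night synthesis (the paper follows [9]; the gamma range, illumination scale, and noise model below are illustrative assumptions), a daytime frame can be darkened and corrupted like so:

```python
# Hypothetical low-light degradation: darken a daytime frame and add
# signal-dependent noise (parameter ranges are assumptions, see lead-in).
import numpy as np


def degrade_to_lowlight(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """img: float32 RGB in [0, 1]; returns a synthetic low-light version."""
    gamma = rng.uniform(2.0, 3.5)   # darken mid-tones
    scale = rng.uniform(0.1, 0.4)   # global illumination drop
    dark = scale * img ** gamma
    shot = rng.poisson(dark * 255.0) / 255.0                     # shot noise
    read = rng.normal(0.0, rng.uniform(0.01, 0.05), img.shape)   # read noise
    return np.clip(shot + read, 0.0, 1.0).astype(np.float32)


# Usage: pair each daytime frame with its degraded twin for training, e.g.
# night = degrade_to_lowlight(day, np.random.default_rng(0))
```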


Figure 4. Schematic diagram of Recurrent Lighting Inference (ReLI). It is designed to improve the accuracy of the generated text prompts and depth maps, thereby mitigating the detrimental effects of dark input images.
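
Read procedurally, Figure 4 suggests a two-pass loop: enhance once, re-derive the conditions from the brighter intermediate result, then enhance again. The sketch below is one assumed reading of that diagram; lightdiff, caption_model, and depth_model are placeholder callables, and re-enhancing the original dark frame (rather than the intermediate) is a design choice of this sketch.

```python
# Assumed two-pass Recurrent Lighting Inference (placeholder callables,
# see lead-in; not the authors' exact procedure).
def recurrent_lighting_inference(dark_img, lightdiff, caption_model, depth_model):
    # Pass 1: conditions estimated from the dark input may be inaccurate.
    prompt, depth = caption_model(dark_img), depth_model(dark_img)
    bright = lightdiff(dark_img, prompt=prompt, depth=depth)
    # Pass 2: re-derive prompt and depth from the brighter intermediate image,
    # then enhance the original frame again with the improved conditions.
    prompt, depth = caption_model(bright), depth_model(bright)
    return lightdiff(dark_img, prompt=prompt, depth=depth)
```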

Experimental results:


Figure 5. Visual comparison on a sample of nighttime images in the nuScenes validation set.


Figure 6. Visualization of 3D detection results on a sample of nighttime images from the nuScenes validation set. This paper uses BEVDepth [32] as the 3D detector and visualizes both the camera front view and the Bird's-Eye-View.


Figure 7. Visual results of LightDiff with and without the Multi-Condition Adapter. The input to ControlNet [55] is kept identical, including the same text prompts and depth maps. The multi-condition adapter enables better color contrast and richer detail in the enhancement.


Figure 8. Examples of attention maps for different modal inputs.


Figure 9. Schematic diagram of enhanced multi-modal generation via Recurrent Lighting Inference (ReLI). Invoking ReLI once improves the accuracy of the text prompt and depth map predictions.


Summary:

This article introduces LightDiff, a domain-specific framework for autonomous driving designed to improve image quality in low-light environments and alleviate the challenges faced by vision-centric perception systems. By leveraging a dynamic data degradation process, a multi-condition adapter for different input modalities, and perception-specific, score-guided reward modeling with reinforcement learning, LightDiff significantly improves nighttime image quality and 3D vehicle detection performance on the nuScenes dataset. This innovation not only eliminates the need for large amounts of nighttime data, but also ensures semantic integrity in image transformation, demonstrating its potential to improve safety and reliability in autonomous driving scenarios. In the absence of realistic paired day-night images, it remains difficult to synthesize dim driving images that contain vehicle lights, which limits research in this field. Future work could focus on better collection or generation of high-quality training data.

Citation:

@ARTICLE{2024arXiv240404804L,
       author = {{Li}, Jinlong and {Li}, Baolu and {Tu}, Zhengzhong and {Liu}, Xinyu and {Guo}, Qing and {Juefei-Xu}, Felix and {Xu}, Runsheng and {Yu}, Hongkai},
       title = "{Light the Night: A Multi-Condition Diffusion Framework for Unpaired Low-Light Enhancement in Autonomous Driving}",
       journal = {arXiv e-prints},
       keywords = {Computer Science - Computer Vision and Pattern Recognition},
       year = 2024,
       month = apr,
       eid = {arXiv:2404.04804},
       pages = {arXiv:2404.04804},
       doi = {10.48550/arXiv.2404.04804},
       archivePrefix = {arXiv},
       eprint = {2404.04804},
       primaryClass = {cs.CV},
       adsurl = {https://ui.adsabs.harvard.edu/abs/2024arXiv240404804L},
       adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}
