


New breakthrough from the HCP Laboratory of Sun Yat-sen University: using a causal paradigm to upgrade multi-modal large models
Sun Yat-sen University’s Human-Computer-Object Intelligence Fusion Laboratory (HCP Lab) has produced fruitful results in AIGC and multi-modal large models, with more than ten papers accepted at the recent AAAI 2023 and CVPR 2023, placing it in the first echelon of research institutions worldwide.
One of these works, "Masked Images Are Counterfactual Samples for Robust Fine-tuning", uses causal models to significantly improve the controllability and generalization of multi-modal large models during fine-tuning.
Link: https://arxiv.org/abs/2303.03052
Fine-tuning large pre-trained models on downstream tasks is a currently popular deep learning paradigm. In particular, the recent outstanding performance of ChatGPT, a large pre-trained language model, has brought this paradigm wide recognition. After pre-training on massive data, such large models can adapt to the changing data distributions of real environments and therefore show strong robustness in general scenarios.
However, when a pre-trained large model is fine-tuned on downstream data to adapt it to a specific application task, that data is in most cases narrow in distribution. Fine-tuning on such data often reduces the model's robustness, making it difficult to build applications on top of the pre-trained model. The problem is especially acute for visual models: because the diversity of images far exceeds that of language, the drop in robustness caused by downstream fine-tuning is particularly pronounced for vision-related pre-trained models.
Previous methods typically preserve the robustness of the fine-tuned model implicitly at the parameter level, for example through model ensembling. However, these works did not analyze the essential reason why fine-tuning degrades the model's out-of-distribution performance, nor did they explicitly solve the problem of reduced robustness after fine-tuning a large model.
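As an illustration of this line of prior work, one representative parameter-level technique is to linearly interpolate the weights of the zero-shot (pre-trained) and fine-tuned models, in the spirit of weight-space ensembling such as WiSE-FT. The sketch below is a minimal, illustrative version; the function name and the default choice of `alpha` are assumptions, not code from any cited work.

```python
# Illustrative sketch: weight-space ensembling of a zero-shot and a
# fine-tuned model with identical architectures (in the spirit of WiSE-FT).
import copy
import torch

def interpolate_weights(zeroshot_model, finetuned_model, alpha=0.5):
    """Return a model whose parameters are a convex combination of the
    zero-shot and fine-tuned weights:
        theta = (1 - alpha) * theta_zeroshot + alpha * theta_finetuned
    alpha trades downstream accuracy against out-of-distribution robustness.
    """
    merged = copy.deepcopy(zeroshot_model)
    zs_state = zeroshot_model.state_dict()
    ft_state = finetuned_model.state_dict()
    merged_state = {
        name: (1.0 - alpha) * zs_state[name] + alpha * ft_state[name]
        for name in zs_state
    }
    merged.load_state_dict(merged_state)
    return merged
```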
Building on cross-modal large models, this work analyzes from a causal perspective the essential reason why pre-trained large models lose robustness, and accordingly proposes a fine-tuning method that significantly improves model robustness. The method lets the model adapt to downstream tasks while retaining strong robustness, better meeting the needs of practical applications.
Take CLIP (Contrastive Language-Image Pre-training), the cross-modal pre-trained large model released by OpenAI in 2021, as an example. CLIP is a cross-modal model learned through contrastive image-text training, and it underpins generative models such as Stable Diffusion. The model is trained on massive multi-source data containing about 400 million image-text pairs, and to some extent learns causal relationships that are robust to distribution shifts.
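For context, here is a minimal sketch of how CLIP is typically used for zero-shot prediction with OpenAI's open-source `clip` package; the image path and class names are illustrative.

```python
# Zero-shot classification with CLIP: score an image against text prompts.
# Install with: pip install git+https://github.com/openai/CLIP.git
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("cow.jpg")).unsqueeze(0).to(device)
classes = ["cow", "horse", "sheep"]  # illustrative class names
texts = clip.tokenize([f"a photo of a {c}" for c in classes]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(texts)
    # Cosine similarity between the image and each text prompt
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(dict(zip(classes, probs.squeeze(0).tolist())))
```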
However, when CLIP is fine-tuned on downstream data with limited variation, the causal knowledge the model has learned is easily destroyed, because the non-semantic and semantic representations of the training images are highly entangled. For example, when transferring CLIP to a downstream "farm" scenario, many of the training images show "cows" on grass. Fine-tuning may then teach the model to rely on the non-"cow" semantic representation of grass to predict the image's semantics. But this correlation is not necessarily reliable: "cows" may also appear on a road. As a result, the fine-tuned model's robustness drops, and its outputs in deployment can become unstable and hard to control.
Drawing on the team's years of experience building and training large models, this work re-examines the robustness degradation caused by fine-tuning pre-trained models from a causal perspective. Based on causal modeling and analysis, it proposes a fine-tuning method that constructs counterfactual samples via image masking and improves model robustness through learning on the masked images.
Specifically, to break spurious correlations in the downstream training images, the work proposes a class activation map (CAM)-based method that masks and replaces specific regions of an image, manipulating its non-semantic or semantic representations to generate counterfactual samples. Through distillation, the fine-tuned model learns to imitate the pre-trained model's representations of these counterfactual samples, thereby better disentangling the influence of semantic and non-semantic factors and improving adaptation to distribution shifts in downstream domains.
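Below is a hedged sketch of this core idea, under stated assumptions: the per-patch CAM scores are taken as given, masked regions are simply zeroed rather than replaced with content from another image, and the student model is assumed to return both logits and features of the same dimension as the frozen teacher's. None of these names or signatures come from the paper's released code.

```python
# Hedged sketch: CAM-guided counterfactual masking plus feature distillation.
import torch
import torch.nn.functional as F

def mask_by_cam(images, cams, patch=16, keep_semantic=False, threshold=0.5):
    """Zero out image patches according to a class activation map.

    cams: (B, H/patch, W/patch) saliency scores in [0, 1].
    keep_semantic=False removes high-CAM (object) patches, producing a
    counterfactual image that retains only background (non-semantic)
    factors; keep_semantic=True does the opposite.
    """
    keep = (cams >= threshold) if keep_semantic else (cams < threshold)
    # Upsample the patch-level keep mask to pixel resolution.
    mask = keep.float().repeat_interleave(patch, -1).repeat_interleave(patch, -2)
    return images * mask.unsqueeze(1)

def robust_finetune_step(student, frozen_teacher, images, cams, labels,
                         task_loss_fn, distill_weight=1.0):
    """One training step: a task loss on the clean images, plus a feature
    distillation loss that ties the student's features on the masked
    (counterfactual) images to those of the frozen pre-trained teacher."""
    logits, feats = student(images)        # assumed to return (logits, features)
    loss_task = task_loss_fn(logits, labels)

    masked = mask_by_cam(images, cams)     # counterfactual samples
    _, feats_masked = student(masked)
    with torch.no_grad():
        teacher_feats = frozen_teacher.encode_image(masked)

    loss_distill = 1.0 - F.cosine_similarity(
        feats_masked, teacher_feats, dim=-1).mean()
    return loss_task + distill_weight * loss_distill
```

In this sketch, masking away high-CAM (object) regions yields an image that keeps only background factors; forcing the fine-tuned model to match the frozen pre-trained model's features on such images discourages it from tying class predictions to the background, which is the disentangling effect described above.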
Experiments show that this method significantly improves the performance of the pre-trained model on downstream tasks while also improving robustness, with clear advantages over existing fine-tuning methods for large models.
The significance of this work is that it opens, to some extent, the "black box" that pre-trained large models inherit from the deep learning paradigm, addressing their "interpretability" and "controllability" problems and bringing us a step closer to tangible productivity gains led by pre-trained large models.
The HCP team at Sun Yat-sen University has researched large-model technical paradigms for years, since the advent of the Transformer mechanism, committed to improving the training efficiency of large models and introducing causal models to address their "controllability" problem. Over the years, the team has independently developed multiple large pre-trained models for vision, language, speech and cross-modal tasks; the "Wukong" cross-modal large model (link: https://arxiv.org/abs/2202.06767), developed jointly with Huawei's Noah's Ark Laboratory, is a typical example.
Team Introduction
The Human-Computer-Object Intelligence Fusion Laboratory (HCP Lab) of Sun Yat-sen University conducts systematic research in multi-modal cognition and intelligent computing, robotics and embedded systems, the metaverse and digital humans, and controllable content generation. It works deeply within application scenarios to build product prototypes, produces a large amount of original technology, and incubates startup teams. The laboratory was founded in 2010 by Professor Lin Liang, an IAPR Fellow. It has won the first prize of the Science and Technology Award of the China Society of Image and Graphics, the Wu Wenjun Natural Science Award, and a provincial first prize in natural science, among other honors, and has trained national-level young talents such as Liang Xiaodan and Wang Keze.