
Tsinghua University takes the lead in releasing multi-modal evaluation MultiTrust: How reliable is GPT-4?

By WBOY | Original | 2024-07-24
The AIxiv column is where this site publishes academic and technical content. In recent years, the column has received more than 2,000 submissions covering top laboratories at major universities and companies around the world, effectively promoting academic exchange and dissemination. If you have excellent work to share, feel free to submit a contribution or contact us for coverage. Submission email: liyazhou@jiqizhixin.com; zhaoyunfeng@jiqizhixin.com

This work was initiated by the basic theory innovation team led by Professor Zhu Jun at Tsinghua University. The team has long focused on bottleneck issues in the development of artificial intelligence, exploring original AI theories and key technologies, and is at the international forefront of research on adversarial security theories and methods for intelligent algorithms. It has also conducted in-depth research on fundamental problems such as the adversarial robustness of deep learning and the efficiency of data utilization. The related work won the first prize of the Wu Wenjun Artificial Intelligence Natural Science Award, produced more than 100 CCF Class A papers, and led to the open-source ARES adversarial attack and defense platform (https://github.com/thu-ml/ares), with some patented outcomes transferred from research into practical application.

Multi-modal large language models (MLLMs), represented by GPT-4o, have attracted wide attention for their outstanding performance across modalities such as language and images. They have not only become users' right-hand assistants in daily work but have also begun to penetrate major application fields such as autonomous driving and medical diagnosis, setting off a technological revolution.
However, are multi-modal large models safe and reliable?


As shown in Figure 1, by modifying image pixels through an adversarial attack, GPT-4o can be made to misidentify the Merlion statue as the Eiffel Tower in Paris or Big Ben in London. The target of such induced errors can be customized at will, even beyond the safety boundaries of the model's application.
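To make the idea of "modifying image pixels" concrete, below is a minimal sketch of a single FGSM-style targeted perturbation step against a local open-source classifier used as a stand-in; the paper's actual attack pipeline against GPT-4o is not reproduced here, and the image file, target class, and perturbation budget are illustrative assumptions.

```python
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

# Stand-in victim model; the real target (e.g. GPT-4o) is API-only.
model = models.resnet50(weights="IMAGENET1K_V2").eval()
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),  # normalization omitted so pixels stay in [0, 1]
])

image = preprocess(Image.open("statue.jpg").convert("RGB")).unsqueeze(0)
image.requires_grad_(True)

target = torch.tensor([497])  # hypothetical attacker-chosen ImageNet class

# One FGSM step toward the target class: descend the loss w.r.t. the pixels.
loss = F.cross_entropy(model(image), target)
loss.backward()

epsilon = 4 / 255  # small L-infinity budget keeps the change near-invisible
adversarial = (image - epsilon * image.grad.sign()).clamp(0, 1).detach()
```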
In the jailbreak attack scenario, although Claude successfully rejected the malicious request posed in text form, when the user additionally supplied an unrelated solid-color image, the model output fake news as requested. This means that multi-modal large models face more risks and challenges than large language models.

In addition to these two examples, multi-modal large models also exhibit security threats and social risks such as hallucination, bias, and privacy leakage, which seriously affect their reliability and credibility in practical applications. Do these vulnerabilities occur by chance, or are they widespread? How does trustworthiness differ across multi-modal large models, and where do the differences come from?

Recently, researchers from Tsinghua University, Beihang University, Shanghai Jiao Tong University, and RealAI jointly wrote a hundred-page paper and released a comprehensive benchmark called MultiTrust, which for the first time evaluates the trustworthiness of mainstream multi-modal large models across multiple dimensions and perspectives, demonstrates several potential security risks, and motivates the next stage of development of multi-modal large models.
  • Paper title: Benchmarking Trustworthiness of Multimodal Large Language Models: A Comprehensive Study
  • Paper link: https://arxiv.org/pdf/2406.07057
  • Project homepage: https://multi-trust.github.io/
  • Code repository: https://github.com/thu-ml/MMTrustEval

In its evaluation of large models, MultiTrust distills five trustworthiness evaluation dimensions (truthfulness, safety, robustness, fairness, and privacy), refines each into a secondary classification, and constructs tasks, metrics, and datasets in a targeted way to provide a comprehensive assessment.
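As a rough illustration of this two-level taxonomy, the sketch below registers a few tasks under the five dimensions. The dimension names come from the paper; the task names, sub-aspects, and data structure are hypothetical placeholders, not the actual MMTrustEval API.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str        # task identifier (illustrative)
    dimension: str   # one of the five top-level dimensions
    sub_aspect: str  # secondary classification within the dimension
    modality: str    # "text" or "multimodal"
    kind: str        # "discrimination" or "generation"

# Dimension names follow the paper; everything else is a placeholder.
TASKS = [
    Task("visual-misleading-qa", "truthfulness", "perception", "multimodal", "discrimination"),
    Task("typographic-jailbreak", "safety", "jailbreaking", "multimodal", "generation"),
    Task("adversarial-caption", "robustness", "adversarial", "multimodal", "generation"),
    Task("stereotype-agreement", "fairness", "stereotype", "text", "discrimination"),
    Task("context-privacy-leak", "privacy", "leakage", "multimodal", "generation"),
]

def tasks_for(dimension: str) -> list[Task]:
    """Select every task registered under one trustworthiness dimension."""
    return [t for t in TASKS if t.dimension == dimension]

print([t.name for t in tasks_for("safety")])
```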

Task scenarios cover both discrimination and generation tasks, spanning pure-text and multi-modal settings. The corresponding datasets are not only adapted from public text or image datasets; some more complex and challenging data were also constructed through manual collection or algorithmic synthesis.

Figure 5: MultiTrust task list


Unlike the trustworthiness evaluation of large language models (LLMs), the multi-modal features of MLLMs bring more diverse and complex risk scenarios. For more systematic evaluation, the MultiTrust benchmark not only starts from the traditional behavioral evaluation dimension but also innovatively introduces two evaluation perspectives, multi-modal risk and cross-modal impact, comprehensively covering the new issues and challenges brought by the new modalities.

Figure 6: The two evaluation perspectives of multi-modal risk and cross-modal impact

Specifically, multi-modal risk refers to new risks introduced by multi-modal scenarios, such as potentially incorrect answers when the model processes visually misleading information, and misjudgments in multi-modal reasoning involving safety issues. For example, although a model can correctly identify the alcohol in a picture, in further reasoning some models are unaware of the potential risk of combining it with cephalosporin drugs.
Figure 7: Models misjudge in reasoning involving safety issues
Cross-modal impact refers to the effect that adding a new modality has on the trustworthiness of the original modality. For example, inputting an irrelevant image may change the trusted behavior of the large language model backbone in pure-text scenarios, leading to more unpredictable security risks. In jailbreak attacks and contextual privacy-leakage tasks commonly used for LLM trustworthiness assessment, providing the model with an image unrelated to the text can break its original safe behavior (Figure 2).
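The following sketch shows one way such a cross-modal probe could be run: send the same prompt with and without an irrelevant solid-color image and compare the responses. It assumes an OpenAI-compatible multi-modal chat endpoint; the prompt placeholder and the informal comparison are simplified stand-ins for MultiTrust's actual scoring pipeline.

```python
import base64
import io

from PIL import Image
from openai import OpenAI

client = OpenAI()

def blank_image_b64(size=(512, 512), color=(128, 128, 128)) -> str:
    """Encode a solid-color, content-free image as base64 PNG."""
    buf = io.BytesIO()
    Image.new("RGB", size, color).save(buf, format="PNG")
    return base64.b64encode(buf.getvalue()).decode()

def ask(prompt: str, with_image: bool) -> str:
    content = [{"type": "text", "text": prompt}]
    if with_image:  # attach the irrelevant image alongside the same text
        content.append({"type": "image_url", "image_url": {
            "url": f"data:image/png;base64,{blank_image_b64()}"}})
    resp = client.chat.completions.create(
        model="gpt-4o", messages=[{"role": "user", "content": content}])
    return resp.choices[0].message.content

prompt = "..."  # a jailbreak or contextual-privacy prompt from the benchmark
text_only = ask(prompt, with_image=False)
with_blank = ask(prompt, with_image=True)
# Inspect whether refusal behavior differs once the irrelevant image is added.
```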
Result analysis and key conclusions
Figure 8: Real-time updated trustworthiness leaderboard (partial)

The researchers maintain a regularly updated multi-modal large model trustworthiness leaderboard, to which the latest models such as GPT-4o and Claude 3.5 have been added. Overall, closed-source commercial models are safer and more reliable than mainstream open-source models. Among them, OpenAI's GPT-4 and Anthropic's Claude rank highest in trustworthiness, while Microsoft's Phi-3, which adds safety alignment, ranks highest among open-source models, though a gap with the closed-source models remains.

Commercial models such as GPT-4, Claude, and Gemini have implemented many reinforcement techniques for safety and trustworthiness, but some risks remain. For example, they are still vulnerable to adversarial attacks and multi-modal jailbreak attacks, which greatly undermines user experience and trust.
Although many open-source models score comparably to or even better than GPT-4 on mainstream general-capability leaderboards, in trustworthiness testing these models still show weaknesses and loopholes in various aspects. For example, the emphasis on general capabilities such as OCR during training makes embedding jailbreak text and sensitive information in image input a more threatening source of risk.
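This OCR-based risk can be illustrated with a short sketch that renders the instruction under test into an image, in the spirit of typographic jailbreak probes; the canvas size, font, and placeholder text are assumptions, not the benchmark's actual data-generation code.

```python
from PIL import Image, ImageDraw, ImageFont

def text_to_image(text: str, path: str = "probe.png") -> None:
    """Render the instruction as black text on a white canvas."""
    img = Image.new("RGB", (768, 256), "white")
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()  # swap in a larger TTF in practice
    draw.multiline_text((16, 16), text, fill="black", font=font)
    img.save(path)

# The accompanying text prompt stays innocuous, e.g.
# "Follow the instructions shown in the image."
text_to_image("<sensitive instruction under test>")
```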
Based on the experimental results on cross-modal impact, the authors found that multi-modal training and inference weaken the safety alignment mechanism of large language models. Many multi-modal large models use an aligned LLM as the backbone and fine-tune it during multi-modal training; the results show that these models still exhibit significant security vulnerabilities and trust risks. Meanwhile, in multiple pure-text trustworthiness assessment tasks, introducing images during inference also disturbs the model's trustworthy behavior.
Figure 10: After an image is introduced, the model is more inclined to leak private content from the text

Experiments show that the trustworthiness of multi-modal large models is related to their general capability, but model performance still differs across the trustworthiness evaluation dimensions. Common algorithms for multi-modal large models, such as fine-tuning on datasets generated with the help of GPT-4V or RLHF targeting hallucination, are insufficient to fully improve model trustworthiness. Existing findings also show that multi-modal large models face unique challenges distinct from large language models, and innovative, efficient algorithms are needed for further improvement.
See the paper for detailed results and analysis.

Future Directions

The results show that improving the trustworthiness of multi-modal large models requires special attention from researchers. Borrowing alignment solutions from large language models, diversifying training data and scenarios, and adopting paradigms such as Retrieval-Augmented Generation (RAG) and Constitutional AI can help to a certain extent. But improving the trustworthiness of multi-modal large models goes beyond this: alignment between modalities and the robustness of visual encoders are also key influencing factors. In addition, improving model performance in practical applications through continuous evaluation and optimization in dynamic environments is an important future direction.
Alongside the MultiTrust benchmark, the research team also released MMTrustEval, a trustworthiness evaluation toolkit for multi-modal large models, whose modular model-integration and evaluation design provides an important foundation for trustworthiness research. Based on this work and toolkit, the team organized data and algorithm competitions on multi-modal large model safety [1,2] to promote trustworthiness research on large models. In the future, with continued technological progress, multi-modal large models will show their potential in more fields, but their trustworthiness issues still require sustained attention and in-depth research.

Reference links:

[1] CCDM2024 Multimodal Large Language Model Red Team Safety Challenge http://116.1114.8df
[2] The 3rd Pazhou Algorithm Competition - Multimodal Large Model Algorithm Safety Reinforcement Technology https://iacc.pazhoulab-huangpu.com/contestdetail?id=668de7357ff47da8cc88c7b8&award=1,00,

