Tsinghua University takes the lead in releasing multi-modal evaluation MultiTrust: How reliable is GPT-4?

WBOY

Jul 24, 2024 pm 08:38 PM

The AIxiv column is where this site publishes academic and technical content. Over the past few years it has carried more than 2,000 reports covering top laboratories at major universities and companies around the world, helping to promote academic exchange and dissemination. If you have excellent work to share, feel free to submit it or contact us for coverage. Submission email: liyazhou@jiqizhixin.com; zhaoyunfeng@jiqizhixin.com

This work was initiated by the fundamental theory innovation team led by Professor Zhu Jun of Tsinghua University. The team has long focused on bottleneck problems in the development of artificial intelligence and explores original AI theories and key technologies. It is at an internationally leading level in research on adversarial security theories and methods for intelligent algorithms, and has also studied in depth fundamental common problems of deep learning such as adversarial robustness and data utilization efficiency. The related work won the first prize of the Wu Wenjun Artificial Intelligence Natural Science Award, has been published in more than 100 CCF Class A papers, produced the open-source ARES adversarial attack and defense algorithm platform (https://github.com/thu-ml/ares), and has seen some patents carried from research into practical application.

Multi-modal large language models (MLLMs) represented by GPT-4o have attracted much attention due to their excellent performance in multiple modalities such as language and images. They have not only become users' right-hand assistants in daily work, but have also gradually penetrated into major application fields such as autonomous driving and medical diagnosis, setting off a technological revolution.

However, are multi-modal large models safe and reliable?

As shown in Figure 1, when the pixels of an image are modified through an adversarial attack, GPT-4o mistakenly identifies the Merlion statue as the Eiffel Tower in Paris or Big Ben in London. The attacker can choose the erroneous target at will, including content that falls outside the safety boundaries of the model's application.
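To make the idea of a pixel-level adversarial perturbation concrete, here is a minimal Python sketch. It is not the attack used against GPT-4o in the paper: it assumes a toy linear scorer with random made-up weights (`w_true`, `w_target`), and the function names `targeted_step` and `margin` are hypothetical. It only illustrates how a small, bounded change to every pixel can push a model's scores toward a class the attacker chooses.

```python
import random


def sign(v):
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def margin(pixels, w_true, w_target):
    """Target score minus true score; a positive value means the target class wins."""
    return sum((wt - wy) * x for x, wt, wy in zip(pixels, w_target, w_true))


def targeted_step(pixels, w_true, w_target, eps):
    """Take one signed-gradient step toward the target class.

    For a linear scorer, the gradient of (target score - true score) with
    respect to the input is simply w_target - w_true, so no autodiff is needed.
    """
    adv = []
    for x, wt, wy in zip(pixels, w_target, w_true):
        moved = x + eps * sign(wt - wy)
        adv.append(min(1.0, max(0.0, moved)))  # clip to the valid pixel range
    return adv


# Toy "image" and weights (assumptions for illustration only).
rng = random.Random(0)
n = 256
image = [rng.uniform(0.2, 0.8) for _ in range(n)]
w_true = [rng.gauss(0, 1) for _ in range(n)]
w_target = [rng.gauss(0, 1) for _ in range(n)]

adv = targeted_step(image, w_true, w_target, eps=0.03)
print("margin before:", margin(image, w_true, w_target))
print("margin after: ", margin(adv, w_true, w_target))
```

Because each pixel moves by at most `eps`, the perturbed image stays close to the original while the margin in favor of the target class grows. Attacks on real multimodal models rely on the same principle, but have to work with a deep network rather than a linear scorer.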

In a jailbreak attack scenario, Claude successfully refused a malicious request given as text, yet when the user additionally supplied an unrelated solid-color image, the model produced fake news as requested. This means that multi-modal large models face more risks and challenges than large language models.

Beyond these two examples, multi-modal large models are also exposed to security threats and social risks such as hallucination, bias, and privacy leakage, which seriously affect their reliability and trustworthiness in practical applications. Do these vulnerabilities occur by chance, or are they widespread? How does trustworthiness differ across multi-modal large models, and where do the differences come from?

Recently, researchers from Tsinghua University, Beihang University, Shanghai Jiao Tong University and Ruilai Intelligence jointly wrote a hundred-page article and released a comprehensive benchmark called MultiTrust. For the first time, it comprehensively evaluates the trustworthiness of mainstream multi-modal large models from multiple dimensions and perspectives, reveals multiple potential security risks, and offers guidance for the next stage of development of multi-modal large models.

Paper title: Benchmarking Trustworthiness of Multimodal Large Language Models: A Comprehensive Study
Paper link: https://arxiv.org/pdf/2406.07057
Project homepage: https://multi-trust.github.io/
Code repository: https://github.com/thu-ml/MMTrustEval

Drawing on existing large-model evaluation work, MultiTrust distills five trustworthiness dimensions: truthfulness, safety, robustness, fairness, and privacy. Each dimension is further divided into sub-categories, and tasks, metrics, and datasets are constructed specifically for each to provide a comprehensive assessment.

The task scenarios cover both discriminative and generative tasks, and span pure-text and multi-modal tasks. Some of the datasets behind these tasks are adapted from public text or image datasets, while other, more complex and challenging data is built through manual collection or algorithmic synthesis.

Figure 5 MultiTrust task list

Unlike trustworthiness evaluation of large language models (LLMs), the multi-modal nature of MLLMs brings more diverse and complex risk scenarios and possibilities. To evaluate them more systematically, the MultiTrust benchmark goes beyond the traditional behavioral evaluation dimensions and introduces two new evaluation perspectives, multi-modal risk and cross-modal impact, so that it covers the new problems and challenges brought by the new modality.

Figure 6 Multi-modal risk and cross-modal impact

Specifically, multi-modal risk refers to new risks that arise in multi-modal scenarios, such as incorrect answers when a model processes visually misleading information, or misjudgments in multi-modal reasoning that involves safety issues. For example, although a model may correctly identify the alcohol in a picture, in further reasoning some models fail to recognize the potential risk of taking it together with cephalosporin drugs.

Figure 7 Models misjudge in reasoning that involves safety issues

Cross-modal impact refers to the effect that adding a new modality has on the trustworthiness of the original modality. For example, feeding in an irrelevant image may change the trustworthy behavior of the large language model backbone in pure-text scenarios, leading to more unpredictable security risks. In jailbreak attacks and contextual privacy leakage tasks, both commonly used to assess the trustworthiness of large language models, providing the model with a picture unrelated to the text can break its original safe behavior (Figure 2).
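One way to quantify such a cross-modal effect is to compare a model's refusal rate on the same prompts with and without an unrelated image attached. The Python sketch below is only illustrative: the record format, the `refused` flag, the function names, and the sample data are all assumptions, not MultiTrust's actual metrics or results.

```python
def refusal_rate(responses):
    """Fraction of responses marked as refusals."""
    if not responses:
        raise ValueError("need at least one response")
    return sum(1 for r in responses if r["refused"]) / len(responses)


def cross_modal_shift(text_only, with_image):
    """Change in refusal rate once an unrelated image is attached.

    A negative value means the model refuses less often when the image is
    present, i.e. its text-only safety behavior has been weakened.
    """
    return refusal_rate(with_image) - refusal_rate(text_only)


# Hypothetical judgments for four harmful prompts (not real benchmark data).
text_only = [{"refused": True}, {"refused": True},
             {"refused": True}, {"refused": True}]
with_image = [{"refused": True}, {"refused": False},
              {"refused": True}, {"refused": False}]

shift = cross_modal_shift(text_only, with_image)
print(f"refusal-rate shift: {shift:+.2f}")
```

In practice the `refused` flag would come from a human or automated judge applied to each model response. Keeping the prompts identical across the two conditions isolates the effect of the added image.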

Result analysis and key conclusions

A real-time updated trustworthiness leaderboard (partial)

The researchers maintain a regularly updated trustworthiness leaderboard for multi-modal large models, to which the latest models such as GPT-4o and Claude3.5 have been added. Overall, closed-source commercial models are safer and more reliable than mainstream open-source models. OpenAI's GPT-4 and Anthropic's Claude rank highest in trustworthiness, while Microsoft's Phi-3, which has undergone safety alignment, ranks highest among open-source models but still trails the closed-source models.

Commercial models such as GPT-4, Claude, and Gemini have implemented many hardening techniques for security and trustworthiness, but some risks remain. For example, they are still vulnerable to adversarial attacks and multi-modal jailbreak attacks, which greatly undermines user experience and trust.

Although many open-source models score on par with or even better than GPT-4 on mainstream general-purpose leaderboards, in trustworthiness testing these models still show weaknesses and loopholes in various respects. For example, the emphasis on general capabilities (such as OCR) during training makes jailbreak text and sensitive information embedded in image input a more threatening source of risk.

Based on the experimental results on cross-modal impact, the authors found that multi-modal training and inference weaken the safety alignment of large language models. Many multi-modal large models use an aligned large language model as the backbone and fine-tune it during multi-modal training. The results show that these models still exhibit substantial security vulnerabilities and trustworthiness risks. At the same time, in several pure-text trustworthiness assessment tasks, introducing images at inference time also interferes with the model's trustworthy behavior.

Figure 10 After an image is introduced, the model is more prone to leaking private content in the text

Experiments show that the trustworthiness of multi-modal large models is correlated with their general capabilities, but model performance still differs across trustworthiness dimensions. Algorithms currently common for multi-modal large models, such as fine-tuning datasets generated with the help of GPT-4V and RLHF targeting hallucination, are not enough to fully improve model trustworthiness. The existing conclusions also show that multi-modal large models face unique challenges distinct from those of large language models, and that innovative and efficient algorithms are needed for further improvement.

See the paper for detailed results and analysis.

Future Directions

The findings indicate that improving the trustworthiness of multi-modal large models requires special attention from researchers. Drawing on alignment solutions for large language models, diverse training data and scenarios, and paradigms such as Retrieval-Augmented Generation (RAG) and Constitutional AI can help to a certain extent. But improving the trustworthiness of multi-modal large models goes beyond this: alignment between modalities and the robustness of the visual encoder are also key factors. In addition, improving model performance in practical applications through continuous evaluation and optimization in dynamic environments is an important future direction.

Along with the MultiTrust benchmark, the research team also released MMTrustEval, a toolkit for evaluating the trustworthiness of multi-modal large models. Its model integration and evaluation features provide important support for research on the trustworthiness of multi-modal large models. Based on this work and toolkit, the team has organized data and algorithm competitions related to the security of multi-modal large models [1,2] to promote research on trustworthy large models. In the future, as technology continues to advance, multi-modal large models will show their potential in more fields, but their trustworthiness still requires sustained attention and in-depth research.

Reference links:

[1] CCDM2024 Multimodal Large Language Model Red Team Security Challenge http://116.1114.8df

[2] The 3rd Pazhou Algorithm Competition - Multi-modal Large Model Algorithm Security Reinforcement Technology https://iacc.pazhoulab-huangpu.com/contestdetail?id=668de7357ff47da8cc88c7b8&award=1,00
