


Is model preference only related to size? Shanghai Jiao Tong University comprehensively analyzes the quantitative components of the preferences of humans and 32 large models
In the current model training paradigm, acquiring and using preference data has become indispensable. In training, preference data typically serves as the optimization target during alignment, as in reinforcement learning from human or AI feedback (RLHF/RLAIF) or direct preference optimization (DPO). In model evaluation, tasks are often too complex to have a standard answer, so the preference annotations of human annotators or high-performance large models (LLM-as-a-Judge) are usually used directly as the judging criterion.
Although these applications of preference data have achieved broad success, preferences themselves remain insufficiently studied, which has considerably hindered the construction of more trustworthy AI systems. To this end, the Generative Artificial Intelligence Laboratory (GAIR) of Shanghai Jiao Tong University released a new study that systematically and comprehensively analyzes the preferences displayed by human users and as many as 32 popular large language models, in order to learn how preference data from different sources is quantitatively composed of predefined attributes such as harmlessness, humor, and acknowledgment of limitations.
The analysis conducted has the following characteristics:
- Focus on real applications: the data used in the study come from real user-model conversations, which better reflect preferences in actual use.
- Scenario-based modeling: data belonging to different scenarios (such as daily communication and creative writing) are modeled and analyzed independently, which avoids interference between scenarios and makes the conclusions clearer and more reliable.
- Unified framework: a single framework is used to analyze the preferences of both humans and large models, and it scales well.
The study found that:
- Human users are less sensitive to errors in model responses, clearly dislike responses that acknowledge their own limitations or refuse to answer, and prefer responses that support their subjective position. Advanced large models such as GPT-4-Turbo prefer responses that are error-free, clearly expressed, and safe.
- Large models of similar size show similar preferences, and alignment fine-tuning hardly changes a model's preference composition; it only changes the intensity of the preferences the model expresses.
- Preference-based evaluation can be intentionally manipulated. Encouraging the model under test to respond with attributes the evaluator likes raises its score, while injecting the least-liked attributes lowers it.
For the "daily communication" scenario, Figure 1 shows the preferences of humans, GPT-4-Turbo, and LLaMA-2-70B-Chat for different attributes, according to the preference decomposition results. A larger value indicates a stronger preference for the attribute, while a value below 50 indicates that the attribute is not favored.
This project has open sourced a wealth of content and resources:
- Interactive demo: includes visualizations of all analyses and more detailed results not shown in the paper, and supports uploading the preferences of new models for quantitative analysis.
- Dataset: contains the user-model pairwise conversation data collected in this study, including preference labels from real users and as many as 32 large models, as well as detailed annotations for the defined attributes.
- Code: provides the automatic annotation framework used to collect the data, with usage instructions, along with code for visualizing the analysis results.
- Paper: https://arxiv.org/abs/2402.11296
- Demo: https://huggingface.co/spaces/GAIR/Preference-Dissection-Visualization
- Code: https://github.com/GAIR-NLP/Preference-Dissection
- Dataset: https://huggingface.co/datasets/GAIR/preference-dissection
Method introduction
The study uses paired user-model conversation data from the ChatbotArena Conversations dataset, which come from real application scenarios. Each sample contains a user query and two different model responses. The researchers first took the human users' preference labels for these samples, which are already included in the original dataset. They then ran inference to collect labels from 32 different open or closed large models.
The study first built an automatic annotation framework based on GPT-4-Turbo and scored every model response on 29 predefined attributes. Comparing the scores within each pair then yields the sample's "comparison feature" for each attribute. For example, if response A's harmlessness score is higher than response B's, the comparison feature for this attribute is 1; if it is lower, the feature is -1; and if the two scores are equal, it is 0.
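The comparison-feature rule described above can be sketched in a few lines. This is only an illustration: the attribute names and scores are made up, and the function name `comparison_features` is hypothetical rather than taken from the project's code.

```python
def comparison_features(scores_a, scores_b):
    """Map per-attribute scores of two responses to features in {-1, 0, 1}.

    1 means response A scores higher, -1 means response B scores higher,
    and 0 means the two scores are equal.
    """
    features = {}
    for attribute in scores_a:
        diff = scores_a[attribute] - scores_b[attribute]
        # (diff > 0) - (diff < 0) is the sign of diff as an integer.
        features[attribute] = (diff > 0) - (diff < 0)
    return features


# Hypothetical attribute scores for one pair of responses.
scores_a = {"harmlessness": 3, "humor": 1, "admits_limitations": 0}
scores_b = {"harmlessness": 2, "humor": 1, "admits_limitations": 2}

features = comparison_features(scores_a, scores_b)
print(features)
```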
Using the constructed comparison features and the collected binary preference labels, the researchers fit a Bayesian linear regression model that maps comparison features to preference labels. The weight the fitted model assigns to each attribute can be regarded as that attribute's contribution to the overall preference.
Because this study collected preference labels from many different sources and modeled each scenario separately, it obtains, in each scenario and for each source (humans or a specific large model), a quantitative decomposition of preferences into attributes.
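To make the idea of reading attribute contributions from fitted weights concrete, here is a minimal sketch. It makes several assumptions that the article does not confirm: labels are coded as +1 (A preferred) and -1 (B preferred), the prior on the weights is a zero-mean Gaussian, the noise variance is 1, and only the posterior mean is computed (which coincides with ridge regression) rather than a full posterior. The data and the function name `fit_posterior_mean` are hypothetical.

```python
def fit_posterior_mean(X, y, alpha=1.0, lr=0.05, steps=500):
    """Posterior mean of w for y ~ N(Xw, 1) with prior w ~ N(0, I / alpha).

    This equals the ridge regression solution; it is found here by plain
    gradient descent to keep the sketch dependency-free.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    for _ in range(steps):
        # Residual of the linear prediction for every sample.
        residuals = [sum(wj * xj for wj, xj in zip(w, row)) - target
                     for row, target in zip(X, y)]
        # Gradient of 0.5 * ||Xw - y||^2 + 0.5 * alpha * ||w||^2.
        grad = [sum(r * row[j] for r, row in zip(residuals, X)) + alpha * w[j]
                for j in range(n_features)]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


# Hypothetical comparison features (one row per response pair) and labels.
attributes = ["harmlessness", "refuses_to_answer"]
X = [[1, -1], [1, 0], [-1, 1], [0, 1], [1, 1], [-1, -1]]
y = [1, 1, -1, -1, 1, -1]  # +1: response A preferred, -1: response B preferred

weights = dict(zip(attributes, fit_posterior_mean(X, y)))
print(weights)
```

In this toy data the weight for `harmlessness` comes out positive and the weight for `refuses_to_answer` negative, which is how a fitted weight would be read as an attribute's contribution to the preference.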
Figure 2: Schematic diagram of the overall process of the analysis framework
Analysis results
The study first analyzes and compares the three most preferred and three least preferred attributes of human users and of high-performance large models, represented by GPT-4-Turbo, in different scenarios. Humans are significantly less sensitive to errors than GPT-4-Turbo, and dislike responses that admit limitations or refuse to answer. Humans also show a clear preference for responses that cater to their own subjective positions, regardless of whether those responses correct potential errors in the query. In contrast, GPT-4-Turbo pays more attention to the correctness, harmlessness, and clarity of a response, and favors responses that clarify ambiguities in the query.
Figure 3: The three most preferred and least preferred attributes of humans and GPT-4-Turbo under different scenarios or queries.
Figure 4: Sensitivity of humans and GPT-4-Turbo to minor/moderate/severe errors; values close to 50 indicate no sensitivity.
The study also explored how similar the preference components of different large models are. The models were divided into groups, and the intra-group and inter-group similarities were calculated separately. When the models are divided by number of parameters (30B), the intra-group similarities (0.83, 0.88) are clearly higher than the inter-group similarity (0.74), whereas no comparable pattern appears when they are divided by other factors. This indicates that a large model's preferences are largely determined by its size rather than by its training method.
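The intra-group versus inter-group comparison can be illustrated as follows. This is a sketch under assumptions: the article does not say which similarity measure was used, so Pearson correlation between attribute-weight vectors is assumed here, and the model names and weight vectors are invented for the example.

```python
from itertools import combinations, product
from math import sqrt


def pearson(u, v):
    """Pearson correlation between two equal-length weight vectors."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (norm_u * norm_v)


def intra_group_similarity(weights, group):
    """Average similarity over all pairs of models inside one group."""
    pairs = list(combinations(group, 2))
    return sum(pearson(weights[a], weights[b]) for a, b in pairs) / len(pairs)


def inter_group_similarity(weights, group_1, group_2):
    """Average similarity over all pairs with one model from each group."""
    pairs = list(product(group_1, group_2))
    return sum(pearson(weights[a], weights[b]) for a, b in pairs) / len(pairs)


# Hypothetical attribute-weight vectors for four hypothetical models.
weights = {
    "small_a": [0.9, 0.2, -0.3, 0.1],
    "small_b": [0.8, 0.3, -0.2, 0.0],
    "large_a": [0.2, 0.9, 0.1, -0.4],
    "large_b": [0.3, 0.8, 0.0, -0.3],
}
small, large = ["small_a", "small_b"], ["large_a", "large_b"]

intra_small = intra_group_similarity(weights, small)
intra_large = intra_group_similarity(weights, large)
inter = inter_group_similarity(weights, small, large)
print(intra_small, intra_large, inter)
```

With these invented vectors both intra-group values are higher than the inter-group value, mirroring the pattern the study reports for the split by parameter count.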
Figure 5: Similarity of preferences between different large models (including humans), arranged by number of parameters.
On the other hand, the study also found that a large model after alignment fine-tuning shows almost the same preferences as its pre-trained-only version. The change lies only in the strength of the expressed preference: the gap between the probabilities that the aligned model assigns to candidate tokens A and B, which stand for the two responses, increases significantly.
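The distinction between which response is preferred and how strongly it is preferred can be shown with a small numeric sketch. The logits below are hypothetical, and normalizing over only the two candidate tokens A and B is an assumption made for illustration.

```python
from math import exp


def preference_probs(logit_a, logit_b):
    """Softmax over the two candidate tokens 'A' and 'B'."""
    shift = max(logit_a, logit_b)  # subtract the max for numerical stability
    exp_a, exp_b = exp(logit_a - shift), exp(logit_b - shift)
    total = exp_a + exp_b
    return exp_a / total, exp_b / total


# Hypothetical logits for the same sample before and after alignment.
base_a, base_b = preference_probs(0.2, 0.0)
aligned_a, aligned_b = preference_probs(2.0, 0.0)

base_gap = abs(base_a - base_b)
aligned_gap = abs(aligned_a - aligned_b)
print(base_gap, aligned_gap)
```

Both hypothetical models pick response A, so the preference itself is unchanged, but the aligned model's probability gap is much larger, which is the sense in which only the intensity changes.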
Figure 6: Preference changes of the large model before and after alignment fine-tuning
Finally, the study found that once human or large model preferences have been quantitatively decomposed into attributes, the results of preference-based evaluation can be intentionally manipulated. On the currently popular AlpacaEval 2.0 and MT-Bench datasets, injecting attributes that the evaluator (human or large model) prefers, through either a training-free method (setting a system message) or a training-based method (DPO), significantly improves scores, while injecting attributes that the evaluator does not prefer lowers them.
Figure 7: Results of intentional manipulation on two preference-based evaluation datasets, MT-Bench and AlpacaEval 2.0.
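As a rough illustration of the training-free route, the sketch below picks the evaluator's most favored attributes from a set of fitted weights and turns them into a system message. The weights, the attribute wording, the message template, and the function name `build_system_message` are all hypothetical; the article does not describe the exact prompts used.

```python
def build_system_message(attribute_weights, top_k=2):
    """Compose a system message from the top_k most favored attributes."""
    ranked = sorted(attribute_weights.items(), key=lambda item: item[1], reverse=True)
    favored = [name for name, _ in ranked[:top_k]]
    return "When answering, make sure your response is " + " and ".join(favored) + "."


# Hypothetical evaluator weights: higher means more preferred.
evaluator_weights = {
    "clear": 0.9,
    "harmless": 0.7,
    "humorous": -0.1,
    "unwilling to answer": -0.8,
}

message = build_system_message(evaluator_weights)
print(message)
```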
Summary
This study provides a detailed quantitative decomposition of human and large model preferences. The research team found that humans prefer responses that answer the question directly and are less sensitive to errors, while high-performance large models pay more attention to correctness, clarity, and harmlessness. The study also shows that model size is a key factor shaping preference components, while alignment fine-tuning has little effect on them. Furthermore, the study demonstrates that several current datasets are vulnerable to manipulation when the evaluator's preference components are known, which illustrates the shortcomings of preference-based evaluation. The research team has made all research resources publicly available to support future work.