


The Open LLM leaderboard has been refreshed again, and a 'Platypus' stronger than Llama 2 is here.
To challenge the dominance of closed models such as OpenAI's GPT-3.5 and GPT-4, a series of open-source models has emerged, including LLaMA and Falcon. Meta AI recently released Llama 2, widely described as the most powerful open-source model, and many researchers have built their own models on top of it. For example, StabilityAI fine-tuned the Llama 2 70B model on an Orca-style dataset to produce StableBeluga2, which also scored well on Hugging Face's Open LLM leaderboard.
The latest Open LLM leaderboard ranking has changed: the Platypus model has climbed to the top of the list.
The authors are from Boston University. They fine-tuned Platypus from Llama 2 using LoRA, the PEFT library, and the Open-Platypus dataset.
The authors describe Platypus in detail in their paper.
The paper can be found at: https://arxiv.org/abs/2308.07317
The main contributions of the paper are as follows:
- Open-Platypus is a small-scale dataset curated from a subset of public text datasets. It draws on 11 open-source datasets and focuses on improving LLMs' STEM and logic knowledge. It consists mainly of human-written questions; only 10% of the questions were generated by LLMs. Its main advantage is the combination of small size and high quality, which yields strong performance with little fine-tuning time and cost. Specifically, training a 13B model on 25k questions takes just 5 hours on a single A100 GPU.
- A description of the similarity-exclusion process used to shrink the dataset and reduce redundancy.
- A detailed analysis of the pervasive contamination of open LLM training sets with data from important LLM test sets, together with the authors' training-data filtering process for avoiding this pitfall.
- A description of the process for selecting and merging specialized fine-tuned LoRA modules.
Open-Platypus Dataset
The authors have released the Open-Platypus dataset on Hugging Face.
Contamination problem
To keep benchmark questions from leaking into the training set, the authors' method first works to prevent this problem, so that results are not simply inflated by memorization. While striving for accuracy, the authors also recognize that marking a question as a duplicate requires flexibility, because a question can be phrased in many ways and is shaped by general domain knowledge. To manage potential leaks, they designed heuristics for manually filtering any question in Open-Platypus whose embedding has more than 80% cosine similarity to a benchmark question. They sort potential leaks into three categories: (1) duplicates; (2) gray-area questions; and (3) similar but not identical questions. To be cautious, they excluded all three categories from the training set.
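The similarity check described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the article does not say which embedding model was used, so the vectors below are hypothetical toy values, and the function names are invented for this example. The sketch only flags candidates; in the paper's process, the flagged questions are then reviewed manually.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)

def flag_possible_leaks(train_embeddings, benchmark_embeddings, threshold=0.8):
    """Return indices of training questions whose closest benchmark
    question exceeds the similarity threshold (0.8 mirrors the 80% cutoff)."""
    flagged = []
    for i, t in enumerate(train_embeddings):
        best = max(
            (cosine_similarity(t, b) for b in benchmark_embeddings),
            default=0.0,
        )
        if best > threshold:
            flagged.append(i)
    return flagged

# Toy 3-dimensional "embeddings" (hypothetical, not from a real encoder).
train = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
benchmark = [[1.0, 0.0, 0.0]]

flagged = flag_possible_leaks(train, benchmark)
print(flagged)  # indices of questions to hand to a human reviewer
```

In practice the embeddings would come from a sentence encoder, and the pairwise loop would be replaced by a batched matrix product for a dataset of this size.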
Duplicates
These questions replicate test-set questions almost exactly, with only slight changes or rearrangements of the words. Based on the leak counts tabulated in the paper, the authors consider this the only category that counts as true contamination. The paper gives specific examples.
Gray area
The next group, called gray-area questions, covers questions that are not exact duplicates and that fall within the realm of common knowledge. The authors leave the final judgment on these questions to the open-source community, but they argue that such questions often require expert knowledge. Note that this category includes questions with identical instructions but synonymous answers.
Similar but not identical
These questions are highly similar to benchmark questions, but subtle differences between them lead to significantly different answers.
Fine-tuning and merging
After refining the dataset, the authors relied on two tools: low-rank adaptation (LoRA) training and the parameter-efficient fine-tuning (PEFT) library. Unlike full fine-tuning, LoRA freezes the pre-trained model weights and injects rank-decomposition matrices into the transformer layers, which reduces the number of trainable parameters and saves training time and cost. Initially, fine-tuning targeted attention modules such as v_proj, q_proj, k_proj and o_proj. It was later moved to the gate_proj, down_proj and up_proj modules, following the recommendation of He et al., whose analysis found that these modules perform better than the attention modules except when the trainable parameters make up less than 0.1% of the total. The authors used this approach for both the 13B and 70B models, resulting in 0.27% and 0.2% trainable parameters respectively. The only difference between the two is the initial learning rate.
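To make the LoRA idea concrete, here is a minimal pure-Python sketch of the standard formulation: the frozen weight W is augmented by a low-rank product B·A scaled by alpha/r, and the adapter can later be folded back into W. The rank, alpha, and matrix values are hypothetical toy numbers (the article does not state the authors' settings), and merging a single adapter into its base weight is shown only as an illustration; it is not necessarily the model-merging procedure the authors used.

```python
def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def matmul(A, B):
    """Multiply matrix A by matrix B (both as lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def lora_forward(W, A, B, x, alpha, r):
    """h = W x + (alpha / r) * B (A x). W stays frozen; only A and B train."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

def merge_lora(W, A, B, alpha, r):
    """Fold the adapter into the weight: W' = W + (alpha / r) * B A."""
    scale = alpha / r
    BA = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, BA)]

def lora_param_count(d_in, d_out, r):
    """Trainable parameters added for one d_out x d_in weight matrix."""
    return r * (d_in + d_out)

# Toy layer: d_in = 3, d_out = 2, rank r = 1 (all values hypothetical).
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]  # frozen, d_out x d_in
A = [[1.0, 1.0, 0.0]]                   # trainable, r x d_in
B = [[0.5], [0.0]]                      # trainable, d_out x r
x = [1.0, 2.0, 3.0]
alpha, r = 2, 1

adapter_out = lora_forward(W, A, B, x, alpha, r)
merged_out = matvec(merge_lora(W, A, B, alpha, r), x)
print(adapter_out, merged_out)          # the two outputs should agree
print(lora_param_count(3, 2, r), "trainable vs", 3 * 2, "frozen")
```

Because the adapter adds only r·(d_in + d_out) parameters per targeted matrix, the trainable share stays small when r is much smaller than the layer dimensions, which is how fractions well below 1% arise.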
The results
According to the Hugging Face Open LLM leaderboard data from August 10, 2023, the authors compared Platypus with other state-of-the-art models and found that the Platypus2-70B-instruct variant ranked first with an average score of 73.13.
Among 13-billion-parameter models, Stable-Platypus2-13B stands out with an average score of 63.96.
Limitations
Platypus, as a fine-tuned extension of Llama 2, retains many of the base model's constraints and introduces specific challenges of its own through targeted training. It shares Llama 2's static knowledge base, which may become outdated. There is also a risk of generating inaccurate or inappropriate content, particularly when prompts are unclear. While Platypus has been strengthened in STEM and English-language logic, its proficiency in other languages is unreliable and may be inconsistent. It may occasionally produce biased or harmful content. The authors describe their efforts to minimize these issues but acknowledge that challenges remain, particularly in non-English languages.
The potential for misuse of Platypus is a concern, so developers should conduct safety testing of their applications before deployment. Platypus may have limitations outside its primary domain, so users should proceed with caution and consider additional fine-tuning for optimal performance. Users should ensure that the training data for Platypus does not overlap with other benchmark test sets. The authors are very cautious about data contamination and avoid merging with models trained on tainted datasets. Although they found no contamination in the cleaned training data, they cannot rule out that some questions were overlooked. For details, see the Limitations section of the paper.