Within hours of release, Microsoft deleted an open-source model comparable to GPT-4! It had forgotten the toxicity test

Last week, Microsoft dropped an open-source model called WizardLM-2 that was billed as GPT-4 level.

Unexpectedly, it was pulled just a few hours after it went up.

Some netizens then discovered that WizardLM's model weights and announcement posts had all been deleted and were gone from Microsoft's collection; apart from cached mentions of the site, no evidence remained that this had been an official Microsoft project.

The GitHub project homepage now returns a 404.

Project address: https://wizardlm.github.io/

All of the model weights on Hugging Face have disappeared as well...

Confusion spread across the internet: why was WizardLM gone?

As it turns out, Microsoft did this because the team had forgotten to run a required test on the model.

Later, members of the Microsoft team apologized and explained: it had been a while since the last WizardLM release a few months earlier, and they were not yet familiar with the new release process.

"We accidentally missed an item required in the model release process: toxicity testing."

Microsoft upgrades WizardLM to its second generation

In June last year, the first-generation WizardLM, fine-tuned from LLaMA, was released and attracted a lot of attention from the open-source community.

Paper address: https://arxiv.org/pdf/2304.12244.pdf

Later, a code-focused version, WizardCoder, followed: a model based on Code Llama and fine-tuned using Evol-Instruct.

The test results show that WizardCoder’s pass@1 on HumanEval reached an astonishing 73.2%, surpassing the original GPT-4.
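For context on what that number means: pass@1 comes from the HumanEval evaluation protocol, and the standard unbiased pass@k estimator from the Codex paper is sketched below. The sample counts in the example call are purely illustrative, not WizardCoder's actual evaluation data.

```python
# Unbiased pass@k estimator from the HumanEval/Codex paper:
# given n sampled completions per problem, of which c pass the unit
# tests, estimate the probability that at least one of k samples passes.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """pass@k = 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        # Fewer failing samples than k: any k-subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative numbers: 10 samples drawn, 3 passed, estimate pass@1.
print(round(pass_at_k(n=10, c=3, k=1), 2))  # prints 0.3
```

With k=1 the estimator reduces to the simple pass fraction c/n, which is why pass@1 is often described as first-try accuracy.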

Then, on April 15, Microsoft developers officially announced a new generation of WizardLM, this time fine-tuned from Mixtral 8x22B.

It comes in three parameter sizes: 8x22B, 70B, and 7B.

Most notably, the new models took a leading position on the MT-Bench benchmark.

Specifically, the largest version, WizardLM-2 8x22B, performs close to GPT-4 and Claude 3.

Among models of the same parameter scale, the 70B version ranks first.

The 7B version is the fastest, and can even match leading models with 10 times as many parameters.

The secret behind WizardLM 2’s outstanding performance lies in Evol-Instruct, a revolutionary training methodology developed by Microsoft.

Evol-Instruct utilizes large language models to iteratively rewrite the initial instruction set into increasingly complex variants. These evolved instruction data are then used to fine-tune the base model, significantly improving its ability to handle complex tasks.
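The iterative rewriting idea can be sketched as follows. Note that `rewrite_harder` is a stand-in for a real LLM call, and the constraint phrases are invented for illustration; they are not taken from Microsoft's actual pipeline.

```python
# Minimal sketch of the Evol-Instruct loop: an LLM is repeatedly asked to
# rewrite each instruction into a harder variant, and every evolved
# generation joins the fine-tuning pool.
def rewrite_harder(instruction: str, depth: int) -> str:
    # A real system would prompt an LLM, e.g. "Rewrite the instruction
    # below to be more complex by adding constraints or reasoning steps."
    # Here we just append a canned constraint per evolution round.
    constraints = [
        "also explain your reasoning step by step",
        "handle at least one edge case explicitly",
        "compare two alternative approaches",
    ]
    return f"{instruction}, and {constraints[depth % len(constraints)]}"

def evol_instruct(seed_instructions, rounds=3):
    """Iteratively evolve a seed instruction set; keep every generation."""
    dataset = list(seed_instructions)
    current = list(seed_instructions)
    for depth in range(rounds):
        current = [rewrite_harder(inst, depth) for inst in current]
        dataset.extend(current)  # evolved variants join the training pool
    return dataset

data = evol_instruct(["Sort a list of numbers"], rounds=2)
print(len(data))  # 1 seed + 2 evolved generations = 3 examples
```

The key property this toy loop shares with the real method is that difficulty compounds across rounds, so later generations carry progressively more complex instructions.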

Another key ingredient is the reinforcement learning framework RLEIF, which also played an important role in WizardLM 2's development.

WizardLM 2's training also uses the AI Align AI (AAA) method, which lets multiple leading large models guide and improve one another.

The AAA framework consists of two main components: co-teaching and self-teaching.

In the co-teaching phase, WizardLM works with a variety of licensed open-source and proprietary state-of-the-art models, conducting simulated chats, quality critiques, improvement suggestions, and skill-gap closing.

By communicating with each other and providing feedback, models learn from their peers and improve their capabilities.

In the self-teaching phase, WizardLM actively generates new evolved training data for supervised learning and preference data for reinforcement learning.

This self-learning mechanism allows the model to continuously improve performance by learning from the data and feedback information it generates.
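A toy sketch of how self-generated preference data might be assembled is shown below. Everything here is hypothetical: `generate_candidates` fakes model sampling with seeded random scores, and the pair format only resembles generic DPO-style preference data, not WizardLM 2's actual implementation.

```python
# Toy sketch of the "self-teaching" idea: the model proposes several
# responses, a judge scores them, and a best-vs-worst pair is kept as
# preference data for the next round of tuning.
import random

random.seed(0)  # deterministic fake scores for the demo

def generate_candidates(prompt, n=4):
    # A real model would sample n responses and a judge model would score
    # them; here we fabricate (response, score) pairs.
    return [(f"answer-{i} to '{prompt}'", random.random()) for i in range(n)]

def build_preference_pair(prompt, candidates):
    """Rank candidates by judge score; best vs worst forms one pair."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    return {
        "prompt": prompt,
        "chosen": ranked[0][0],    # highest-scored response
        "rejected": ranked[-1][0],  # lowest-scored response
    }

pair = build_preference_pair("Explain recursion",
                             generate_candidates("Explain recursion"))
print(sorted(pair.keys()))  # prints ['chosen', 'prompt', 'rejected']
```

Accumulating such pairs over many prompts gives a preference dataset the model can then be tuned against, closing the loop the article describes.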

In addition, WizardLM 2 was trained on synthetic data generated in this way.

In the researchers' view, human-generated training data for large models is increasingly depleted; they believe that data carefully created by AI, and models progressively supervised by AI, will be the only path to more powerful artificial intelligence.

So they created a fully AI-driven synthetic training system to improve WizardLM-2.

Quick-fingered netizens had already downloaded the weights

However, before the repository was deleted, many people had already downloaded the model weights.

Several users also tested the model on some additional benchmarks before it was removed.

Those who tested it were impressed by the 7B model, saying it would be their first choice for local assistant tasks.

Someone also ran a toxicity test and found that WizardLM-2 8x22B scored 98.33, the base Mixtral-8x22B scored 89.46, and Mixtral-8x7B-Instruct scored 92.93.

Higher is better, which means WizardLM-2 8x22B is still very strong on this measure.

Without the toxicity test, there was no way the model could be shipped.

It is well known that large models are prone to hallucinations.

If WizardLM 2 produced "toxic, biased, or incorrect" content in its answers, it would reflect poorly on the model.

Such mistakes would attract attention across the internet, and Microsoft itself could face criticism and even regulatory scrutiny.

Some netizens wondered: if the toxicity test only updates the metrics, why delete the entire repository and weights?

The Microsoft author replied that under the latest internal regulations, this was the only option.

Others said they wanted models without a "lobotomy".

For now, developers will have to wait patiently; the Microsoft team promises the model will go back online once testing is complete.

Statement
This article is reproduced from 51CTO.COM. If there is any infringement, please contact admin@php.cn to have it deleted.