Microsoft's 2.7-billion-parameter small model runs on phones and beats much larger models

Microsoft CEO Satya Nadella announced at last month's Ignite conference that the Phi-2 small model would be fully open-sourced, bringing significant gains in common-sense reasoning, language understanding, and logical reasoning.


Today, Microsoft released more details about the Phi-2 model along with a new prompting technique, promptbase. With only 2.7 billion parameters, Phi-2 outperforms Llama2 7B, Llama2 13B, and Mistral 7B, and closes the gap with (or even surpasses) Llama2 70B on most common-sense reasoning, language understanding, mathematics, and coding tasks.

At the same time, the compact Phi-2 can run on devices such as laptops and mobile phones. Nadella said Microsoft is very happy to share its best-in-class small language model (SLM) and SOTA prompting techniques with developers.


Microsoft published a paper titled "Textbooks Are All You Need" in June this year, training a 1.3B-parameter model, phi-1, on only 7B tokens of "textbook-quality" data. Despite a dataset and model size orders of magnitude smaller than those of competitors, phi-1 achieved a 50.6% pass@1 rate on HumanEval and 55.5% accuracy on MBPP, proving that high-quality "small data" can still yield strong model performance.
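
For context, pass@1 is the standard HumanEval metric: the probability that a single sampled completion passes all unit tests. Below is a minimal sketch of the unbiased pass@k estimator from the Codex evaluation protocol; the function name and the example numbers are illustrative, not taken from the phi-1 paper:

```python
# Unbiased pass@k estimator (Chen et al., 2021): draw n samples per task,
# of which c pass the unit tests; pass@k = 1 - C(n - c, k) / C(n, k).
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """n: samples drawn, c: samples that pass, k: evaluation budget."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative numbers only: 200 samples, 101 passing -> pass@1 = 0.505
print(pass_at_k(200, 101, 1))
```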

In September, Microsoft followed up with "Textbooks Are All You Need II: phi-1.5 technical report", further exploring the potential of high-quality "small data". That paper introduced phi-1.5, a 1.3-billion-parameter model suited to QA, coding, and other scenarios.

Now, the 2.7-billion-parameter Phi-2 once again packs excellent reasoning and language-understanding capabilities into a small body, achieving SOTA performance among base language models with fewer than 13 billion parameters. Thanks to innovations in model scaling and training-data curation, Phi-2 matches or exceeds models 25 times its size on complex benchmarks.

Microsoft says Phi-2 will be an ideal model for researchers to explore interpretability, improve safety, or run fine-tuning experiments on a variety of tasks, and has made it available in the Azure AI Studio model catalog to facilitate language-model development.
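
For readers who want to try it, here is a minimal sketch of loading Phi-2 locally via the Hugging Face transformers library; the checkpoint id microsoft/phi-2 matches the public release, while the prompt format and generation settings below are illustrative assumptions rather than requirements:

```python
# A hedged sketch of running Phi-2 locally with Hugging Face transformers.
# Assumes a GPU with enough memory for the 2.7B model in half precision.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    torch_dtype=torch.float16,  # half precision to fit consumer hardware
    device_map="auto",
)

prompt = "Instruct: Explain why the sky is blue.\nOutput:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```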

Phi-2 Key Highlights

Scaling language models to hundreds of billions of parameters has indeed unlocked many new capabilities and redefined the landscape of natural language processing. But a question remains: can those capabilities also be achieved at a smaller scale through training-strategy choices, such as data selection?

Microsoft's answer is the Phi series: training small language models that achieve performance comparable to large models. Phi-2 breaks the scaling rules of traditional language models in two respects.

First, the quality of the training data plays a crucial role in model performance. Microsoft takes this insight to the extreme by focusing on "textbook-quality" data. Its training data consists of a purpose-built synthetic dataset that teaches the model common-sense knowledge and reasoning across topics such as science, daily activities, and psychology. The corpus is further augmented with carefully selected web data, filtered for educational value and content quality.

Second, Microsoft uses innovative techniques to scale up: knowledge from the 1.3-billion-parameter Phi-1.5 was progressively embedded into the 2.7-billion-parameter Phi-2. This scaled knowledge transfer accelerates training convergence and significantly improves Phi-2's benchmark scores.
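
Microsoft has not published the exact transfer recipe. As a loose illustration of weight-based knowledge transfer in general, a smaller trained model's tensors can seed the corresponding slices of a larger model before continued pretraining; everything below is a hypothetical sketch, not the Phi-2 procedure:

```python
# Hypothetical illustration only: seed a wider model's parameters with a
# trained smaller model's weights, leaving remaining entries at fresh init.
import torch

@torch.no_grad()
def seed_from_smaller(large: torch.nn.Module, small: torch.nn.Module) -> None:
    large_params = dict(large.named_parameters())
    for name, p_small in small.named_parameters():
        p_large = large_params.get(name)
        if p_large is None or p_large.dim() != p_small.dim():
            continue  # no matching tensor; keep the large model's init
        if any(ls < ss for ls, ss in zip(p_large.shape, p_small.shape)):
            continue  # small tensor would not fit; skip it
        # Copy into the leading slice along every dimension.
        slices = tuple(slice(0, s) for s in p_small.shape)
        p_large[slices].copy_(p_small)
```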

The following figure compares Phi-2 and Phi-1.5; all tasks are evaluated 0-shot except BBH (3-shot CoT) and MMLU (5-shot).

[Figure: Phi-2 vs. Phi-1.5 benchmark comparison]

Training Details

Phi-2 is a Transformer-based model trained with a next-word-prediction objective. It was trained on synthetic and web datasets for 14 days on 96 A100 GPUs.
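
To make the objective concrete, the sketch below shows the standard next-token cross-entropy loss such a model is trained to minimize; this is a generic causal-LM training step, not Microsoft's actual training code:

```python
# Next-word prediction: shift the token sequence by one position and
# minimize cross-entropy between each position's logits and the next token.
import torch
import torch.nn.functional as F

def causal_lm_loss(logits: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    """logits: (batch, seq, vocab); tokens: (batch, seq)."""
    shift_logits = logits[:, :-1, :]   # predictions for positions 0..n-2
    shift_targets = tokens[:, 1:]      # the "next word" at each position
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_targets.reshape(-1),
    )
```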

Phi-2 is a base model that has not been aligned through reinforcement learning from human feedback (RLHF) and has not undergone instruction fine-tuning. Even so, Phi-2 shows less toxicity and bias than existing aligned open-source models, as shown in Figure 3 below.

[Figure 3: Safety scores of Phi-2 compared with aligned open-source models]

Experimental Evaluation

First, the study experimentally compared Phi-2 with popular language models on academic benchmarks spanning multiple categories (a sketch of few-shot prompt construction follows the list):

  • Big Bench Hard (BBH) (3-shot with CoT)
  • Common-sense reasoning (PIQA, WinoGrande, ARC easy and challenge, SIQA)
  • Language understanding (HellaSwag, OpenBookQA, MMLU (5-shot), SQuADv2 (2-shot), BoolQ)
  • Mathematics (GSM8k (8-shot))
  • Coding (HumanEval, MBPP (3-shot))
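
The k-shot settings above simply mean the prompt is prefixed with k worked examples before the test question. A minimal sketch of how such a prompt is typically assembled; the exemplars here are invented placeholders, not actual benchmark items:

```python
# Generic few-shot prompt construction: prepend k solved exemplars, then pose
# the test question in the same format so the model continues the pattern.
def build_few_shot_prompt(exemplars, question):
    """exemplars: list of (question, answer) pairs; returns a k-shot prompt."""
    blocks = [f"Q: {q}\nA: {a}" for q, a in exemplars]
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)

# 2-shot usage with made-up arithmetic items:
shots = [("What is 2 + 2?", "4"), ("What is 7 * 6?", "42")]
print(build_few_shot_prompt(shots, "What is 12 - 5?"))
```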

With only 2.7 billion parameters, Phi-2 surpasses the Mistral and Llama2 models at 7B and 13B parameters on these aggregated benchmarks. Notably, on multi-step reasoning tasks (i.e., coding and mathematics), Phi-2 even outperforms Llama2-70B, a model 25 times its size.

In addition, despite its smaller size, Phi-2's performance is comparable to that of the recently released Google Gemini Nano 2.

Since many public benchmarks may leak into training data, the research team believes the best way to measure a language model's performance is to test it on concrete use cases. The study therefore evaluated Phi-2 on several Microsoft-internal proprietary datasets and tasks, again comparing it with Mistral and Llama-2. On average, Phi-2 outperformed Mistral-7B, which in turn outperformed the Llama2 models (7B, 13B, 70B).

[Figures: Phi-2 versus Mistral and Llama-2 on Microsoft-internal benchmarks]

The research team also tested Phi-2 extensively on prompts commonly used in the research community, and it performed as expected. For example, on a prompt used to assess a model's ability to solve physics problems (recently used to evaluate the Gemini Ultra model), Phi-2 gave the following results:

[Figures: Phi-2's output on the physics-problem prompt]
