
Beyond GPT-4: Stanford team's large model that runs on mobile phones goes viral, with over 2k downloads overnight

王林 | 2024-04-07 16:19:01

As large models move into real-world deployment, on-device AI has become a very important direction.

Recently, Octopus v2, launched by researchers at Stanford University, has gone viral and received great attention from the developer community: the model was downloaded more than 2,000 times overnight.

The 2-billion-parameter Octopus v2 can run on smartphones, cars, PCs, and other devices, surpassing GPT-4 in both accuracy and latency while reducing context length by 95%. Furthermore, Octopus v2 is 36 times faster than the Llama-7B RAG scheme.

Many netizens exclaimed: the era of on-device AI agents has arrived!


  • Paper: Octopus v2: On-device language model for super agent

  • Paper address: https://arxiv.org/abs/2404.01744

  • Model homepage: https://huggingface.co/NexaAIDev/Octopus-v2

Model Overview

Octopus-V2-2B is an open-source language model with 2 billion parameters, tailored for Android APIs. It runs seamlessly on Android devices and extends its utility to a variety of applications, from Android system management to the orchestration of multiple devices.


Typically, Retrieval-Augmented Generation (RAG) methods require detailed descriptions of every potential function and its parameters, sometimes amounting to tens of thousands of input tokens. Octopus-V2-2B instead introduces a unique functional-token strategy in both the training and inference phases, which not only lets it reach a performance level comparable to GPT-4 but also significantly improves inference speed beyond RAG-based methods. This makes it particularly beneficial for edge computing devices.
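To make the functional-token idea concrete, here is a minimal sketch, assuming the released model's <nexa_i> special-token convention; the function set and prompt wording are illustrative, not the team's actual training code.

# Each Android API is mapped to a single special token added to the
# tokenizer vocabulary, so the prompt no longer needs the full function
# descriptions that a RAG pipeline would retrieve and inject.
FUNCTION_TOKENS = {
    "take_a_photo": "<nexa_0>",       # hypothetical mapping for illustration
    "get_trending_news": "<nexa_1>",
}

# RAG-style prompt: every candidate function description is injected,
# which can cost thousands of input tokens per query.
rag_prompt = (
    "You can call these functions:\n"
    "def take_a_photo(camera='back', ...): ...\n"
    "def get_trending_news(category=None, region='US', ...): ...\n"
    "Query: Take a selfie for me with front camera\nResponse:"
)

# Functional-token prompt: the descriptions are baked into the model
# weights during fine-tuning, so the prompt carries only the query and
# the model can emit something like: <nexa_0>(camera='front')<nexa_end>
octopus_prompt = (
    "Below is the query from the users, please call the correct function "
    "and generate the parameters to call the function.\n\n"
    "Query: Take a selfie for me with front camera\n\nResponse:"
)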


Octopus-V2-2B is capable of generating individual, nested and parallel function calls in a variety of complex scenarios.
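To make the three call shapes concrete, here is a hedged illustration; get_trending_news comes from the article's example API, the other function names are hypothetical, and the model's actual output serialization is simplified.

# single: one function call answers the query
single = "take_a_photo(camera='front')"

# parallel: several independent calls emitted for one query
parallel = [
    "take_a_photo(camera='front')",
    "get_trending_news(max_results=5)",
]

# nested: the result of one call feeds an argument of another
nested = "send_message(contact='Alice', text=summarize(get_trending_news()))"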

Dataset

To obtain high-quality datasets for the training, validation, and testing phases, and in particular to achieve efficient training, the research team created the dataset in three key stages:

  • Generate relevant queries and their associated function call parameters;

  • Generate unrelated queries from the appropriate function components;

  • Binary verification support via Google Gemini (sketched below).
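A minimal sketch of the Gemini-based binary verification stage, assuming the google-generativeai Python client; the prompt wording, helper name, and dataset fields are illustrative, not the team's actual pipeline.

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # assumed client setup
verifier = genai.GenerativeModel("gemini-pro")

def verify_sample(query: str, function_call: str) -> bool:
    """Ask Gemini for a strict yes/no judgment on a generated pair."""
    prompt = (
        "Does the following function call correctly answer the user query? "
        "Answer strictly YES or NO.\n"
        f"Query: {query}\nFunction call: {function_call}"
    )
    response = verifier.generate_content(prompt)
    return response.text.strip().upper().startswith("YES")

# keep only the samples that pass verification (hypothetical field names)
# dataset = [s for s in dataset if verify_sample(s["query"], s["call"])]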


The research team wrote 20 Android API descriptions to train the model. The following is one example:

def get_trending_news(category=None, region='US', language='en', max_results=5):
    """
    Fetches trending news articles based on category, region, and language.

    Parameters:
    - category (str, optional): News category to filter by, by default use None for all categories. Optional to provide.
    - region (str, optional): ISO 3166-1 alpha-2 country code for region-specific news, by default, uses 'US'. Optional to provide.
    - language (str, optional): ISO 639-1 language code for article language, by default uses 'en'. Optional to provide.
    - max_results (int, optional): Maximum number of articles to return, by default, uses 5. Optional to provide.

    Returns:
    - list[str]: A list of strings, each representing an article. Each string contains the article's heading and URL.
    """

Model development and training

This research uses the Google Gemma-2B model as the pretrained base model in its framework, and trains it using two different methods: full model training and LoRA training.

In full model training, the study uses the AdamW optimizer with the learning rate set to 5e-5, the number of warm-up steps set to 10, and a linear learning-rate scheduler.
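As a minimal sketch, the stated setup maps onto PyTorch and transformers like this; the Gemma-2B hub id and the total step count are assumptions.

import torch
from transformers import GemmaForCausalLM, get_linear_schedule_with_warmup

# load the Gemma-2B base model (hub id assumed)
model = GemmaForCausalLM.from_pretrained("google/gemma-2b", torch_dtype=torch.bfloat16)

num_training_steps = 1000  # placeholder: len(dataloader) * 3 epochs in practice

# AdamW with lr 5e-5, 10 warm-up steps, and linear decay, as described above
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=10,
    num_training_steps=num_training_steps,
)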

LoRA training uses the same optimizer and learning-rate configuration as full model training, with the LoRA rank set to 16 and LoRA applied to the following modules: q_proj, k_proj, v_proj, o_proj, up_proj, down_proj. The LoRA alpha parameter is set to 32.
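Continuing the sketch above, the described LoRA configuration could be expressed with the peft library (the library choice is an assumption; the rank, alpha, and target modules are as reported).

from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,              # LoRA rank
    lora_alpha=32,     # LoRA alpha
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)  # wraps the Gemma-2B base model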

For both training methods, the number of epochs is set to 3.

Using the following code, you can run the Octopus-V2-2B model on a single GPU.

from transformers import AutoTokenizer, GemmaForCausalLM
import torch
import time

def inference(input_text):
    # tokenize the prompt and move it to the model's device
    start_time = time.time()
    input_ids = tokenizer(input_text, return_tensors="pt").to(model.device)
    input_length = input_ids["input_ids"].shape[1]
    # greedy decoding (do_sample=False) up to 1024 tokens
    outputs = model.generate(input_ids=input_ids["input_ids"], max_length=1024, do_sample=False)
    # keep only the newly generated tokens, dropping the prompt
    generated_sequence = outputs[:, input_length:].tolist()
    res = tokenizer.decode(generated_sequence[0])
    end_time = time.time()
    return {"output": res, "latency": end_time - start_time}

model_id = "NexaAIDev/Octopus-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = GemmaForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

input_text = "Take a selfie for me with front camera"
nexa_query = (
    "Below is the query from the users, please call the correct function "
    "and generate the parameters to call the function.\n\n"
    f"Query: {input_text} \n\nResponse:"
)

start_time = time.time()
print("nexa model result:\n", inference(nexa_query))
print("latency:", time.time() - start_time, "s")

Evaluation

In benchmark tests, Octopus-V2-2B demonstrated superior inference speed, running 36 times faster than the Llama-7B RAG solution on a single A100 GPU. It is also 168% faster than GPT-4-turbo, which relies on clusters of A100/H100 GPUs. This efficiency breakthrough is attributed to Octopus-V2-2B's functional-token design.


Octopus-V2-2B performs well not only in speed but also in accuracy, surpassing the Llama-7B RAG solution in function-call accuracy by 31% and achieving function-calling accuracy comparable to GPT-4 and to GPT-3.5 with RAG.
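As a hedged sketch of how such function-call accuracy could be measured, an exact-match check against reference calls over a labeled test set would look like this, reusing the inference() helper from the code above; the field names and the exact-match criterion are assumptions, and the team's actual metric may differ.

def function_call_accuracy(test_set):
    """Fraction of test queries whose generated call exactly matches the reference."""
    correct = 0
    for example in test_set:  # each item: {"query": ..., "reference_call": ...} (assumed schema)
        prediction = inference(example["query"])["output"]
        if prediction.strip() == example["reference_call"].strip():
            correct += 1
    return correct / len(test_set)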


Interested readers can read the original text of the paper to learn more about the research content.


Statement: This article is reproduced from jiqizhixin.com.