


Beyond GPT-4: Stanford's on-device large model goes viral, with over 2k downloads overnight
On-device AI is an important direction for deploying large models.
Recently, Octopus v2, launched by researchers at Stanford University, has attracted great attention from the developer community, with the model downloaded over 2k times overnight.
The 2-billion-parameter Octopus v2 can run on smartphones, cars, PCs, and other devices, surpassing GPT-4 in both accuracy and latency while reducing context length by 95%. Furthermore, Octopus v2 is 36 times faster than a Llama-7B + RAG setup.
Paper: Octopus v2: On-device language model for super agent
Paper address: https://arxiv.org/abs/2404.01744
Model homepage: https://huggingface.co/NexaAIDev/Octopus-v2
Model Overview
Octopus-V2-2B is an open-source language model with 2 billion parameters, tailored for the Android API. It runs seamlessly on Android devices and extends its utility to a wide range of applications, from Android system management to the orchestration of multiple devices.
Typically, Retrieval-Augmented Generation (RAG) methods require detailed descriptions of potential function parameters, sometimes consuming tens of thousands of input tokens. Octopus-V2-2B instead introduces a unique functional token strategy in both the training and inference phases, which not only achieves a performance level comparable to GPT-4 but also significantly improves inference speed over RAG-based methods. This makes it particularly well suited to edge computing devices.
Octopus-V2-2B is capable of generating individual, nested and parallel function calls in a variety of complex scenarios.
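The functional-token idea can be illustrated with a small sketch. The token names (`<nexa_0>`, etc.) and the dispatcher below are illustrative assumptions rather than the model's published internals: each registered function is mapped to one special token added to the vocabulary, so the model only needs to emit that single token plus arguments instead of matching a full function name against long retrieved API descriptions.

```python
# Illustrative sketch of functional tokens (token names are assumptions):
# each function is represented by one special token in the vocabulary,
# and a thin dispatcher resolves the generated token back to a function.

FUNCTION_TOKENS = {
    "<nexa_0>": "take_a_photo",
    "<nexa_1>": "get_trending_news",
}

def parse_call(generated):
    """Split a generated string like "<nexa_0>(camera='front')" into
    the resolved function name and its raw argument string."""
    token, _, rest = generated.partition("(")
    args = rest.rstrip(")")
    return FUNCTION_TOKENS[token], args

name, args = parse_call("<nexa_0>(camera='front')")
print(name, args)  # take_a_photo camera='front'
```

Because selecting a function reduces to predicting a single token, the prompt no longer needs to carry retrieved function documentation, which is where the context-length and latency savings come from.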
Dataset
To obtain high-quality datasets for the training, validation, and testing phases, and in particular to enable efficient training, the research team created the dataset in three key stages:
- Generate relevant queries and their associated function-call parameters;
- Generate irrelevant queries from appropriate function components;
- Perform binary verification with Google Gemini.
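The three stages above can be sketched as a simple pipeline. This is a hypothetical outline, not the team's actual code: the real generation and verification steps were performed with an LLM (Google Gemini), so both are stubbed out here and only the pipeline shape is shown.

```python
# Hypothetical sketch of the three-stage dataset pipeline; the actual
# generation and binary verification used Google Gemini.

def generate_positive(api_name):
    """Stage 1: a query that maps to api_name, paired with a function call."""
    return {"query": "Show me what's trending in the news",
            "call": f"{api_name}(region='US', max_results=5)"}

def generate_negative():
    """Stage 2: an unrelated query labeled with an explicit 'irrelevant' call."""
    return {"query": "Tell me a joke about cats",
            "call": "irrelevant_function()"}

def binary_verify(example):
    """Stage 3: accept or reject each candidate. The paper delegates this
    to Gemini; this stub only checks the example is well-formed."""
    return example["call"].endswith(")")

candidates = [generate_positive("get_trending_news"), generate_negative()]
dataset = [ex for ex in candidates if binary_verify(ex)]
print(len(dataset))  # 2
```

The explicit negative examples are what teach the model to decline unrelated queries instead of forcing a spurious function call.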
The research team wrote 20 Android API descriptions for training the model. The following is one example:
def get_trending_news(category=None, region='US', language='en', max_results=5):
    """
    Fetches trending news articles based on category, region, and language.

    Parameters:
    - category (str, optional): News category to filter by; defaults to None for all categories.
    - region (str, optional): ISO 3166-1 alpha-2 country code for region-specific news; defaults to 'US'.
    - language (str, optional): ISO 639-1 language code for article language; defaults to 'en'.
    - max_results (int, optional): Maximum number of articles to return; defaults to 5.

    Returns:
    - list[str]: A list of strings, each representing an article. Each string
      contains the article's heading and URL.
    """
Model development and training
This research uses the Google Gemma-2B model as the pretrained base model in the framework and trains it with two different methods: full model training and LoRA training.
For full model training, the study uses the AdamW optimizer with a learning rate of 5e-5, 10 warm-up steps, and a linear learning-rate scheduler.
LoRA training uses the same optimizer and learning-rate configuration as full model training, with the LoRA rank set to 16 and LoRA applied to the following modules: q_proj, k_proj, v_proj, o_proj, up_proj, down_proj. The LoRA alpha parameter is set to 32.
For both training methods, the number of epochs is set to 3.
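The hyperparameters above can be expressed as a configuration sketch using the Hugging Face `peft` and `transformers` libraries. The team's exact training code is not shown in the article, so this is an assumed setup that plugs in the stated values (rank 16, alpha 32, the six target modules, AdamW at 5e-5, 10 warm-up steps, linear schedule, 3 epochs); the output directory name is a placeholder.

```python
# Config-only sketch of the described LoRA setup; values come from the
# article, the surrounding training code is not reproduced here.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=16,                       # LoRA rank, as stated above
    lora_alpha=32,              # LoRA alpha parameter
    target_modules=["q_proj", "k_proj", "v_proj",
                    "o_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="octopus-lora",  # placeholder path
    optim="adamw_torch",        # AdamW optimizer
    learning_rate=5e-5,
    warmup_steps=10,
    lr_scheduler_type="linear",
    num_train_epochs=3,
)
```

Wrapping Gemma-2B with `peft.get_peft_model(model, lora_config)` and passing both objects to a `transformers.Trainer` would reproduce the described setup.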
Using the following code, you can run the Octopus-V2-2B model on a single GPU:
from transformers import AutoTokenizer, GemmaForCausalLM
import torch
import time

def inference(input_text):
    start_time = time.time()
    input_ids = tokenizer(input_text, return_tensors="pt").to(model.device)
    input_length = input_ids["input_ids"].shape[1]
    outputs = model.generate(
        input_ids=input_ids["input_ids"],
        max_length=1024,
        do_sample=False)
    generated_sequence = outputs[:, input_length:].tolist()
    res = tokenizer.decode(generated_sequence[0])
    end_time = time.time()
    return {"output": res, "latency": end_time - start_time}

model_id = "NexaAIDev/Octopus-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = GemmaForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto")

input_text = "Take a selfie for me with front camera"
nexa_query = f"Below is the query from the users, please call the correct function and generate the parameters to call the function.\n\nQuery: {input_text}\n\nResponse:"

start_time = time.time()
print("nexa model result:\n", inference(nexa_query))
print("latency:", time.time() - start_time, "s")
Evaluation
Octopus-V2-2B demonstrated superior inference speed in benchmark tests, running 36 times faster than the Llama-7B RAG solution on a single A100 GPU. It is also 168% faster than GPT-4-turbo, which relies on clusters of A100/H100 GPUs. This efficiency breakthrough is attributed to Octopus-V2-2B's functional token design.
Octopus-V2-2B excels not only in speed but also in accuracy, surpassing the Llama-7B RAG solution in function-call accuracy by 31% and achieving function-calling accuracy comparable to GPT-4 and RAG + GPT-3.5.
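A simple way to think about the accuracy metric is exact-match scoring of generated calls against reference calls. The paper's actual scoring script is not shown, so the sketch below is a simplified assumption: a prediction counts as correct only if both the selected function and its arguments match the reference exactly.

```python
# Simplified, assumed scoring: exact match of function name and arguments.

def call_accuracy(predictions, references):
    """Fraction of generated calls that exactly match the reference call."""
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)

preds = ["take_a_photo(camera='front')", "get_trending_news(max_results=5)"]
refs  = ["take_a_photo(camera='front')", "get_trending_news(max_results=3)"]
print(call_accuracy(preds, refs))  # 0.5
```

Exact matching is strict (a single wrong argument fails the whole call), which makes the reported accuracy figures all the more meaningful.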
Interested readers can consult the original paper to learn more about the research.
