Stanford's 2-billion-parameter on-device multimodal AI agent model gets a major upgrade, usable in phones, cars and robots

王林

May 07, 2024 pm 04:25 PM

Octopus V3, billed as the world's first ultra-small multimodal AI agent model, comes from the NEXA AI team at Stanford University. It aims to make agents smarter and faster while cutting energy consumption and costs.

In early April this year, NEXA AI launched the much-anticipated Octopus V2, which surpassed GPT-4 in function-calling performance and cut the amount of text (context) required for inference by 95%, opening up new possibilities for on-device AI applications. Its core technology, the patent-pending "functional token", uses a novel approach to function calling that significantly shortens the text the model must process during inference.

This approach makes it possible to train an efficient model with only 2 billion parameters that outperforms GPT-4 in both accuracy and latency, making it well suited to deployment on a wide range of edge devices.
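
The article describes the "functional token" only at a high level. Assuming the scheme works roughly as follows, each callable function is assigned its own dedicated token, so the model can select a function by emitting a single token followed by its arguments, rather than relying on long function descriptions in the prompt. The minimal Python sketch below shows only the device-side dispatch step under that assumption. The token names `<nexa_0>`, `<nexa_1>` and `<nexa_end>`, the call syntax `token(arg='value')`, and the functions `take_photo` and `send_email` are hypothetical illustrations, not NEXA AI's actual vocabulary or API.

```python
import re
from typing import Callable, Dict

# Hypothetical functional-token vocabulary: each special token stands for one
# device function, so the model names a function with a single token instead
# of reading its full description in the prompt. All names are illustrative.
FUNCTIONS: Dict[str, Callable[..., str]] = {}


def register(token: str):
    """Decorator that binds a (hypothetical) functional token to a function."""
    def wrap(fn: Callable[..., str]) -> Callable[..., str]:
        FUNCTIONS[token] = fn
        return fn
    return wrap


@register("<nexa_0>")
def take_photo(camera: str) -> str:
    return f"photo taken with {camera} camera"


@register("<nexa_1>")
def send_email(to: str, subject: str) -> str:
    return f"email to {to}: {subject}"


# Assumed model output shape: <token>(name='value', ...)<nexa_end>
CALL_RE = re.compile(r"(<nexa_\d+>)\((.*)\)<nexa_end>")
ARG_RE = re.compile(r"(\w+)='([^']*)'")


def dispatch(model_output: str) -> str:
    """Parse one functional-token call from model output and run it."""
    match = CALL_RE.fullmatch(model_output.strip())
    if match is None:
        raise ValueError(f"no functional-token call in: {model_output!r}")
    token, arg_text = match.groups()
    kwargs = dict(ARG_RE.findall(arg_text))  # string-valued keyword arguments
    return FUNCTIONS[token](**kwargs)  # KeyError if the token is unregistered
```

For example, `dispatch("<nexa_1>(to='alice@example.com', subject='Meeting notes')<nexa_end>")` returns `"email to alice@example.com: Meeting notes"`. The design point this illustrates is that the function choice collapses into one token, which is why such a scheme can shorten the text the model needs to process.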

Since its release, Octopus V2 has drawn wide attention in the LLM community and praise from many AI experts and researchers, including Hugging Face CTO Julien Chaumond; Rowan Cheung, founder of a well-known AI newsletter; Figure AI founder Brett Adcock; and Manoj Kumar, head of OPPO's edge AI team. They hailed it as "ushering in a new era of on-device AI."

On the well-known open source AI platform Hugging Face, Octopus V2 has been downloaded more than 12,000 times.

Less than a month later, the NEXA AI team released its next-generation multimodal AI agent model, Octopus V3, with further breakthroughs: its image processing and multilingual text processing capabilities pave the way for smartphones and other edge devices to truly enter the AI era.

Beyond its multimodal capabilities, Octopus V3 delivers function-calling performance well ahead of comparable models and on par with a combination of GPT-4V and GPT-4, despite having fewer than 1 billion parameters. It also supports multiple languages.

In other words, compared with traditional large language models, it is smaller and consumes less energy, so it can run more easily on small edge devices such as the Raspberry Pi while delivering fast and accurate function calls.

This means that AI agents could in future be widely deployed on smartphones, AR/VR devices, robots, smart cars and other edge devices, giving users a smoother and smarter interactive experience.

In addition, because V3 can process text and image input at the same time and supports multiple languages, it can give users a richer experience.

For example, in the Instacart shopping app, users can have the AI agent automatically search for products using a photo of a pineapple and a simple conversational instruction, improving both efficiency and the user experience.

Similarly, when sending emails, Octopus V3 can automatically extract information from an image containing text and fill in the email content, giving users a smarter, more convenient service.

From software interaction to smart cars, device-side AI has huge potential

Thanks to these characteristics, Octopus V2 and V3 have a rich and diverse range of application scenarios and broad application prospects.

Beyond the mobile phone scenarios mentioned above, Octopus V2 can also bring new interactive experiences to smart cars. Current voice assistants often struggle to help drivers complete more complex tasks, such as changing the destination mid-drive or adding extra stops. With Octopus V3, the AI assistant can complete such tasks quickly and accurately from relatively vague and simple instructions.

By combining the capabilities of V2 and V3, from retrieving information to carrying out designs based on instructions, users can enjoy a smooth AI experience in virtual scenes. In a VR demo built by a community user, after the user issues a few simple voice commands, the AI agent quickly designs a living room, replacing the sofa, changing the color of the lights and so on, with just a few clicks. When the user enters travel instructions, they are quickly "transported" to Japan, and the AI agent can also search for relevant attractions and provide detailed information through simple conversation.

Market data shows that the global large language model market is growing rapidly. Grand View Research estimates the global large language model market at US$4.35 billion and expects it to grow at a compound annual growth rate (CAGR) of 35.9% from 2024 to 2030. The edge AI market is also booming: the global edge AI market is expected to grow at a CAGR of 21.0% from 2023 to 2030, reaching US$66.478 billion by 2030.

The NEXA AI team was founded by outstanding researchers at Stanford University.

Founder and Chief Scientist Alex Chen (Chen Wei) is pursuing a PhD at Stanford University. He has extensive experience in artificial intelligence research and has served as chairman of the Stanford Chinese Entrepreneurs Organization.

Co-founder and Chief Technology Officer Zack Li (Li Zhiyuan) is also a Stanford University graduate, with 4 years of front-line R&D experience in on-device AI at Google and Amazon Lab126. He has also served as chairman of the Stanford Chinese Entrepreneurship Association.

Charles (Chuck) Eesley, associate professor at Stanford University and deputy director of the Stanford Technology Entrepreneurship Program, serves as an advisor, providing guidance and support to the team.

[Photo: left, Li Zhiyuan; right, Chen Wei]

NEXA AI has applied for patent protection for its original technology.

The NEXA AI founding team said they will remain committed to advancing on-device AI, extending the reach of their innovations through open-source models and creating a smarter, more efficient future for users.

Paper address: https://arxiv.org/abs/2404.11459

This article is reproduced from 51CTO.COM.