


Renmin University-affiliated multimodal model moves towards AGI: autonomous updating is realized for the first time, and portrait video generation surpasses Sora
On the one hand, people expect embodied intelligence to be adaptable: the agent should adapt to changing application environments through continuous learning, performing better and better on known multimodal tasks while also adapting quickly to unknown ones.
On the other hand, people also expect embodied intelligence to be truly creative: to discover new strategies and solutions by autonomously exploring its environment, and to probe the boundaries of what artificial intelligence can do. Using multimodal large models as the “brains” of embodied intelligence could dramatically increase its adaptability and creativity, ultimately approaching the threshold of AGI (or even achieving AGI).
However, existing multimodal large models have two obvious problems. First, the iterative update cycle is long and requires heavy investment of people and money. Second, the training data all comes from existing data, so the model cannot continuously acquire large amounts of new knowledge. New knowledge can be injected through RAG (retrieval-augmented generation) and long context, but the model itself does not learn this knowledge, and both workarounds bring additional problems of their own.
In short, current multimodal large models are not very adaptable in real application scenarios, let alone creative, so they keep running into difficulties when deployed in industry.
Awaker 1.0, released this time by Sophon Engine, is the world's first multimodal large model with an autonomous update mechanism, and it can serve as the "brain" of embodied intelligence. The mechanism comprises three key technologies: active data generation, model reflection and evaluation, and continuous model update.
Unlike all other multimodal large models, Awaker 1.0 is "live": its parameters can be updated continuously in real time.
As the framework diagram shows, Awaker 1.0 can be combined with various smart devices: it observes the world through them, generates action intentions, and automatically constructs instructions that direct the devices to carry out actions. After completing an action, a device automatically produces feedback. Awaker 1.0 can extract effective training data from these actions and feedback to update itself continuously, steadily strengthening its capabilities.
Take the injection of new knowledge as an example. Awaker 1.0 can continuously learn the latest news on the Internet and answer complex questions based on what it has just learned. Unlike the traditional RAG and long-context approaches, Awaker 1.0 truly learns the new knowledge and "memorizes" it in the model's parameters.
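The three stages of the update mechanism (active data generation, reflection and evaluation, continuous update) can be pictured with a toy loop. This is only an illustrative sketch, not Awaker 1.0's implementation, which the article does not describe: the `ToyModel` class, its fact-table "parameters", and all function names are hypothetical stand-ins, and a real system would update neural network weights rather than a dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class ToyModel:
    """Hypothetical stand-in: the 'parameters' are just a question -> answer table."""
    params: dict = field(default_factory=dict)

    def answer(self, question):
        return self.params.get(question)

    def update(self, examples):
        # Continuous update: write the accepted examples into the parameters.
        for question, target in examples:
            self.params[question] = target


def generate_data(feedback_stream):
    """Active data generation: turn (action, feedback) pairs into candidate examples."""
    return [(action, feedback) for action, feedback in feedback_stream]


def reflect_and_filter(model, candidates):
    """Reflection and evaluation: keep examples that carry feedback
    and that the model does not already answer correctly."""
    return [(q, a) for q, a in candidates
            if a is not None and model.answer(q) != a]


def autonomous_update(model, feedback_stream):
    """Run one round of the loop and return how many examples were learned."""
    candidates = generate_data(feedback_stream)
    useful = reflect_and_filter(model, candidates)
    model.update(useful)
    return len(useful)
```

Calling `autonomous_update` twice with the same feedback learns nothing the second time, because the filter step drops examples the model already answers correctly. That is the sense in which knowledge ends up in the parameters rather than being retrieved at query time.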
- It applies Transformer technology to diffusion-based video generation, demonstrating the great potential of Transformers in this field. The strength of VDT (Video Diffusion Transformer) is its ability to capture temporal dependencies, which lets it generate temporally coherent video frames, including simulating the physical dynamics of three-dimensional objects over time.
- It proposes a unified spatiotemporal mask modeling mechanism that lets VDT handle a variety of video generation tasks, giving the technique broad applicability. VDT processes conditional information flexibly, for example by simple concatenation in token space, which unifies information of different lengths and modalities. Combined with the spatiotemporal mask modeling mechanism, this makes VDT a general-purpose video diffusion tool that can be applied, without modifying the model structure, to unconditional generation, prediction of subsequent video frames, frame interpolation, image-to-video generation, video completion, and other video generation tasks.
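How a single mask can cover several tasks can be shown with a toy sketch. It assumes a per-frame binary mask (1 = frame supplied as a condition, 0 = frame to be generated) and mixes conditioning frames with noise as `m * cond + (1 - m) * noise`. The function names, task labels, and scalar "frames" are hypothetical simplifications, not VDT's code. Video completion would need a mask over spatial positions as well, which is omitted here.

```python
def frame_mask(task, num_frames, num_cond=1):
    """Return a per-frame mask: 1 = frame is given, 0 = frame is generated."""
    if task == "unconditional":      # nothing is given
        return [0] * num_frames
    if task == "prediction":         # the first num_cond frames are given
        return [1 if t < num_cond else 0 for t in range(num_frames)]
    if task == "image_to_video":     # a single image serves as the first frame
        return [1] + [0] * (num_frames - 1)
    if task == "interpolation":      # every other frame is given
        return [1 if t % 2 == 0 else 0 for t in range(num_frames)]
    raise ValueError(f"unknown task: {task}")


def build_input(noise, cond, mask):
    """Mix frames: keep conditioning frames where mask is 1, noise elsewhere."""
    return [m * c + (1 - m) * n for n, c, m in zip(noise, cond, mask)]


if __name__ == "__main__":
    noise = [0.5, 0.5, 0.5, 0.5]     # placeholder for noisy frames
    cond = [9.0, 9.0, 9.0, 9.0]      # placeholder for conditioning frames
    for task in ("unconditional", "prediction", "image_to_video", "interpolation"):
        mask = frame_mask(task, 4, num_cond=2)
        print(task, mask, build_input(noise, cond, mask))
```

Only the mask changes between tasks. The mixing step and the model that would consume its output stay the same, which is the sense in which the architecture does not need to be modified.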