


China's first self-developed MoE multimodal large model: a look inside Tencent Hunyuan's multimodal understanding

The AIxiv column is where this site publishes academic and technical content. Over the past few years, the column has covered more than 2,000 pieces of work from top laboratories at major universities and companies around the world, helping to promote academic exchange and dissemination. If you have excellent work you would like to share, feel free to submit it or contact us for coverage. Submission email: liyazhou@jiqizhixin.com; zhaoyunfeng@jiqizhixin.com
Method introduction: MoE architecture
Tencent's Hunyuan large language model is the first in China to adopt the mixture-of-experts (MoE) architecture. Its overall performance is 50% higher than that of the previous generation, some of its Chinese-language capabilities are on par with GPT-4o, and it has improved substantially at answering time-sensitive ("current") questions as well as in mathematics and reasoning. Tencent Hunyuan began using this model in Tencent Yuanbao early this year.
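The article does not describe how Hunyuan's MoE layers are built, so the following is only a generic, minimal sketch of the idea behind a mixture-of-experts layer: a router scores every expert for an input vector, only the top-k experts run, and their outputs are mixed with softmax weights. The class name `ToyMoELayer`, the sizes, and the random weights are all hypothetical and chosen purely for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(weights, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


class ToyMoELayer:
    """Illustrative top-k MoE layer (hypothetical, not Tencent's implementation)."""

    def __init__(self, dim, num_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        # Router: one row of scores per expert.
        self.router = [[rng.gauss(0, 0.1) for _ in range(dim)]
                       for _ in range(num_experts)]
        # Each expert is a single dim x dim linear map, to keep the sketch short.
        self.experts = [[[rng.gauss(0, 0.1) for _ in range(dim)]
                         for _ in range(dim)]
                        for _ in range(num_experts)]

    def route(self, x):
        """Return the indices of the top-k experts and their mixing weights."""
        logits = matvec(self.router, x)
        ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        chosen = ranked[:self.top_k]
        weights = softmax([logits[i] for i in chosen])
        return chosen, weights

    def forward(self, x):
        """Run only the chosen experts and mix their outputs."""
        chosen, weights = self.route(x)
        out = [0.0] * len(x)
        for idx, w in zip(chosen, weights):
            y = matvec(self.experts[idx], x)
            out = [o + w * yi for o, yi in zip(out, y)]
        return out


layer = ToyMoELayer(dim=4, num_experts=8, top_k=2)
token = [1.0, 0.5, -0.5, 2.0]
chosen, weights = layer.route(token)
output = layer.forward(token)
```

In this toy setup only 2 of the 8 experts run for each input, which is the general reason MoE models can hold many parameters while keeping per-token compute low.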
Simple and scalable
- A simple MLP adapter: compared with the previously mainstream Q-former adapter, an MLP adapter loses less information in transmission.
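As a rough illustration of what an MLP adapter does, the sketch below projects each visual patch feature into the language model's embedding space with a two-layer MLP. One output token is produced per input patch, so nothing is compressed into a fixed set of query tokens along the way. The class name `ToyMLPAdapter`, the dimensions, the GELU activation, and the random weights are assumptions for illustration only; the article does not specify Hunyuan's actual adapter design.

```python
import math
import random


def gelu(x):
    """Exact GELU activation using the error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


class ToyMLPAdapter:
    """Illustrative two-layer MLP adapter (hypothetical sizes and weights)."""

    def __init__(self, vision_dim, hidden_dim, llm_dim, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0, 0.1) for _ in range(vision_dim)]
                   for _ in range(hidden_dim)]
        self.b1 = [0.0] * hidden_dim
        self.w2 = [[rng.gauss(0, 0.1) for _ in range(hidden_dim)]
                   for _ in range(llm_dim)]
        self.b2 = [0.0] * llm_dim

    def project_token(self, feat):
        """Map one visual patch feature to one language-model-sized embedding."""
        hidden = [gelu(sum(w * f for w, f in zip(row, feat)) + b)
                  for row, b in zip(self.w1, self.b1)]
        return [sum(w * h for w, h in zip(row, hidden)) + b
                for row, b in zip(self.w2, self.b2)]

    def __call__(self, patch_feats):
        """Project every patch independently; the token count is preserved."""
        return [self.project_token(f) for f in patch_feats]


adapter = ToyMLPAdapter(vision_dim=6, hidden_dim=8, llm_dim=4)
patches = [[0.1 * i + j for j in range(6)] for i in range(5)]  # 5 fake patches
tokens = adapter(patches)
```

Here five patch features go in and five embeddings come out, each resized from 6 to 4 dimensions.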
Ranked first among Chinese models on SuperCLUE-V
In this evaluation, the Hunyuan multi-modal understanding system hunyuan-vision achieved a score of 71.95, second only to GPT-4o. In terms of multi-modal applications, hunyuan-vision is ahead of Claude3.5-Sonnet and Gemini-1.5-Pro.
Tencent's Hunyuan image-text large model performs well across multiple dimensions, including general scenes, image OCR recognition and understanding, and understanding and reasoning about Chinese cultural elements. These results also point to the model's potential in future applications.
Aimed at general application scenarios
Here are more typical examples:
Explain a piece of code:
Analyze a bill:
Describe the content of a picture:
Do math problems:
Analyze based on the content of the picture:
Help you write copy:
Tencent's Hunyuan multimodal understanding large model is now available in the AI assistant product Tencent Yuanbao, and is open to enterprise and individual developers through Tencent Cloud.
Tencent Yuanbao address: https://yuanbao.tencent.com/chat