Google's Gemini 1.5 launches fast: MoE architecture, 1-million-token context

Today, Google announced the launch of Gemini 1.5.

Gemini 1.5 builds on Google's research and engineering innovations in foundation models and infrastructure. This version introduces a new Mixture-of-Experts (MoE) architecture to make Gemini 1.5 more efficient to train and serve.

What Google is releasing is the first Gemini 1.5 version for early testing: Gemini 1.5 Pro. It is a mid-size multimodal model, optimized for scaling across a wide range of tasks. Gemini 1.5 Pro performs at a level comparable to Google's largest model, 1.0 Ultra, and introduces a breakthrough experimental feature for better long-context understanding.

Gemini 1.5 Pro comes with a standard context window of 128,000 tokens. Starting today, however, Google is offering a private preview through AI Studio and Vertex AI to a limited group of developers and enterprise customers, allowing them to experiment with context windows of up to 1,000,000 tokens. Google has also made several optimizations aimed at improving latency, reducing compute requirements, and enhancing the user experience.

Google CEO Sundar Pichai and Google DeepMind CEO Demis Hassabis introduced the new model.


Gemini 1.5 builds on Google’s leading research into Transformer and MoE architectures. The traditional Transformer acts as one large neural network, while the MoE model is divided into smaller "expert" neural networks.

Depending on the type of input given, the MoE model learns to selectively activate only the most relevant expert paths in its neural network. This specialization greatly increases the efficiency of the model. Google has been an early adopter and pioneer of deep learning MoE technology through research on sparse gated MoE, GShard-Transformer, Switch-Transformer, M4, and more.
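The routing idea described above can be sketched in a few lines of numpy. This is a minimal toy illustration of top-k expert gating, not Gemini's actual implementation: the gate, the experts, and all dimensions here are invented stand-ins.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route input x to the top-k experts chosen by a learned gate.

    x       : (d,) input vector
    gate_w  : (d, n_experts) gating weights
    experts : list of callables, one per expert network
    k       : number of experts activated per input
    """
    logits = x @ gate_w                     # one score per expert
    top = np.argsort(logits)[-k:]           # indices of the k best-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                # softmax over the selected experts only
    # Only the chosen experts run; the rest are skipped entirely,
    # which is where the efficiency gain comes from.
    return sum(w * experts[i](x) for i, w in zip(top, weights))

# Toy demo: 4 "experts", each a simple linear map.
rng = np.random.default_rng(0)
d, n_experts = 8, 4
gate_w = rng.normal(size=(d, n_experts))
experts = [lambda x, W=rng.normal(size=(d, d)): W @ x for _ in range(n_experts)]
y = moe_forward(rng.normal(size=d), gate_w, experts)
print(y.shape)  # (8,)
```

With k=2 of 4 experts active, each input pays the cost of two expert networks rather than four; production MoE systems apply the same idea at far larger scale.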

Google's latest innovations in model architecture enable Gemini 1.5 to learn complex tasks faster and maintain quality while training and serving more efficiently. These efficiencies are helping Google teams iterate, train, and deliver more advanced versions of Gemini faster than ever before, and further optimizations are in the works.

Longer context, more useful features

An AI model's "context window" is composed of tokens, the basic building blocks it uses to process information. A token can be a whole piece, or a sub-part, of text, an image, video, audio, or code. The larger a model's context window, the more information it can take in and process in a given prompt, making its output more consistent, relevant, and useful.

Through a series of machine learning innovations, Google has increased the context window capacity of 1.5 Pro well beyond the original 32,000 tokens of Gemini 1.0. The large model can now run in production with up to 1 million tokens.

This means 1.5 Pro can handle large amounts of information at once, including 1 hour of video, 11 hours of audio, over 30,000 lines of code, or a codebase of over 700,000 words. In its research, Google has also successfully tested up to 10 million tokens.
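A quick back-of-the-envelope check shows how a 700,000-word codebase fits within a 1-million-token window. The tokens-per-word rate below is a common rule of thumb for English text, not an official Gemini figure:

```python
# Rough sanity check of the article's figures. TOKENS_PER_WORD is an
# assumed rule of thumb, not an official Gemini tokenizer rate.
CONTEXT_WINDOW = 1_000_000
TOKENS_PER_WORD = 1.3  # typical rough rate for English text

codebase_tokens = round(700_000 * TOKENS_PER_WORD)
print(codebase_tokens, codebase_tokens <= CONTEXT_WINDOW)  # 910000 True
```

Under that assumption the codebase consumes roughly 910,000 tokens, leaving some headroom in the 1M window for the prompt and the model's response.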

Complex reasoning about large amounts of information

1.5 Pro can seamlessly analyze, classify, and summarize large amounts of content within a given prompt. For example, when given the 402-page transcript of the Apollo 11 moon landing mission, it can reason about conversations, events, and details found across the document.

Gemini 1.5 Pro can understand, reason about, and identify curious details in the 402 pages of records from the Apollo 11 moon landing mission.

Better understanding and reasoning across modalities

1.5 Pro can perform highly complex understanding and reasoning tasks across different modalities, including video. For example, when given a 44-minute silent film by Buster Keaton, the model could accurately analyze various plot points and events, even reasoning about small details in the film that were easily overlooked.
When given a simple line drawing of a real-life object as reference material, Gemini 1.5 Pro can identify the corresponding scene in Buster Keaton's 44-minute silent film.

Relevant problem-solving over longer blocks of code

1.5 Pro can perform more relevant problem-solving tasks across longer blocks of code. When prompted with more than 100,000 lines of code, it can better reason across examples, suggest helpful modifications, and explain how different parts of the code work.
Enhanced performance

When tested on a comprehensive panel of text, code, image, audio, and video evaluations used in developing Google's large language models (LLMs), 1.5 Pro outperformed 1.0 Pro on 87% of the benchmarks. Compared against 1.0 Ultra on the same benchmarks, it performs at a broadly similar level.

Gemini 1.5 Pro maintains a high level of performance even as the context window increases.

In the Needle In A Haystack (NIAH) evaluation, where a small piece of text containing a particular fact or statement is deliberately placed within a very long block of text, 1.5 Pro found the embedded text 99% of the time, even in data blocks as long as 1 million tokens.
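The construction behind such a test is simple enough to sketch. This toy harness builds a long "haystack" with a planted fact and scores an answer by substring match; the filler sentences, the "needle", and the scoring rule are all invented for illustration, and a real evaluation would of course send the haystack plus a retrieval question to the model:

```python
import random

def make_haystack(needle, filler_sentences, n, position):
    """Build a document of n filler sentences with `needle` inserted
    at a relative position in [0, 1]."""
    doc = random.choices(filler_sentences, k=n)
    doc.insert(int(position * n), needle)
    return " ".join(doc)

def found_needle(model_answer, fact):
    """Score a model answer: did the planted fact appear in it?"""
    return fact.lower() in model_answer.lower()

# Toy run with a hard-coded "model answer" so the harness itself
# can be checked end to end.
needle = "The secret launch code is 7Q-42."
filler = ["The sky was clear that day.", "Traffic was light in the morning."]
haystack = make_haystack(needle, filler, n=10_000, position=0.5)

ok = found_needle("I believe the secret launch code is 7Q-42.", "7Q-42")
print(len(haystack) > 100_000, ok)  # True True
```

Sweeping `position` from 0 to 1 and the haystack length up to the full context window produces the familiar retrieval-accuracy grid reported for long-context models.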

Gemini 1.5 Pro also demonstrates impressive "in-context learning" skills, meaning it can learn a new skill from information given in a long prompt, without additional fine-tuning. Google tested this skill on the MTOB (Machine Translation from One Book) benchmark, which shows a model's ability to learn from information it has never seen before. When given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a level similar to a person learning from the same content.

Since 1.5 Pro's long context window is a first among large-scale models, Google is continually developing new evaluations and benchmarks to test its novel capabilities.

For more details, see the Gemini 1.5 Pro Technical Report.

Technical report address: https://storage.googleapis.com/deepmind-media/gemini/gemini_v1_5_report.pdf

Build and experiment with Gemini models

Google is committed to responsibly bringing each new generation of Gemini models to billions of people, developers, and enterprises around the world.

Starting today, Google is making a preview of 1.5 Pro available to developers and enterprise customers through AI Studio and Vertex AI.

In the future, when the model is ready for wider release, Google will launch 1.5 Pro with a standard 128,000-token context window. Soon, Google plans to introduce pricing tiers that start at the standard 128,000-token context window and scale up to 1 million tokens as the model improves.

Early testers can try the 1-million-token context window at no cost during the testing period, and significant speed improvements are on the way.

Developers interested in testing 1.5 Pro can register now in AI Studio, while enterprise customers can contact their Vertex AI account team.

Reference link: https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/#sundar-note


Statement
This article is reproduced from 机器之心 (Machine Heart). If there is any infringement, please contact admin@php.cn for removal.