
First release after the big boss's departure! Stability officially announces the code model Stable Code Instruct 3B

Mar 29, 2024, 10:16 PM

After the big boss's departure, the first model is here!

Just today, Stability AI officially announced its new code model, Stable Code Instruct 3B.


Stability AI has not been having an easy time lately: the CEO's departure has thrown Stable Diffusion into some turmoil, relations with investors have frayed, and even employee salaries may be at risk.

Yet while the storm rages outside, the lab stands firm: research keeps getting done, papers keep getting published, models keep getting tuned, and the company has not dropped out of a single front in the large-model wars.

It is not just spreading itself across every front; each line of research keeps moving forward. Today's Stable Code Instruct 3B, for instance, is an instruction-tuned version of the earlier Stable Code 3B.


Paper: https://static1.squarespace.com/static/6213c340453c3f502425776e/t/6601c5713150412ed/Stable_Code_TechReport_release.pdf

Driven by natural-language prompts, Stable Code Instruct 3B can handle a wide range of tasks, such as code generation, math, and other queries related to software development.


Unmatched in its class, punching above its weight

Stable Code Instruct 3B is the current state of the art among models of the same parameter count, outperforming even models more than twice its size such as CodeLlama 7B Instruct, and matching StarChat 15B on software-engineering-related tasks.

[Figure: benchmark comparison of Stable Code Instruct 3B against CodeLlama 7B Instruct and DeepSeek-Coder Instruct 1.3B]

As the figure above shows, Stable Code Instruct 3B performs strongly across a range of programming tasks compared with leading models such as CodeLlama 7B Instruct and DeepSeek-Coder Instruct 1.3B.

Testing shows that Stable Code Instruct 3B can match or even surpass its competitors in code-completion accuracy, understanding of natural-language instructions, and versatility across programming languages.

[Figure: Multi-PL results by programming language]

Guided by the results of the Stack Overflow 2023 Developer Survey, Stable Code Instruct 3B focuses its training on programming languages such as Python, JavaScript, Java, C, C++, and Go.

The figure above uses the Multi-PL benchmark to compare how strongly the three models generate output in various programming languages. Stable Code Instruct 3B clearly outperforms CodeLlama in every language, with less than half the parameters.

Beyond these popular languages, Stable Code Instruct 3B is also trained on others such as SQL, PHP, and Rust, and it delivers strong test performance even in languages it was never trained on, such as Lua.

Stable Code Instruct 3B is proficient not only in code generation but also in fill-in-the-middle (FIM) tasks, database queries, code translation, explanation, and authoring.

Through instruction tuning, the model can understand nuanced instructions and act on them, enabling a broad range of coding tasks beyond simple code completion, including mathematical understanding, logical reasoning, and handling the complex technical aspects of software development.


Model download: https://huggingface.co/stabilityai/stable-code-instruct-3b

Stable Code Instruct 3B is now available for commercial use through a Stability AI membership. For non-commercial use, the model weights and code can be downloaded from Hugging Face.
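For readers who want to try the model, below is a minimal sketch of loading the released weights with Hugging Face transformers. It assumes the standard causal-LM and chat-template interface; the exact prompt format, recommended generation settings, and any license gating are defined on the model page rather than in this article.

```python
# Minimal sketch (not from the original article) of loading and prompting the
# released weights with Hugging Face transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/stable-code-instruct-3b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user",
             "content": "Write a Python function that reverses a linked list."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256,
                         do_sample=True, temperature=0.5, top_p=0.95)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```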

Technical details


Model architecture

Stable Code is built on Stable LM 3B: a decoder-only Transformer with a design similar to LLaMA. The table below lists some key architectural details:

[Table: key architectural details]

The main differences from LLaMA include:

Positional embeddings: rotary position embeddings are applied to the first 25% of the head embedding dimensions to improve throughput (see the sketch after this list).

Normalization: LayerNorm with learned bias terms is used instead of RMSNorm.

Bias terms: all bias terms are removed from the feed-forward networks and multi-head self-attention layers, except for the KQV projections.
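To make the partial rotary embedding point concrete, here is a minimal, self-contained sketch of applying RoPE to only the first 25% of each head's dimensions. It illustrates the general technique only; it is not Stability AI's implementation, and the function names are made up for this example.

```python
# Partial rotary position embeddings: rotate only the first rotary_pct of each
# head's dimensions and pass the remaining dimensions through unchanged.
import torch

def rotate_half(x):
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def partial_rope(q, k, positions, rotary_pct=0.25, base=10000.0):
    """q, k: (batch, heads, seq, head_dim); positions: (seq,)."""
    head_dim = q.shape[-1]
    rot_dim = int(head_dim * rotary_pct)            # e.g. 16 of 64 dims
    inv_freq = 1.0 / (base ** (torch.arange(0, rot_dim, 2).float() / rot_dim))
    angles = torch.einsum("s,f->sf", positions.float(), inv_freq)
    cos = torch.cat((angles.cos(), angles.cos()), dim=-1)   # (seq, rot_dim)
    sin = torch.cat((angles.sin(), angles.sin()), dim=-1)

    def apply(x):
        x_rot, x_pass = x[..., :rot_dim], x[..., rot_dim:]
        x_rot = x_rot * cos + rotate_half(x_rot) * sin
        return torch.cat((x_rot, x_pass), dim=-1)    # untouched dims pass through

    return apply(q), apply(k)
```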

Tokenizer: the same BPE tokenizer as Stable LM 3B is used, with a vocabulary size of 50,257; in addition, special tokens from StarCoder are adopted, including tokens that indicate the file name, repository star count, fill-in-the-middle (FIM) spans, and so on.

For long-context training, special tokens are used to indicate when two concatenated files belong to the same repository.

Training process

Training data

The pre-training dataset is assembled from a variety of publicly accessible large-scale sources, including code repositories, technical documentation (such as Read the Docs), mathematics-focused texts, and broad web datasets.

The main goal of the initial pre-training phase is to learn rich internal representations that significantly improve the model's abilities in mathematical understanding, logical reasoning, and processing complex technical texts related to software development.

Additionally, the training data includes a general text dataset to provide the model with broader language knowledge and context, ultimately enabling the model to handle a wider range of queries and tasks in a conversational manner.

The following table shows the data sources, categories and sampling weights of the pre-training corpus, where the ratio of code and natural language data is 80:20.

[Table: pre-training data sources, categories, and sampling weights]

In addition, the researchers introduced a small synthetic dataset, generated from the seed prompts of the CodeAlpaca dataset and containing 174,000 prompts.

Following the WizardLM approach, they gradually increased the complexity of the given seed prompts and obtained an additional 100,000 prompts.

The authors believe that introducing this synthetic data early in the pre-training stage helps the model respond better to natural language text.
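As a rough illustration of this kind of prompt evolution, the sketch below repeatedly asks a teacher model to rewrite a seed prompt into a more complex one. The instruction template and the generate() helper are placeholders for this example, not the authors' actual pipeline.

```python
# WizardLM-style prompt evolution sketch: each round asks a teacher model to add
# one extra constraint to the prompt, producing progressively harder variants.
EVOLVE_TEMPLATE = (
    "Rewrite the following programming prompt so that it is more complex, "
    "adding one extra constraint or requirement, while keeping it answerable:\n\n{prompt}"
)

def evolve_prompt(seed_prompt, generate, rounds=3):
    """generate(text) -> str is any LLM completion function supplied by the caller."""
    prompt = seed_prompt
    evolved = []
    for _ in range(rounds):
        prompt = generate(EVOLVE_TEMPLATE.format(prompt=prompt))
        evolved.append(prompt)
    return evolved
```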

Long context dataset

Since multiple files in a repository often depend on one another, context length matters for code models.

The researchers estimated the median and mean token counts of software repositories at 12k and 18k respectively, so 16,384 was chosen as the context length.

The next step was to create a long-context dataset. The researchers took files written in popular languages from each repository and concatenated them, inserting a special token between files to maintain separation while preserving the flow of content.

To avoid any potential bias arising from a fixed file order, the authors employed a randomization strategy: for each repository, two different orderings of the concatenated files are generated.
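A minimal sketch of that concatenation step follows. The separator string here is a placeholder for illustration; the actual special tokens are the StarCoder-style ones mentioned earlier.

```python
# Long-context document construction: join the files of one repository with a
# separator token, producing two independently shuffled orderings per repo.
import random

FILE_SEP = "<|repo_sep|>"   # hypothetical separator marking files from the same repo

def build_long_context_docs(repo_files, seed=0):
    """repo_files: list of file contents from a single repository."""
    rng = random.Random(seed)
    docs = []
    for _ in range(2):                  # two different orderings per repository
        order = repo_files[:]
        rng.shuffle(order)
        docs.append(FILE_SEP.join(order))
    return docs
```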


Phase-based training

Stable Code uses 32 Amazon P4d instances for training, containing 256 NVIDIA A100 (40GB HBM2) GPUs, and uses ZeRO for distributed optimization.

[Figure: phased training setup]

A phased training approach is used, as shown in the figure above.

Training follows standard autoregressive sequence modeling to predict the next token. The model is initialized from the Stable LM 3B checkpoint; the first stage is trained with a context length of 4,096, and continued pre-training is then performed at the longer context length.

Training is performed in BFloat16 mixed precision, with FP32 used for all-reduce. The AdamW optimizer is configured with β1 = 0.9, β2 = 0.95, ε = 1e−6, and λ (weight decay) = 0.1. The learning rate starts at 3.2e−4 and decays to a minimum of 3.2e−5 following a cosine schedule.
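As a concrete illustration of that schedule, the snippet below sketches a cosine decay between the quoted peak and minimum learning rates; warmup and the exact step counts are not given in this article, so the total-step figure here is arbitrary.

```python
# Cosine decay of the learning rate from 3.2e-4 down to 3.2e-5.
import math

PEAK_LR, MIN_LR = 3.2e-4, 3.2e-5

def cosine_lr(step, total_steps):
    progress = min(step / total_steps, 1.0)
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))

# Learning rate at the start, middle, and end of a hypothetical 10k-step run
print(cosine_lr(0, 10_000), cosine_lr(5_000, 10_000), cosine_lr(10_000, 10_000))
```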


A core assumption in language-model training is strict left-to-right causal ordering, but for code this assumption does not always hold (for example, a function call may appear either before or after the corresponding function declaration).

To address this, the researchers used FIM (fill-in-the-middle): a document is randomly split into three segments, prefix, middle, and suffix, and the middle segment is moved to the end of the document. After this rearrangement, the same autoregressive training procedure is followed.
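The transformation itself is simple; below is a minimal sketch of turning a document into a FIM training example. The sentinel strings are StarCoder-style placeholders, and the exact token names used by the model's tokenizer may differ.

```python
# Fill-in-the-middle (FIM) transformation: split a document at two random points
# and move the middle segment to the end, so the model learns to infill
# conditioned on both the prefix and the suffix.
import random

PRE, SUF, MID = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"

def to_fim(document, rng=random):
    i, j = sorted(rng.sample(range(len(document) + 1), 2))
    prefix, middle, suffix = document[:i], document[i:j], document[j:]
    # Prefix-suffix-middle layout; training then proceeds autoregressively.
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"
```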

Instruction fine-tuning

After pre-training, the authors further improve the model's conversational skills through a fine-tuning stage that includes supervised fine-tuning (SFT) and Direct Preference Optimization (DPO).

SFT is performed first, using publicly available datasets from Hugging Face, including OpenHermes, Code Feedback, and CodeAlpaca.

After performing exact match deduplication, the three datasets provide a total of approximately 500,000 training samples.

A cosine learning-rate scheduler controls the training process, the global batch size is set to 512, and inputs are packed into sequences no longer than 4,096 tokens.
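Below is a minimal sketch of that packing step: tokenized samples are greedily concatenated, separated by an EOS token, into sequences of at most 4,096 tokens. This only illustrates the idea, not the exact packing logic used for training.

```python
# Greedy packing of tokenized samples into sequences of at most MAX_LEN tokens.
MAX_LEN = 4096

def pack_sequences(tokenized_samples, eos_id=0):
    packed, current = [], []
    for ids in tokenized_samples:
        ids = ids[:MAX_LEN - 1]                   # leave room for the EOS separator
        if len(current) + len(ids) + 1 > MAX_LEN:
            packed.append(current)                # start a new packed sequence
            current = []
        current += ids + [eos_id]                 # separate samples with EOS
    if current:
        packed.append(current)
    return packed
```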

After SFT comes the DPO phase, which uses data from UltraFeedback to curate a dataset of approximately 7,000 samples. In addition, to improve the model's safety, the authors also included the Helpful and Harmless RLHF dataset.

The researchers adopted RMSProp as the optimization algorithm and increased the learning rate to a peak of 5e-7 in the initial stage of DPO training.
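For reference, the DPO objective on a single preference pair can be sketched as below. The log-probabilities would come from the policy being trained and a frozen reference model, and the beta value here is an arbitrary placeholder since the article does not specify it.

```python
# DPO loss on one preference pair: push the policy to prefer the chosen response
# over the rejected one, relative to a frozen reference model.
import torch
import torch.nn.functional as F

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """All inputs are summed log-probabilities of the full responses."""
    chosen_reward = beta * (policy_logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (policy_logp_rejected - ref_logp_rejected)
    # Maximize the margin between chosen and rejected responses
    return -F.logsigmoid(chosen_reward - rejected_reward)

# Example with dummy values
loss = dpo_loss(torch.tensor(-12.0), torch.tensor(-15.0),
                torch.tensor(-13.0), torch.tensor(-14.5))
print(loss.item())
```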

Performance Test

The following compares model performance on the code-completion task, using the Multi-PL benchmark for evaluation.

Stable Code Base

The following table shows the Multi-PL performance of different code models with 3B parameters or fewer.

[Table: Multi-PL performance of code models with 3B parameters or fewer]

Although Stable Code has fewer than 40% of the parameters of Code Llama and fewer than 20% of those of StarCoder 15B, its average performance across programming languages is on par with theirs.

Stable Code Instruct

The following table evaluates the instruction-tuned versions of several models on the Multi-PL benchmark.

[Table: Multi-PL performance of instruction-tuned models]

SQL Performance

An important application of code language models is database query tasks. In this area, Stable Code Instruct's performance is compared with other popular instruction-tuned models and with models trained specifically for SQL, using a benchmark created with Defog AI.

[Table: SQL benchmark results]

Inference performance

The table below gives the throughput and power consumption when running Stable Code on consumer-grade devices, along with the corresponding system environments.

[Table: throughput and power consumption on consumer-grade devices]

The results show that throughput nearly doubles when lower precision is used. However, it is important to note that lower-precision quantization may cause some (potentially substantial) degradation in model performance.
