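Introduction
This is the second part of writing a Go application that determines how many tokens a user sends to an LLM for a selected piece of text.
In the previous article, I mentioned that I wanted to build something written only in Golang. Among the GitHub repositories I looked at, this one seemed really good: go-huggingface. The code seems quite new, but it "kind of" works for me.
Implementation
First, the code accesses Hugging Face to get the list of all the "tokenizers" that go with an LLM, so the user needs an HF token. I therefore put my token in a .env file, as shown.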
HF_TOKEN="your-huggingface-token"
Then, using the example provided on the following page (https://github.com/gomlx/go-huggingface?tab=readme-ov-file), I built my own code around it.
package main

import (
	"bytes"
	"fmt"
	"log"
	"os"
	"os/exec"
	"runtime"

	"fyne.io/fyne/v2"
	"fyne.io/fyne/v2/app"
	"fyne.io/fyne/v2/container"
	"fyne.io/fyne/v2/widget"
	"github.com/gomlx/go-huggingface/hub"
	"github.com/gomlx/go-huggingface/tokenizers"
	"github.com/joho/godotenv"
	"github.com/sqweek/dialog"
)

var (
	// Model IDs we use for testing.
	hfModelIDs = []string{
		"ibm-granite/granite-3.1-8b-instruct",
		"meta-llama/Llama-3.3-70B-Instruct",
		"mistralai/Mistral-7B-Instruct-v0.3",
		"google/gemma-2-2b-it",
		"sentence-transformers/all-MiniLM-L6-v2",
		"protectai/deberta-v3-base-zeroshot-v1-onnx",
		"KnightsAnalytics/distilbert-base-uncased-finetuned-sst-2-english",
		"KnightsAnalytics/distilbert-NER",
		"SamLowe/roberta-base-go_emotions-onnx",
	}
)

// runCmd runs a command and sends its output to the terminal.
func runCmd(name string, arg ...string) {
	cmd := exec.Command(name, arg...)
	cmd.Stdout = os.Stdout
	cmd.Run()
}

// ClearTerminal clears the screen with the command that suits the current OS.
func ClearTerminal() {
	switch runtime.GOOS {
	case "windows":
		runCmd("cmd", "/c", "cls")
	default: // darwin, linux, and anything else
		runCmd("clear")
	}
}

// FileSelectionDialog opens a file dialog box and lets the user select a text file.
func FileSelectionDialog() string {
	filePath, err := dialog.File().Filter("Text Files", "txt").Load()
	if err != nil {
		if err.Error() == "Cancelled" {
			fmt.Println("File selection was cancelled.")
		}
		log.Fatalf("Error selecting file: %v", err)
	}
	// Output the selected file name
	fmt.Printf("Selected file: %s\n", filePath)
	return filePath
}

func main() {
	// Read the '.env' file.
	if err := godotenv.Load(); err != nil {
		log.Fatal("Error loading .env file")
	}
	// Get the value of the 'HF_TOKEN' environment variable.
	hfAuthToken := os.Getenv("HF_TOKEN")
	if hfAuthToken == "" {
		log.Fatal("HF_TOKEN environment variable is not set")
	}

	// Display a list of LLMs; the selected one is used later to tokenize the given text.
	var llm string
	myApp := app.New()
	myWindow := myApp.NewWindow("Select a LLM in the list")
	items := hfModelIDs

	// Label to display the selected item
	selectedItem := widget.NewLabel("Selected LLM: None")

	// Create a list widget
	list := widget.NewList(
		func() int {
			// Return the number of items in the list
			return len(items)
		},
		func() fyne.CanvasObject {
			// Template for each list item
			return widget.NewLabel("Template")
		},
		func(id widget.ListItemID, obj fyne.CanvasObject) {
			// Update the template with the actual data
			obj.(*widget.Label).SetText(items[id])
		},
	)

	// Handle list item selection
	list.OnSelected = func(id widget.ListItemID) {
		selectedItem.SetText("Selected LLM: " + items[id])
		llm = items[id]
	}

	// Layout with the list and selected item label
	content := container.NewVBox(list, selectedItem)

	// Set the content of the window
	myWindow.SetContent(content)
	myWindow.Resize(fyne.NewSize(300, 400))
	myWindow.ShowAndRun()

	ClearTerminal()
	if llm == "" {
		log.Fatal("No LLM was selected")
	}
	fmt.Printf("Selected LLM: %s\n", llm)

	// Use the selected model for every call to the Hub.
	repo := hub.New(llm).WithAuth(hfAuthToken)

	// List files for the selected model
	fmt.Printf("\n%s:\n", llm)
	for fileName, err := range repo.IterFileNames() {
		if err != nil {
			panic(err)
		}
		fmt.Printf("fileName\t%s\n", fileName)
	}

	// Show the tokenizer class for the selected model
	fmt.Printf("\n%s:\n", llm)
	fmt.Printf("\trepo=%s\n", repo)
	config, err := tokenizers.GetConfig(repo)
	if err != nil {
		panic(err)
	}
	fmt.Printf("\ttokenizer_class=%s\n", config.TokenizerClass)

	// Models URL -> "https://huggingface.co/api/models"
	tokenizer, err := tokenizers.New(repo)
	if err != nil {
		panic(err)
	}

	// Call the file selection dialog box and open the file
	filePath := FileSelectionDialog()
	filerc, err := os.Open(filePath)
	if err != nil {
		fmt.Printf("Error opening file: %v\n", err)
		return
	}
	defer filerc.Close()

	// Put the text file content into a buffer and convert it to a string.
	buf := new(bytes.Buffer)
	if _, err := buf.ReadFrom(filerc); err != nil {
		fmt.Printf("Error reading file: %v\n", err)
		return
	}
	sentence := buf.String()

	tokens := tokenizer.Encode(sentence)
	fmt.Println("Sentence:\n", sentence)
	fmt.Printf("Tokens: \t%v\n", tokens)
}
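In the "var" section for "hfModelIDs", I added some new references, such as IBM's Granite, Meta's Llama, and a Mistral model.
The Hugging Face token is also sourced and read directly inside the Go code.
I added a dialog box to display the list of LLMs (which I'll change eventually), a dialog box to load the text from a file (I love this kind of stuff), and a few lines of code to clear the screen.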
The input text is as follows:
The popularity of the Rust language continues to explode; yet, many critical codebases remain authored in C, and cannot be realistically rewritten by hand. Automatically translating C to Rust is thus an appealing course of action. Several works have gone down this path, handling an ever-increasing subset of C through a variety of Rust features, such as unsafe. While the prospect of automation is appealing, producing code that relies on unsafe negates the memory safety guarantees offered by Rust, and therefore the main advantages of porting existing codebases to memory-safe languages. We instead explore a different path, and explore what it would take to translate C to safe Rust; that is, to produce code that is trivially memory safe, because it abides by Rust's type system without caveats. Our work sports several original contributions: a type-directed translation from (a subset of) C to safe Rust; a novel static analysis based on "split trees" that allows expressing C's pointer arithmetic using Rust's slices and splitting operations; an analysis that infers exactly which borrows need to be mutable; and a compilation strategy for C's struct types that is compatible with Rust's distinction between non-owned and owned allocations. We apply our methodology to existing formally verified C codebases: the HACL* cryptographic library, and binary parsers and serializers from EverParse, and show that the subset of C we support is sufficient to translate both applications to safe Rust. Our evaluation shows that for the few places that do violate Rust's aliasing discipline, automated, surgical rewrites suffice; and that the few strategic copies we insert have a negligible performance impact. Of particular note, the application of our approach to HACL* results in a 80,000 line verified cryptographic library, written in pure Rust, that implements all modern algorithms - the first of its kind.
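Tests
Once run, the code displays a dialog box where you can select the desired LLM.
If all goes well, the next step downloads the "tokenizer" files locally (see the explanations in the GitHub repository), and then a dialog box appears for choosing the text file whose content will be evaluated for its number of tokens.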
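Since the goal is a number of tokens, the count itself is simply the length of whatever the tokenizer returns. The sketch below assumes that Encode returns one integer ID per token. The encoder interface and the wordEncoder stand-in are hypothetical names, used here so the example runs without downloading anything.

```go
package main

import (
	"fmt"
	"strings"
)

// encoder is the single method this sketch needs from a tokenizer.
// Assumption: Encode returns one integer ID per token.
type encoder interface {
	Encode(text string) []int
}

// wordEncoder is an offline stand-in: it gives each distinct
// whitespace-separated word its own ID. Real tokenizers split
// text into subword pieces, so their counts will differ.
type wordEncoder struct {
	ids map[string]int
}

func (w *wordEncoder) Encode(text string) []int {
	var out []int
	for _, word := range strings.Fields(text) {
		id, ok := w.ids[word]
		if !ok {
			id = len(w.ids)
			w.ids[word] = id
		}
		out = append(out, id)
	}
	return out
}

// countTokens returns how many tokens enc produces for text.
func countTokens(enc encoder, text string) int {
	return len(enc.Encode(text))
}

func main() {
	enc := &wordEncoder{ids: map[string]int{}}
	text := "Automatically translating C to Rust"
	fmt.Println("Token count:", countTokens(enc, text))
}
```

If the tokenizer from the program above satisfies this interface, passing it to countTokens in place of the stand-in would give the figure for the selected model.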
So far, I have requested access to the Meta Llama and Google "google/gemma-2-2b-it" models, and I am waiting for access to be granted.
google/gemma-2-2b-it:
	repo=google/gemma-2-2b-it
panic: request for metadata from "https://huggingface.co/google/gemma-2-2b-it/resolve/299a8560bedf22ed1c72a8a11e7dce4a7f9f51f8/tokenizer_config.json" failed with the following message: "403 Forbidden"
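Conclusion
I think I am on the right path to what I want to achieve: a Golang program that can determine the number of tokens in a query a user sends to an LLM.
The sole purpose of this project is to learn the internals behind determining the number of tokens in queries to various LLMs, and to discover how they are calculated.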
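As a first intuition for how such counts come about, here is a toy subword tokenizer. It is only a sketch: it uses greedy longest-match against a tiny made-up vocabulary, whereas real tokenizers (BPE, WordPiece, SentencePiece and others) rely on learned vocabularies and their own rules. The tokenize function and the vocabulary are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// tokenize (hypothetical helper) splits each whitespace-separated word
// into the longest pieces found in vocab, scanning left to right.
// A character that starts no known piece becomes a one-character token.
func tokenize(text string, vocab map[string]bool) []string {
	var tokens []string
	for _, word := range strings.Fields(text) {
		runes := []rune(word)
		for start := 0; start < len(runes); {
			end := len(runes)
			// Shrink the candidate until it is in vocab or one rune long.
			for end > start+1 && !vocab[string(runes[start:end])] {
				end--
			}
			tokens = append(tokens, string(runes[start:end]))
			start = end
		}
	}
	return tokens
}

func main() {
	// Made-up vocabulary, for illustration only.
	vocab := map[string]bool{
		"token": true, "izer": true, "s": true, "count": true, "ing": true,
	}
	tokens := tokenize("tokenizers counting tokens", vocab)
	fmt.Println(tokens, len(tokens))
}
```

Three words become seven tokens here, which shows why the token count of a query depends on the model's vocabulary and not only on the number of words.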
Thanks for reading; comments are welcome.
Stay tuned until the final conclusion…