search
HomeBackend DevelopmentGolangCounting the number of Tokens sent to a LLM in Go (part 2)

Counting the number of Tokens sent to a LLM in Go (part 2)

Introduction

This is the second part of the work on writing a Go application to determine the number of tokens that a user sends to a LLM based on a chosen text.

In the previous article I mentioned I want to build something written in Golang only, and among the Github repositories I looked at, this one seems to be real good: go-hggingface. The code seems to be very new but it’s “kind-of” working for me.

Implementation

First of, the code accesses Hugginface to get the list of all the “tokenizers” which go with a LLM, so the user should have a HF token. So I put my token in an .env file as shown.

HF_TOKEN="your-huggingface-token"

Then using the example provided in the following page (https://github.com/gomlx/go-huggingface?tab=readme-ov-file) I built my own code around it.

package main

import (
 "bytes"
 "fmt"
 "log"
 "os"
 "os/exec"
 "runtime"

 "github.com/gomlx/go-huggingface/hub"
 "github.com/gomlx/go-huggingface/tokenizers"

 "github.com/joho/godotenv"
 "github.com/sqweek/dialog"

 "fyne.io/fyne/v2"
 "fyne.io/fyne/v2/app"
 "fyne.io/fyne/v2/container"
 "fyne.io/fyne/v2/widget"
 //"github.com/inancgumus/scree"
)

var (
 // Model IDs we use for testing.
 hfModelIDs = []string{
  "ibm-granite/granite-3.1-8b-instruct",
  "meta-llama/Llama-3.3-70B-Instruct",
  "mistralai/Mistral-7B-Instruct-v0.3",
  "google/gemma-2-2b-it",
  "sentence-transformers/all-MiniLM-L6-v2",
  "protectai/deberta-v3-base-zeroshot-v1-onnx",
  "KnightsAnalytics/distilbert-base-uncased-finetuned-sst-2-english",
  "KnightsAnalytics/distilbert-NER",
  "SamLowe/roberta-base-go_emotions-onnx",
 }
)

func runCmd(name string, arg ...string) {
 cmd := exec.Command(name, arg...)
 cmd.Stdout = os.Stdout
 cmd.Run()
}

func ClearTerminal() {
 switch runtime.GOOS {
 case "darwin":
  runCmd("clear")
 case "linux":
  runCmd("clear")
 case "windows":
  runCmd("cmd", "/c", "cls")
 default:
  runCmd("clear")
 }
}

func FileSelectionDialog() string {
 // Open a file dialog box and let the user select a text file
 filePath, err := dialog.File().Filter("Text Files", "txt").Load()
 if err != nil {
  if err.Error() == "Cancelled" {
   fmt.Println("File selection was cancelled.")
  }
  log.Fatalf("Error selecting file: %v", err)
 }

 // Output the selected file name
 fmt.Printf("Selected file: %s\n", filePath)
 return filePath
}

func main() {

 var filePath string

 // read the '.env' file
 err := godotenv.Load()
 if err != nil {
  log.Fatal("Error loading .env file")
 }
 // get the value of the 'HF_TOKEN' environment variable
 hfAuthToken := os.Getenv("HF_TOKEN")
 if hfAuthToken == "" {
  log.Fatal("HF_TOKEN environment variable is not set")
 }

 // to display a list of LLMs to determine the # of tokens later on regarding the given text
 var llm string = ""
 var modelID string = ""
 myApp := app.New()
 myWindow := myApp.NewWindow("Select a LLM in the list")
 items := hfModelIDs
 // Label to display the selected item
 selectedItem := widget.NewLabel("Selected LLM: None")
 // Create a list widget
 list := widget.NewList(
  func() int {
   // Return the number of items in the list
   return len(items)
  },
  func() fyne.CanvasObject {
   // Template for each list item
   return widget.NewLabel("Template")
  },
  func(id widget.ListItemID, obj fyne.CanvasObject) {
   // Update the template with the actual data
   obj.(*widget.Label).SetText(items[id])
  },
 )
 // Handle list item selection
 list.OnSelected = func(id widget.ListItemID) {
  selectedItem.SetText("Selected LLM:" + items[id])
  llm = items[id]
 }

 // Layout with the list and selected item label
 content := container.NewVBox(
  list,
  selectedItem,
 )

 // Set the content of the window
 myWindow.SetContent(content)
 myWindow.Resize(fyne.NewSize(300, 400))
 myWindow.ShowAndRun()
 ClearTerminal()
 fmt.Printf("Selected LLM: %s\n", llm)
 //////

 //List files for the selected model
 for _, modelID := range hfModelIDs {
  if modelID == llm {
   fmt.Printf("\n%s:\n", modelID)
   repo := hub.New(modelID).WithAuth(hfAuthToken)
   for fileName, err := range repo.IterFileNames() {
    if err != nil {
     panic(err)
    }
    fmt.Printf("fileName\t%s\n", fileName)
    fmt.Printf("repo\t%s\n", repo)
    fmt.Printf("modelID\t%s\n", modelID)
   }
  }
 }

 //List tokenizer classes for the selected model
 for _, modelID := range hfModelIDs {
  if modelID == llm {
   fmt.Printf("\n%s:\n", modelID)
   repo := hub.New(modelID).WithAuth(hfAuthToken)
   fmt.Printf("\trepo=%s\n", repo)
   config, err := tokenizers.GetConfig(repo)
   if err != nil {
    panic(err)
   }
   fmt.Printf("\ttokenizer_class=%s\n", config.TokenizerClass)
  }
 }

 // Models URL -> "https://huggingface.co/api/models"
 repo := hub.New(modelID).WithAuth(hfAuthToken)
 tokenizer, err := tokenizers.New(repo)
 if err != nil {
  panic(err)
 }

 // call file selection dialogbox
 filePath = FileSelectionDialog()

 // Open the file
 filerc, err := os.Open(filePath)
 if err != nil {
  fmt.Printf("Error opening file: %v\n", err)
  return
 }
 defer filerc.Close()

 // Put the text file content into a buffer and convert it to a string.
 buf := new(bytes.Buffer)
 buf.ReadFrom(filerc)
 sentence := buf.String()

 tokens := tokenizer.Encode(sentence)
 fmt.Println("Sentence:\n", sentence)

 fmt.Printf("Tokens:  \t%v\n", tokens)
}

In the “var” section for “hfModelIDs” I added some new references such as IBM’s Granite, Meta’s LLama and also a Mistral model.

The Huggingface token is directly sourced and read inside the Go code as well.

I added a dialog box to display the LLMs’ list (which I’ll change eventually), a dialog box to add the text from a file (I love that kind of stuff ?) and some lines of code to clear and clean the screen ?!

The input text is the following;

The popularity of the Rust language continues to explode; yet, many critical codebases remain authored in C, and cannot be realistically rewritten by hand. Automatically translating C to Rust is thus an appealing course of action. Several works have gone down this path, handling an ever-increasing subset of C through a variety of Rust features, such as unsafe. While the prospect of automation is appealing, producing code that relies on unsafe negates the memory safety guarantees offered by Rust, and therefore the main advantages of porting existing codebases to memory-safe languages.
We instead explore a different path, and explore what it would take to translate C to safe Rust; that is, to produce code that is trivially memory safe, because it abides by Rust's type system without caveats. Our work sports several original contributions: a type-directed translation from (a subset of) C to safe Rust; a novel static analysis based on "split trees" that allows expressing C's pointer arithmetic using Rust's slices and splitting operations; an analysis that infers exactly which borrows need to be mutable; and a compilation strategy for C's struct types that is compatible with Rust's distinction between non-owned and owned allocations.
We apply our methodology to existing formally verified C codebases: the HACL* cryptographic library, and binary parsers and serializers from EverParse, and show that the subset of C we support is sufficient to translate both applications to safe Rust. Our evaluation shows that for the few places that do violate Rust's aliasing discipline, automated, surgical rewrites suffice; and that the few strategic copies we insert have a negligible performance impact. Of particular note, the application of our approach to HACL* results in a 80,000 line verified cryptographic library, written in pure Rust, that implements all modern algorithms - the first of its kind.

Test
The code once executed shows a dialog bx where you can select a desired LLM.

Counting the number of Tokens sent to a LLM in Go (part 2)

If everything goes fine, the next step is to download the “tokenizer” file locally (refer to the Github repo’s explanations) and then a dialog box is shown to chose the text file with the content which is to be evaluated in terms of the number of tokens.

So far, I have asked access to Meta LLama and Google “google/gemma-2–2b-it” models and am awaiting access to be granted.

google/gemma-2-2b-it:
        repo=google/gemma-2-2b-it
panic: request for metadata from "https://huggingface.co/google/gemma-2-2b-it/resolve/299a8560bedf22ed1c72a8a11e7dce4a7f9f51f8/tokenizer_config.json" failed with the following message: "403 Forbidden"

Counting the number of Tokens sent to a LLM in Go (part 2)

Conclusion

I think being on the right path to achieve what I intended to get, a Golang programme which is able to determine the number of tokens is a user’s query sent to a LLM.

The only aim of this project is to learn the internal system behind determination of the number of tokens in queries against a variety of LLMs and to discover how they are calculated.

Thanks for reading and open to comments.

And till the final conclusion, stay tuned… ?

The above is the detailed content of Counting the number of Tokens sent to a LLM in Go (part 2). For more information, please follow other related articles on the PHP Chinese website!

Statement
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
Golang's Impact: Speed, Efficiency, and SimplicityGolang's Impact: Speed, Efficiency, and SimplicityApr 14, 2025 am 12:11 AM

Goimpactsdevelopmentpositivelythroughspeed,efficiency,andsimplicity.1)Speed:Gocompilesquicklyandrunsefficiently,idealforlargeprojects.2)Efficiency:Itscomprehensivestandardlibraryreducesexternaldependencies,enhancingdevelopmentefficiency.3)Simplicity:

C   and Golang: When Performance is CrucialC and Golang: When Performance is CrucialApr 13, 2025 am 12:11 AM

C is more suitable for scenarios where direct control of hardware resources and high performance optimization is required, while Golang is more suitable for scenarios where rapid development and high concurrency processing are required. 1.C's advantage lies in its close to hardware characteristics and high optimization capabilities, which are suitable for high-performance needs such as game development. 2.Golang's advantage lies in its concise syntax and natural concurrency support, which is suitable for high concurrency service development.

Golang in Action: Real-World Examples and ApplicationsGolang in Action: Real-World Examples and ApplicationsApr 12, 2025 am 12:11 AM

Golang excels in practical applications and is known for its simplicity, efficiency and concurrency. 1) Concurrent programming is implemented through Goroutines and Channels, 2) Flexible code is written using interfaces and polymorphisms, 3) Simplify network programming with net/http packages, 4) Build efficient concurrent crawlers, 5) Debugging and optimizing through tools and best practices.

Golang: The Go Programming Language ExplainedGolang: The Go Programming Language ExplainedApr 10, 2025 am 11:18 AM

The core features of Go include garbage collection, static linking and concurrency support. 1. The concurrency model of Go language realizes efficient concurrent programming through goroutine and channel. 2. Interfaces and polymorphisms are implemented through interface methods, so that different types can be processed in a unified manner. 3. The basic usage demonstrates the efficiency of function definition and call. 4. In advanced usage, slices provide powerful functions of dynamic resizing. 5. Common errors such as race conditions can be detected and resolved through getest-race. 6. Performance optimization Reuse objects through sync.Pool to reduce garbage collection pressure.

Golang's Purpose: Building Efficient and Scalable SystemsGolang's Purpose: Building Efficient and Scalable SystemsApr 09, 2025 pm 05:17 PM

Go language performs well in building efficient and scalable systems. Its advantages include: 1. High performance: compiled into machine code, fast running speed; 2. Concurrent programming: simplify multitasking through goroutines and channels; 3. Simplicity: concise syntax, reducing learning and maintenance costs; 4. Cross-platform: supports cross-platform compilation, easy deployment.

Why do the results of ORDER BY statements in SQL sorting sometimes seem random?Why do the results of ORDER BY statements in SQL sorting sometimes seem random?Apr 02, 2025 pm 05:24 PM

Confused about the sorting of SQL query results. In the process of learning SQL, you often encounter some confusing problems. Recently, the author is reading "MICK-SQL Basics"...

Is technology stack convergence just a process of technology stack selection?Is technology stack convergence just a process of technology stack selection?Apr 02, 2025 pm 05:21 PM

The relationship between technology stack convergence and technology selection In software development, the selection and management of technology stacks are a very critical issue. Recently, some readers have proposed...

See all articles

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

AI Hentai Generator

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)
3 weeks agoBy尊渡假赌尊渡假赌尊渡假赌
R.E.P.O. Best Graphic Settings
3 weeks agoBy尊渡假赌尊渡假赌尊渡假赌
R.E.P.O. How to Fix Audio if You Can't Hear Anyone
3 weeks agoBy尊渡假赌尊渡假赌尊渡假赌
WWE 2K25: How To Unlock Everything In MyRise
1 months agoBy尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Dreamweaver Mac version

Dreamweaver Mac version

Visual web development tools

Safe Exam Browser

Safe Exam Browser

Safe Exam Browser is a secure browser environment for taking online exams securely. This software turns any computer into a secure workstation. It controls access to any utility and prevents students from using unauthorized resources.

WebStorm Mac version

WebStorm Mac version

Useful JavaScript development tools

mPDF

mPDF

mPDF is a PHP library that can generate PDF files from UTF-8 encoded HTML. The original author, Ian Back, wrote mPDF to output PDF files "on the fly" from his website and handle different languages. It is slower than original scripts like HTML2FPDF and produces larger files when using Unicode fonts, but supports CSS styles etc. and has a lot of enhancements. Supports almost all languages, including RTL (Arabic and Hebrew) and CJK (Chinese, Japanese and Korean). Supports nested block-level elements (such as P, DIV),

DVWA

DVWA

Damn Vulnerable Web App (DVWA) is a PHP/MySQL web application that is very vulnerable. Its main goals are to be an aid for security professionals to test their skills and tools in a legal environment, to help web developers better understand the process of securing web applications, and to help teachers/students teach/learn in a classroom environment Web application security. The goal of DVWA is to practice some of the most common web vulnerabilities through a simple and straightforward interface, with varying degrees of difficulty. Please note that this software