
Running the Kokoro-82M ONNX TTS Model in the Browser

Linda Hamilton · 2025-01-17


Advances in artificial intelligence and machine learning have significantly expanded what is possible within the browser. Running text-to-speech (TTS) models directly in the browser opens new opportunities for privacy, speed, and convenience. In this blog post, we will explore how to run the Kokoro-82M ONNX TTS model in the browser using a JavaScript implementation. If you're curious, you can try it out in my demo: Kitt AI Text-to-Speech.

Why run TTS models in the browser?

Traditionally, TTS models are executed on a server, requiring an internet connection to send input text and receive synthesized speech. However, with advances in WebGPU and in-browser ONNX runtimes (ONNX Runtime Web, which Transformers.js uses under the hood), you can now run advanced models like Kokoro-82M ONNX directly in the browser. This brings several advantages:

  • Privacy: Your text data never leaves your device.
  • Low Latency: Eliminate server communication delays.
  • Offline Access: Once the model files are cached, synthesis works without an internet connection.

Kokoro-82M ONNX Overview

The Kokoro-82M ONNX model is a lightweight yet effective TTS model optimized for on-device inference. It provides high-quality speech synthesis while maintaining a small footprint, making it suitable for browser environments.

Project setup

Prerequisites

To run Kokoro-82M ONNX in your browser, you need:

  1. A modern browser with WebGPU or WebAssembly (WASM) support; a quick feature check is sketched after this list.
  2. The Transformers.js library (@huggingface/transformers), which runs ONNX models in JavaScript via ONNX Runtime Web.
  3. The Kokoro.js script, which simplifies loading and running the Kokoro-82M model.
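
If you want to pick the best available backend at runtime, a minimal sketch of the feature check might look like the one below. The `device` value it computes corresponds to the `device` option accepted by Transformers.js' `from_pretrained`; the loading code later in this post simply hard-codes "wasm":

<code class="language-javascript">// Minimal capability check: prefer WebGPU when the browser exposes it,
// otherwise fall back to the WebAssembly backend.
const hasWebGPU = typeof navigator !== "undefined" && "gpu" in navigator;
const device = hasWebGPU ? "webgpu" : "wasm";
console.log(`Kokoro will run on the "${device}" backend`);</code>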

Installation

You can set up your project by adding the necessary dependency to package.json (or by running npm install @huggingface/transformers):

<code>{
  "dependencies": {
    "@huggingface/transformers": "^3.3.1"
  }
}</code>

Next, make sure you have the Kokoro.js script, which is available from this repository.

Model loading

To load and use the Kokoro-82M ONNX model in your browser, follow these steps:

<code class="language-javascript">// At the top of the module:
import { StyleTextToSpeech2Model, AutoTokenizer } from "@huggingface/transformers";

// from_pretrained returns a Promise, so await both calls.
this.model_instance = await StyleTextToSpeech2Model.from_pretrained(
    this.modelId,
    {
        device: "wasm", // or "webgpu" where supported
        progress_callback,
    }
);

// Load the tokenizer that matches the model.
this.tokenizer = await AutoTokenizer.from_pretrained(this.modelId, {
    progress_callback,
});</code>
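
The progress_callback passed to from_pretrained is optional but useful for showing download progress while the model files are fetched. The sketch below is illustrative only; the exact fields on the progress events can vary between Transformers.js versions:

<code class="language-javascript">// Illustrative progress handler (field names assumed from current
// Transformers.js behaviour; check the events emitted by your version).
const progress_callback = (info) => {
    if (info.status === "progress") {
        console.log(`Downloading ${info.file}: ${Math.round(info.progress)}%`);
    } else if (info.status === "done") {
        console.log(`Finished ${info.file}`);
    }
};</code>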

Run inference

After loading the model and processing the text, you can run inference to generate speech:

<code class="language-javascript">// phonemize, getVoiceData, STYLE_DIM and SAMPLE_RATE come from the Kokoro.js script.
const language = speakerId.at(0); // "a" (American English) or "b" (British English)
const phonemes = await phonemize(text, language);

// Tokenize the phoneme string.
const { input_ids } = await tokenizer(phonemes, { truncation: true });

// Token count excluding the start/end tokens (no padding).
const num_tokens = Math.max(input_ids.dims.at(-1) - 2, 0);

// Select the style vector for this speaker and token count.
const offset = num_tokens * STYLE_DIM;
const data = await getVoiceData(speakerId);
const voiceData = data.slice(offset, offset + STYLE_DIM);

const inputs = {
    input_ids,
    style: new Tensor("float32", voiceData, [1, STYLE_DIM]),
    speed: new Tensor("float32", [speed], [1]),
};

// Run the model and wrap the output waveform as playable audio.
const { waveform } = await model(inputs);
const audio = new RawAudio(waveform.data, SAMPLE_RATE).toBlob();</code>
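
The resulting audio is a regular Blob, so playing it back in the page only needs standard browser APIs. A minimal playback sketch (not part of the original snippet) could look like this:

<code class="language-javascript">// Turn the Blob into an object URL and play it with an Audio element.
const url = URL.createObjectURL(audio);
const player = new Audio(url);
await player.play();

// Free the object URL once playback has finished.
player.addEventListener("ended", () => URL.revokeObjectURL(url));</code>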

Demo

You can see this in my live demo: Kitt AI Text-to-Speech. The demo showcases real-time text-to-speech synthesis powered by Kokoro-82M ONNX running entirely in the browser.

Conclusion

Running TTS models like Kokoro-82M ONNX in the browser represents a leap forward for privacy-preserving, low-latency applications. With a few lines of JavaScript and the power of Transformers.js and ONNX Runtime Web, you can build high-quality, responsive TTS applications that delight your users. Whether you're building accessibility tools, voice assistants, or interactive applications, in-browser TTS can be a game-changer.

Try the Kitt AI text-to-speech demo now and see for yourself!


