As artificial intelligence technology continues to develop, it is becoming increasingly feasible to run complex machine learning models directly in the browser. This guide walks you through loading and using the DeepSeek-R1 model in the browser with JavaScript, and covers the implementation details of the example provided here.
Why run NLP models in the browser?
Traditionally, natural language processing (NLP) models are deployed server-side, requiring an internet connection to send requests and receive responses. However, with the advancement of technologies such as WebGPU and ONNX Runtime Web, it is now possible to run advanced models such as DeepSeek-R1 directly in the browser. The advantages of this approach include:
- Enhanced Privacy: User data never leaves their device.
- Reduced Latency: Eliminates delays associated with server communication.
- Offline Availability: Works even without an internet connection.
About DeepSeek-R1
DeepSeek-R1 is a lightweight and efficient NLP model optimized for on-device inference. It provides high-quality text processing capabilities while maintaining a small footprint, making it ideal for browser environments.
Set up your project
Prerequisites
To run the DeepSeek-R1 model in your browser, you need:
- A modern browser with WebGPU support (or a WebGL fallback); a quick availability probe is sketched after this list.
- The @huggingface/transformers library, which runs transformer models in JavaScript.
- A worker script containing the logic for loading and running the DeepSeek-R1 model (shown in full in the implementation section below).
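Before loading anything, it is worth confirming that the browser actually exposes WebGPU. The following is a minimal sketch of such a probe; it mirrors the worker-side check shown later, but runs directly in the page:

```javascript
// Quick WebGPU availability probe, runnable directly in the page.
async function hasWebGPU() {
  if (!("gpu" in navigator)) return false; // API not exposed at all
  const adapter = await navigator.gpu.requestAdapter();
  return adapter !== null; // null means no usable GPU adapter
}

hasWebGPU().then((ok) => {
  console.log(ok ? "WebGPU available" : "WebGPU not supported; consider a fallback");
});
```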
Implementation details
Below is the complete worker script that loads and runs the DeepSeek-R1 model in the browser:
```javascript
import {
  AutoTokenizer,
  AutoModelForCausalLM,
  TextStreamer,
  InterruptableStoppingCriteria,
} from "@huggingface/transformers";

/**
 * Helper function to perform WebGPU feature detection.
 */
async function check() {
  try {
    const adapter = await navigator.gpu.requestAdapter();
    if (!adapter) {
      throw new Error("WebGPU is not supported (no adapter found)");
    }
  } catch (e) {
    self.postMessage({
      status: "error",
      data: e.toString(),
    });
  }
}

/**
 * This class uses the singleton pattern to enable lazy loading of the model.
 */
class TextGenerationPipeline {
  static model_id = "onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX";

  static async getInstance(progress_callback = null) {
    if (!this.tokenizer) {
      this.tokenizer = await AutoTokenizer.from_pretrained(this.model_id, {
        progress_callback,
      });
    }
    if (!this.model) {
      this.model = await AutoModelForCausalLM.from_pretrained(this.model_id, {
        dtype: "q4f16",
        device: "webgpu",
        progress_callback,
      });
    }
    return [this.tokenizer, this.model];
  }
}

const stopping_criteria = new InterruptableStoppingCriteria();
let past_key_values_cache = null;

async function generate(messages) {
  // Retrieve the text-generation pipeline.
  const [tokenizer, model] = await TextGenerationPipeline.getInstance();

  const inputs = tokenizer.apply_chat_template(messages, {
    add_generation_prompt: true,
    return_dict: true,
  });

  const [START_THINKING_TOKEN_ID, END_THINKING_TOKEN_ID] = tokenizer.encode(
    "<think></think>",
    { add_special_tokens: false },
  );

  let state = "thinking"; // 'thinking' or 'answering'
  let startTime;
  let numTokens = 0;
  let tps;

  const token_callback_function = (tokens) => {
    startTime ??= performance.now();
    if (numTokens++ > 0) {
      tps = (numTokens / (performance.now() - startTime)) * 1000;
    }
    if (tokens[0] === END_THINKING_TOKEN_ID) {
      state = "answering";
    }
  };

  const callback_function = (output) => {
    self.postMessage({
      status: "update",
      output,
      tps,
      numTokens,
      state,
    });
  };

  const streamer = new TextStreamer(tokenizer, {
    skip_prompt: true,
    skip_special_tokens: true,
    callback_function,
    token_callback_function,
  });

  // Notify the main thread that we have started.
  self.postMessage({ status: "start" });

  const { past_key_values, sequences } = await model.generate({
    ...inputs,
    do_sample: false,
    max_new_tokens: 2048,
    streamer,
    stopping_criteria,
    return_dict_in_generate: true,
  });
  past_key_values_cache = past_key_values;

  const decoded = tokenizer.batch_decode(sequences, {
    skip_special_tokens: true,
  });

  // Send the output back to the main thread.
  self.postMessage({
    status: "complete",
    output: decoded,
  });
}

async function load() {
  self.postMessage({
    status: "loading",
    data: "Loading model...",
  });

  // Load the pipeline and save it for future use.
  const [tokenizer, model] = await TextGenerationPipeline.getInstance((x) => {
    self.postMessage(x);
  });

  self.postMessage({
    status: "loading",
    data: "Compiling shaders and warming up the model...",
  });

  // Run the model with a dummy input to compile the shaders.
  const inputs = tokenizer("a");
  await model.generate({ ...inputs, max_new_tokens: 1 });

  self.postMessage({ status: "ready" });
}

// Listen for messages from the main thread.
self.addEventListener("message", async (e) => {
  const { type, data } = e.data;
  switch (type) {
    case "check":
      check();
      break;
    case "load":
      load();
      break;
    case "generate":
      stopping_criteria.reset();
      generate(data);
      break;
    case "interrupt":
      stopping_criteria.interrupt();
      break;
    case "reset":
      past_key_values_cache = null;
      stopping_criteria.reset();
      break;
  }
});
```
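The worker script above is only half of the picture: the main thread must create the worker and exchange messages with it. The snippet below is a minimal sketch of that side; the file name worker.js, the module-worker setup, and the example prompt are assumptions rather than part of the original example:

```javascript
// Main thread: spawn the worker and drive the check -> load -> generate flow.
// Assumes the worker code above is saved as "worker.js" (a hypothetical name).
const worker = new Worker(new URL("./worker.js", import.meta.url), {
  type: "module",
});

worker.addEventListener("message", (e) => {
  const msg = e.data;
  switch (msg.status) {
    case "error": // WebGPU unavailable or loading failed
      console.error(msg.data);
      break;
    case "loading": // progress text posted by load()
      console.log(msg.data);
      break;
    case "ready": // model is warmed up; safe to generate
      worker.postMessage({
        type: "generate",
        data: [{ role: "user", content: "Why is the sky blue?" }],
      });
      break;
    case "update": // streamed text chunk, with tokens/sec and thinking state
      console.log(`[${msg.state}] ${msg.output}`);
      break;
    case "complete": // full decoded output
      console.log(msg.output);
      break;
  }
});

// Kick things off. In a real app you would wait for the "check" result
// before loading, and send { type: "interrupt" } to stop generation early.
worker.postMessage({ type: "check" });
worker.postMessage({ type: "load" });
```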
Key points
- Feature detection: The check function performs feature detection to ensure WebGPU support.
- Singleton pattern: The TextGenerationPipeline class ensures that the tokenizer and model are loaded only once, avoiding redundant initialization.
- Model loading: The getInstance method loads the tokenizer and model from pretrained sources and supports progress callbacks.
- Inference: The generate function processes the input and produces text output, streaming tokens through a TextStreamer.
- Communication: The worker thread listens for messages from the main thread and performs the corresponding action based on the message type (for example, "check", "load", "generate", "interrupt", or "reset").
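Because each streamed "update" is tagged with the current state ("thinking" or "answering"), the main thread can render the model's reasoning separately from its final answer. Here is a hedged sketch that reuses the worker from the earlier snippet; the element IDs are hypothetical placeholders:

```javascript
// Route streamed text into separate UI areas based on the worker's state flag.
const thinkingEl = document.getElementById("thinking"); // hypothetical element
const answerEl = document.getElementById("answer");     // hypothetical element

worker.addEventListener("message", (e) => {
  if (e.data.status !== "update") return;
  const target = e.data.state === "thinking" ? thinkingEl : answerEl;
  target.textContent += e.data.output; // append each streamed chunk
});
```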
Conclusion
Running NLP models like DeepSeek-R1 in the browser marks significant progress in enhancing user experience and protecting data privacy. With just a few lines of JavaScript code and the power of the @huggingface/transformers library, you can develop responsive and powerful applications. Whether you’re building interactive tools or intelligent assistants, browser-based NLP has the potential to be a game-changer.
Explore the potential of DeepSeek-R1 in the browser and start creating smarter front-end applications today!
This guide provides a comprehensive overview of how to load and use the DeepSeek-R1 model in a browser environment, with detailed code examples. For more specific implementation details, please refer to the linked GitHub repository.