
An in-depth understanding of Worker Threads in Node.js

青灯夜游 (reposted)
2021-06-28 11:25:40



To understand Workers, you first need to understand how Node.js works underneath.

When a Node.js application starts, it runs with:

  • One process
  • One thread
  • One event loop
  • One JS engine instance
  • One Node.js instance

One process: the process object is a global that can be accessed from anywhere in a Node.js program and provides information about the current process.

One thread: being single-threaded means that only one set of instructions is executed at a time in a given process.

Event loop: this is one of the most important parts of Node.js to understand. Although JavaScript is single-threaded, the event loop lets Node.js perform asynchronous, non-blocking I/O by offloading operations to the operating system whenever possible; your code is notified of the results through callbacks, promises and async/await.

A JS engine instance: a program that can run JavaScript code.

A Node.js instance: a program that can run the Node.js environment.

In other words, Node runs on a single thread, and the event loop processes only one task at a time: one piece of code executes at a time, never in parallel. This is very useful, because the model is simple enough that you don't have to worry about concurrency issues when writing JavaScript.

The reason it was built this way is that JavaScript was originally designed for client-side interactions (such as web page interactions or form validation), and none of that logic required multithreading.

The downside is that CPU-intensive code, such as complex calculations on a large in-memory dataset, blocks everything else from being executed. Similarly, if a request to your server triggers CPU-intensive code, that code blocks the event loop and prevents other requests from being handled.

A function is considered blocking if the event loop must wait for it to finish before it can execute the next command. A non-blocking function lets the event loop continue as soon as it starts, and typically notifies the event loop through a callback once it has finished.
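A minimal sketch of the difference, using two hypothetical helpers written only for illustration: blockFor() busy-waits on the main thread, while waitFor() hands the waiting to a timer and returns at once. The timer scheduled for 0 ms cannot fire until the busy loop releases the thread.

```javascript
// Blocking: busy-waits on the main thread for `ms` milliseconds.
// While it spins, no timers or I/O callbacks can run.
function blockFor(ms) {
  const start = Date.now();
  while (Date.now() - start < ms) {
    // spin: keeps the CPU, and therefore the event loop, busy
  }
  return Date.now() - start;
}

// Non-blocking: schedules the wait and returns immediately; the event loop
// calls `callback` once the timer expires.
function waitFor(ms, callback) {
  setTimeout(() => callback(ms), ms);
}

const scheduledAt = Date.now();
setTimeout(() => {
  // Asked for 0 ms, but it can only fire after blockFor() has returned.
  console.log(`0 ms timer actually fired after ${Date.now() - scheduledAt} ms`);
}, 0);

const elapsed = blockFor(200);
console.log(`blockFor(200) returned after ${elapsed} ms`);

waitFor(100, (ms) => console.log(`waitFor(${ms}) callback ran`));
console.log('waitFor(100) returned immediately');
```

Running it prints the blockFor line first, then "returned immediately", and only then the two timer callbacks.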

Best practice: don't block the event loop. Keep it running, and avoid anything that could block the thread, such as synchronous network calls or infinite loops.

It is important to distinguish between CPU-intensive operations and I/O (input/output)-intensive operations. As mentioned before, Node.js does not execute your code in parallel. Only I/O operations run in parallel, because they are executed asynchronously.

So Worker Threads will not help much with I/O-intensive work, because asynchronous I/O operations are more efficient than Workers. The main purpose of Workers is to improve the performance of CPU-intensive operations.

Other solutions

There are also other existing solutions for CPU-intensive operations, such as the multi-process approach (the cluster API), which makes sure the multi-core CPU is fully utilized.

The advantage of this approach is that the processes are independent of each other: if something goes wrong in one process, it does not affect the others. It also has a stable API. The downside is that memory cannot be shared between processes, and inter-process communication by default passes data serialized as JSON.
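To make the JSON limitation concrete, here is a small sketch comparing a JSON round trip with V8's structured-clone-style serializer (v8.serialize and v8.deserialize), which is close to what a Worker's postMessage() uses. The message object is made up for illustration, and the JSON round trip only approximates what the default IPC channel does; child processes can opt in to a richer format with the serialization: 'advanced' option of child_process.fork().

```javascript
const v8 = require('v8');

// A message containing types that JSON cannot represent faithfully.
const message = {
  when: new Date(0),
  tags: new Set(['a', 'b']),
  counts: new Map([['x', 1]]),
};

// JSON round trip: roughly what a message goes through on the default
// 'json' IPC channel between processes.
const viaJson = JSON.parse(JSON.stringify(message));

// Structured-clone-style round trip via V8's serializer.
const viaClone = v8.deserialize(v8.serialize(message));

console.log(typeof viaJson.when, viaJson.tags, viaJson.counts); // string {} {}
console.log(viaClone.when instanceof Date, viaClone.tags, viaClone.counts);
```

The Date comes back as a string and the Set and Map collapse to empty objects after JSON, while the structured clone keeps all three types.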

JavaScript and Node.js will not get traditional multithreading, for the following reasons.

One might think that adding a new Node.js core module for creating and synchronizing threads would solve the problem of CPU-intensive operations.

However, adding threads would change the nature of the language itself; they cannot simply be added as a new set of classes or functions. Languages that support multithreading, such as Java, rely on keywords like synchronized to keep multiple threads in sync.

Also, some numeric types are not atomic. If access to them is not synchronized, two threads could change the same variable at the same time, leaving it with a few bytes written by one thread and a few bytes written by the other, which is not a valid value at all. For example, in JavaScript even the simple calculation 0.1 + 0.2 produces a result with 17 decimal digits (the maximum number of decimal digits):

var x = 0.1 + 0.2; // x will be 0.30000000000000004

Floating-point arithmetic is not always 100% accurate to begin with. If access is not synchronized, part of such a number could be overwritten by another thread, leaving a value that neither thread computed.
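For contrast, this is how Node.js Worker Threads handle the problem: memory is shared only when you explicitly pass a SharedArrayBuffer, and updates to it are synchronized with Atomics rather than a synchronized keyword. A minimal sketch in which four workers increment one shared counter; WORKERS and INCREMENTS are arbitrary illustration values. It relies on Node.js allowing Atomics.wait() on the main thread (browsers do not), which keeps the demo synchronous; a real server should listen for the workers' 'exit' events instead of blocking.

```javascript
const { Worker } = require('worker_threads');

const WORKERS = 4;
const INCREMENTS = 100000;

// Slot 0: the shared counter. Slot 1: how many workers have finished.
const shared = new Int32Array(new SharedArrayBuffer(2 * Int32Array.BYTES_PER_ELEMENT));

// Each worker increments the same counter. Atomics.add makes every
// read-modify-write indivisible, so no increment is lost.
const source = `
const { workerData } = require('worker_threads');
const shared = new Int32Array(workerData.buffer);
for (let i = 0; i < workerData.increments; i++) {
  Atomics.add(shared, 0, 1);
}
Atomics.add(shared, 1, 1); // report completion
Atomics.notify(shared, 1); // wake the waiting main thread
`;

for (let i = 0; i < WORKERS; i++) {
  // The SharedArrayBuffer inside workerData is shared, not copied.
  new Worker(source, {
    eval: true,
    workerData: { buffer: shared.buffer, increments: INCREMENTS },
  });
}

// Block the main thread until every worker has reported in.
const deadline = Date.now() + 10000;
let done;
while ((done = Atomics.load(shared, 1)) < WORKERS) {
  if (Date.now() > deadline) throw new Error('workers did not finish in time');
  Atomics.wait(shared, 1, done, 1000); // sleeps while slot 1 still equals `done`
}

const total = Atomics.load(shared, 0);
console.log(total); // 400000
```

Replacing Atomics.add(shared, 0, 1) with a plain shared[0]++ can lose updates, because the read and the write become separate steps that another thread can interleave with.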

Best Practice

So the best solution for the performance of CPU-intensive operations is Worker Threads. Browsers have had the concept of Workers for a long time.
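A minimal sketch of moving CPU-bound work off the main thread: a deliberately slow recursive Fibonacci (a hypothetical stand-in for any heavy computation) runs in a worker started with eval: true and workerData, while a setInterval timer on the main thread keeps firing. The names fibInWorker and source are illustrative only.

```javascript
const { Worker } = require('worker_threads');

// Hypothetical CPU-bound task: naive recursive Fibonacci, run in a worker.
const source = `
const { parentPort, workerData } = require('worker_threads');
function fib(n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }
parentPort.postMessage(fib(workerData));
`;

// Resolves with the worker's result, rejects if the worker fails.
function fibInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(source, { eval: true, workerData: n });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`Worker stopped with exit code ${code}`));
    });
  });
}

// While the worker computes, the main thread's event loop keeps running.
let ticks = 0;
const ticker = setInterval(() => { ticks++; }, 10);

fibInWorker(35).then((result) => {
  clearInterval(ticker);
  console.log(`fib(35) = ${result}; main thread ticked ${ticks} times meanwhile`);
});
```

Calling the same fib(35) directly on the main thread would instead freeze the interval until the calculation finished.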

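A single-threaded Node.js application has:

  • One process
  • One thread
  • One event loop
  • One JS engine instance
  • One Node.js instance

With Worker Threads, Node.js has:

  • One process
  • Multiple threads
  • One event loop per thread
  • One JS engine instance per thread
  • One Node.js instance per thread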

As the following diagram shows:

[Figure: Worker Threads in Node.js]

The worker_threads module enables the use of threads that execute JavaScript in parallel. To access it:

const worker = require('worker_threads');

Worker Threads were added in Node.js 10, but at the time of writing they were still experimental.

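Worker Threads let us have multiple Node.js instances inside the same process. A thread does not have to live until its parent process ends: it can be terminated at any time. It is bad practice for resources allocated by a Worker to stay around after the Worker is gone; that is a memory leak, and we don't want that. What we want is to embed Node.js into itself: give Node.js the ability to create a new thread and then create a new Node.js instance inside that thread, essentially running independent threads inside the same process.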
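A thread can be stopped at any time without ending the parent process. A minimal sketch, assuming a hypothetical runaway worker that loops forever: worker.terminate() stops it and resolves with its exit code (1 for a terminated worker), and the main thread carries on. startRunawayWorker and stopWorker are illustrative names.

```javascript
const { Worker } = require('worker_threads');

// A worker stuck in an endless CPU loop (for illustration only).
function startRunawayWorker() {
  return new Worker('while (true) {}', { eval: true });
}

async function stopWorker(worker) {
  // Wait until the thread has actually started executing JavaScript.
  await new Promise((resolve) => worker.once('online', resolve));
  // terminate() stops the thread and resolves with its exit code;
  // the main thread and the process keep running.
  return worker.terminate();
}

stopWorker(startRunawayWorker()).then((exitCode) => {
  console.log(`worker stopped with exit code ${exitCode}; main thread still alive`);
});
```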

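Worker Threads have the following features:

  • ArrayBuffers, to transfer memory from one thread to another.
  • SharedArrayBuffer, which is accessible from multiple threads and lets them share memory (limited to binary data).
  • Atomics, which let you perform some operations concurrently and more efficiently, and let you implement condition variables in JavaScript.
  • MessagePort, used for communicating between threads. It can transfer structured data, memory regions and other MessagePorts between Workers.
  • MessageChannel, an asynchronous, two-way communication channel between threads.
  • workerData, used to pass startup data. It is an arbitrary JavaScript value containing a clone of the data passed to the thread's Worker constructor; the data is cloned as if it were sent with postMessage().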
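A small sketch of the difference between copying and transferring over a MessageChannel. It uses receiveMessageOnPort() from worker_threads so that both ends can be read synchronously in a single thread; in practice one of the ports would usually be handed to a Worker. The data values are arbitrary.

```javascript
const { MessageChannel, receiveMessageOnPort } = require('worker_threads');

const { port1, port2 } = new MessageChannel();

// Copy: the receiver gets a structured clone made at postMessage() time,
// so later changes to `original` on the sending side are not visible to it.
const original = { values: [1, 2, 3] };
port1.postMessage(original);
original.values.push(4);
const copied = receiveMessageOnPort(port2).message;

// Transfer: listing the ArrayBuffer in the transfer list moves its memory,
// leaving the sender's view detached.
const bytes = new Uint8Array([10, 20, 30]);
port1.postMessage(bytes, [bytes.buffer]);
const moved = receiveMessageOnPort(port2).message;

console.log(copied.values);     // [ 1, 2, 3 ]
console.log(bytes.byteLength);  // 0
console.log(Array.from(moved)); // [ 10, 20, 30 ]

port1.close(); // closing one side closes the whole channel
```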

API:

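  • const { Worker, parentPort } = require('worker_threads'); => The Worker class represents an independent JavaScript execution thread, and parentPort is an instance of MessagePort.
  • new Worker(filename) or new Worker(code, { eval: true }) => there are two ways to start a worker: by passing the path of a file or by passing the code itself. Using a file path is recommended in production.
  • worker.on('message'), worker.postMessage(data) => how the parent thread listens for messages from a worker and sends data to it.
  • parentPort.on('message'), parentPort.postMessage(data) => data sent with parentPort.postMessage() inside a worker is received in the parent thread via worker.on('message'), and data sent from the parent thread with worker.postMessage() is received inside the worker via parentPort.on('message').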

Example

const { Worker } = require('worker_threads');

// Start a worker from an inline code string (eval: true).
const worker = new Worker(`
const { parentPort } = require('worker_threads');
// Reply to the first message only, then stop listening.
parentPort.once('message',
    message => parentPort.postMessage({ pong: message }));
`, { eval: true });
// Print whatever the worker sends back.
worker.on('message', message => console.log(message));
worker.postMessage('ping');

$ node --experimental-worker test.js
{ pong: 'ping' }

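What this does is create a new thread with new Worker. The code inside the worker listens for a message on parentPort; when it receives one (the listener fires only once), it posts the data back to the main thread.

At the time of writing, you had to run the program with --experimental-worker because Workers were still experimental. Since Node.js 11.7.0 the flag is no longer required.

Another example: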

const {
    Worker, isMainThread, parentPort, workerData
} = require('worker_threads');

if (isMainThread) {
    // Main thread: export a function that parses a script inside a worker
    // running this same file.
    module.exports = function parseJSAsync(script) {
        return new Promise((resolve, reject) => {
            const worker = new Worker(__filename, {
                workerData: script
            });
            worker.on('message', resolve);
            worker.on('error', reject);
            worker.on('exit', (code) => {
                if (code !== 0)
                    reject(new Error(`Worker stopped with exit code ${code}`));
            });
        });
    };
} else {
    // Worker thread: 'some-js-parsing-library' is a placeholder for any parser.
    const { parse } = require('some-js-parsing-library');
    const script = workerData;
    parentPort.postMessage(parse(script));
}

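In the code above:

  • Worker: the class that represents an independent JavaScript execution thread.
  • isMainThread: true if the code is not running inside a Worker thread.
  • parentPort: the MessagePort used to communicate with the parent thread.
  • workerData: a clone of the data passed to the Worker constructor.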

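In real-world use, you should use a pool of Workers instead; otherwise the overhead of constantly creating Workers would likely exceed their benefit.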
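A pool keeps a fixed number of threads alive and reuses them for many tasks. The following is a minimal sketch, not a production library: WorkerPool and its squaring task are hypothetical, a crashed worker is simply dropped rather than replaced, and tasks queued on a pool whose workers have all crashed would wait forever.

```javascript
const { Worker } = require('worker_threads');

// Hypothetical task each pooled worker performs: square a number.
const WORKER_SOURCE = `
const { parentPort } = require('worker_threads');
parentPort.on('message', (n) => parentPort.postMessage(n * n));
`;

class WorkerPool {
  constructor(size) {
    this.workers = [];
    this.idle = [];
    this.queue = []; // tasks waiting for a free worker: { data, resolve, reject }
    for (let i = 0; i < size; i++) {
      const worker = new Worker(WORKER_SOURCE, { eval: true });
      this.workers.push(worker);
      this.idle.push(worker);
    }
  }

  // Queue a task; the promise resolves with the worker's reply.
  run(data) {
    return new Promise((resolve, reject) => {
      this.queue.push({ data, resolve, reject });
      this.dispatch();
    });
  }

  // Hand queued tasks to idle workers, one task per worker at a time.
  dispatch() {
    while (this.idle.length > 0 && this.queue.length > 0) {
      const worker = this.idle.pop();
      const { data, resolve, reject } = this.queue.shift();
      const detach = () => {
        worker.off('message', onMessage);
        worker.off('error', onError);
      };
      const onMessage = (result) => {
        detach();
        this.idle.push(worker); // reuse the same thread for the next task
        resolve(result);
        this.dispatch();
      };
      const onError = (err) => {
        detach(); // a crashed worker is dropped, not replaced, in this sketch
        reject(err);
      };
      worker.on('message', onMessage);
      worker.on('error', onError);
      worker.postMessage(data);
    }
  }

  // Stop every thread in the pool.
  close() {
    return Promise.all(this.workers.map((worker) => worker.terminate()));
  }
}

// Usage: four tasks share two threads instead of spawning four workers.
const pool = new WorkerPool(2);
Promise.all([1, 2, 3, 4].map((n) => pool.run(n)))
  .then((squares) => console.log(squares)) // [ 1, 4, 9, 16 ]
  .finally(() => pool.close());
```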

What we can hopefully expect from Workers in the future:

  • Passing native handles around (such as sockets and HTTP requests).
  • Deadlock detection. A deadlock is a situation where a set of processes (or threads) is blocked because each one holds a resource while waiting for a resource held by another. Deadlock detection would be a useful feature for Worker Threads.
  • More isolation, so that if one thread is affected, the others are not.

What NOT to think about Workers:

  • Don't think Workers make everything magically faster; in some cases a pool of Workers is the better choice.
  • Don't use Workers to parallelize I/O operations.
  • Don't think spawning Workers is cheap.
Finally

Chrome DevTools supports Worker Threads in Node.js.

At the time of writing, worker_threads was an experimental module, so if you needed to run CPU-intensive operations in Node.js, using Worker Threads in production was not yet recommended; a process pool was the safer alternative. The module has since been marked stable.

English original address: https://nodesource.com/blog/worker-threads-nodejs

Author: Liz Parody


Statement:
This article is reproduced from juejin.cn. In case of infringement, please contact admin@php.cn to have it removed.