
An in-depth analysis of processes and threads in Node.js

青灯夜游
2021-05-11 11:03:48

This article takes an in-depth look at processes and threads in Node.js. Hopefully it serves as a useful reference.

Processes and threads are concepts every programmer should know, and they come up often in interviews. Many articles cover only the theory, so readers may not truly understand it or get to apply it in real development. Besides introducing the concepts, this article explains processes and threads from the perspective of Node.js and walks through some practical uses in projects, so that you can both answer an interviewer and apply the ideas in practice.

Interview Questions

Is Node.js single-threaded?

How does Node.js avoid blocking when doing time-consuming calculations?

How does Node.js start and stop multiple processes?

Can Node.js create threads?

How do you implement process daemons (keeping a process alive) during development?

Besides using third-party modules, have you built a multi-process architecture yourself?

Process

A process is a running instance of a program operating on a set of data. It is the basic unit of resource allocation and scheduling in the system and the foundation of the operating system's structure; a process is also the container for threads (definition from an encyclopedia). In other words, a process is the smallest unit of resource allocation. When we start a service and run an instance, we start a service process. For example, the JVM in Java is itself a process, and in Node.js running node app.js starts a service process. Multiple processes are created by copying a process (fork). Each forked process has its own independent address space and data stack, so one process cannot access variables and data structures defined in another. Processes can only exchange data once an IPC channel has been established between them.

  • Node.js example of starting a service process

const http = require('http');

const server = http.createServer();
server.listen(3000, () => {
    process.title = 'programmer-growth-guide-test-process';
    console.log('process id', process.pid);
});

After running the code above, the Activity Monitor tool that ships with macOS shows the Node.js process we just started, with PID 7663.

Thread

A thread is the smallest unit that the operating system can schedule for execution. Threads belong to processes and live inside them: a thread belongs to exactly one process, but a process can have multiple threads.

Single-threaded

Single-threaded means that a process runs only one thread.

JavaScript is single-threaded and the program executes sequentially (leaving JavaScript's asynchronous features aside for now). You can picture a queue: the next task runs only after the previous one finishes. When coding in a single-threaded language, avoid long-running synchronous operations; otherwise the thread blocks and subsequent requests cannot be handled. If you write JavaScript, make use of its asynchronous features as much as possible.
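The queue behavior described above can be observed directly. The following is a minimal sketch, not part of the original article: the helper name measureTimerDelay and the 200 ms figure are illustrative assumptions. It schedules a timer for "as soon as possible" and then keeps the thread busy, so the timer can only fire after the synchronous work ends.

```javascript
// Measure how late a 0 ms timer fires when the main thread is kept busy.
// measureTimerDelay is a hypothetical helper used only for this illustration.
function measureTimerDelay(busyMs, onFired) {
  const scheduledAt = Date.now();
  // Ask for the callback to run as soon as possible.
  setTimeout(() => onFired(Date.now() - scheduledAt), 0);

  // Synchronous busy loop: nothing else can run on this thread meanwhile.
  const start = Date.now();
  while (Date.now() - start < busyMs) {
    // burn CPU
  }
}

measureTimerDelay(200, (delay) => {
  // The delay is at least 200 ms even though the timer asked for 0 ms.
  console.log(`timer fired after ${delay} ms`);
});
```

The timer callback is queued right away, but it cannot run until the loop returns control to the event loop.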

Example: a time-consuming computation that blocks the thread

const http = require('http');
const longComputation = () => {
  let sum = 0;
  for (let i = 0; i < 1e10; i++) {
    sum += i;
  }
  return sum;
};
const server = http.createServer();
server.on('request', (req, res) => {
  if (req.url === '/compute') {
    console.info('computation started', new Date());
    const sum = longComputation();
    console.info('computation finished', new Date());
    return res.end(`Sum is ${sum}`);
  } else {
    res.end('Ok');
  }
});

server.listen(3000);
// Printed output
// computation started 2019-07-28T07:08:49.849Z
// computation finished 2019-07-28T07:09:04.522Z

The printed output shows that a call to 127.0.0.1:3000/compute takes about 15 seconds. If another route such as 127.0.0.1:3000/ is requested during that time, it also has to wait about 15 seconds. In other words, every user is stuck until the first compute request finishes, which is extremely unfriendly. Below, this problem is solved by creating multiple processes with child_process.fork and cluster.

Some notes on single-threading

  • Although Node.js uses a single-threaded model, it is event-driven and uses asynchronous, non-blocking I/O, so it can serve high-concurrency scenarios while avoiding the resource overhead of thread creation and context switching between threads.
  • When your project needs heavy computation or other CPU-intensive work, consider starting multiple processes to do it.
  • In Node.js, an uncaught error causes the entire application to exit. Application robustness therefore deserves attention, especially exception handling and process daemons.
  • A single thread cannot take advantage of a multi-core CPU, but the APIs provided by Node.js and some third-party tools address this, as discussed later in the article.

Processes and threads in Node.js

Node.js is a server-side runtime environment for JavaScript, built on Chrome's V8 engine. Its event-driven, non-blocking I/O model makes full use of the asynchronous I/O provided by the operating system to handle many tasks at once, which suits I/O-intensive applications. Because I/O is asynchronous, the program does not block waiting for a result; it is notified through callbacks, and the time that would be spent waiting in synchronous mode can be used to handle other tasks.

Side note: among web servers, the well-known Nginx also uses this event-driven model, which avoids the overhead of thread creation and thread context switching. Nginx is written in C; it is mainly used as a high-performance web server and is not suited to business logic.

In web business development, if you have high-concurrency scenarios, Node.js is a good choice.

On a single-core CPU system we develop in single-process, single-thread mode. On a multi-core CPU system, you can start multiple processes through child_process.fork (Node.js added the cluster module after v0.8 to implement a multi-process architecture), that is, multi-process single-thread mode. Note: enabling multiple processes is not meant to solve high concurrency. Its main purpose is to address the insufficient CPU utilization of Node.js in single-process mode and make full use of multi-core CPUs.

The process in Node.js

process module

In Node.js, process is a global object that can be used directly without require. It provides information about the current process. The official documentation describes it in detail; if you are interested, try things out yourself against the process documentation.

  • process.env: environment variables; for example, process.env.NODE_ENV is commonly used to select configuration for different environments
  • process.nextTick: often mentioned when discussing the event loop
  • process.pid: the ID of the current process
  • process.ppid: the ID of the parent of the current process
  • process.cwd(): the working directory of the current process
  • process.platform: the operating system platform the current process runs on
  • process.uptime(): how long the current process has been running, for example the uptime value shown by the pm2 daemon
  • Process events: process.on('uncaughtException', cb) catches uncaught exceptions; process.on('exit', cb) listens for process exit
  • Three standard streams: process.stdout (standard output), process.stdin (standard input), process.stderr (standard error)
  • process.title: sets the process name, which is sometimes useful for identifying a process

The list above covers only some commonly used features. Besides process, Node.js also provides the child_process module for working with child processes. Process creation in Node.js is described below.
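As a quick illustration of the process properties listed above, here is a small sketch. The function name describeProcess and the 'development' fallback for NODE_ENV are assumptions made for this example, not something the original article defines.

```javascript
// Collect a few commonly used fields from the global `process` object.
// describeProcess is a hypothetical helper for illustration only.
function describeProcess() {
  return {
    pid: process.pid,
    ppid: process.ppid,
    cwd: process.cwd(),
    platform: process.platform,
    uptimeSeconds: process.uptime(),
    // Falling back to 'development' is an assumption of this sketch.
    nodeEnv: process.env.NODE_ENV || 'development',
  };
}

console.log(describeProcess());
```

No require call is needed because process is available globally.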

Node.js process creation

There are many ways to create a process. This article covers the child_process module and the cluster module.

child_process module

child_process is a built-in Node.js module. Documentation: http://nodejs.cn/api/child_process.html#child_process_child_process

Four commonly used functions:

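  • child_process.spawn(): suitable when the child returns large amounts of data, such as image processing or binary data processing.
  • child_process.exec(): suitable for small amounts of data. Output is buffered up to maxBuffer, which defaulted to 200 * 1024 bytes in older releases (1024 * 1024 since Node.js 12); exceeding it terminates the child and reports an error, so use spawn for large outputs.
  • child_process.execFile(): similar to child_process.exec(), except that it does not run the command through a shell by default, so shell behaviors such as I/O redirection and file globbing are not supported.
  • child_process.fork(): spawns a new Node.js process. The processes are independent of each other, and each has its own V8 instance and memory. System resources are limited, so avoid spawning too many child processes; the number is usually set according to the number of CPU cores.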
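To make the difference between buffered and streamed output concrete, here is a minimal sketch. The helper names runBuffered and runStreaming are hypothetical, and the child simply runs the current Node.js binary (process.execPath) with an inline script so the example does not depend on any external command.

```javascript
const { execFile, spawn } = require('child_process');

// execFile buffers the whole output and hands it to the callback at the end.
function runBuffered(code, done) {
  execFile(process.execPath, ['-e', code], (err, stdout) => {
    done(err, stdout);
  });
}

// spawn streams the output chunk by chunk, which suits large results.
function runStreaming(code, done) {
  const child = spawn(process.execPath, ['-e', code]);
  const chunks = [];
  let failed = false;
  child.stdout.on('data', (chunk) => chunks.push(chunk));
  child.on('error', (err) => { failed = true; done(err); });
  child.on('close', (exitCode) => {
    if (!failed) done(null, Buffer.concat(chunks).toString(), exitCode);
  });
}

runBuffered('console.log(1 + 1)', (err, out) => {
  console.log('execFile:', String(out).trim());
});
runStreaming('console.log("streamed")', (err, out) => {
  console.log('spawn:', String(out).trim());
});
```

With execFile the result arrives in one piece and is subject to maxBuffer; with spawn the caller decides what to do with each chunk as it arrives.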

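A note on the number of CPU cores: fork can indeed start many processes, but spawning too many is not recommended. The core count can be obtained with const cpus = require('os').cpus(); where cpus is an array of objects describing each logical core across all installed CPUs. For example, if the host has two CPUs with 4 cores each, the array has 8 entries.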

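fork child process demo

Using fork to start a child process solves the thread-blocking computation from the beginning of the article. When a compute request arrives, the main process creates a child process. When the child finishes the computation it sends the result to the main process with send, and the main process, on receiving it through the message event, responds to the request and terminates the child.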

fork_app.js

const http = require('http');
const fork = require('child_process').fork;

const server = http.createServer((req, res) => {
    if (req.url === '/compute') {
        const compute = fork('./fork_compute.js');
        compute.send('start a new child process');

        // The 'message' event fires when the child calls process.send()
        compute.on('message', sum => {
            res.end(`Sum is ${sum}`);
            compute.kill();
        });

        // The 'close' event fires after the child has exited and its stdio streams have closed
        compute.on('close', (code, signal) => {
            console.log(`close event received: child terminated by signal ${signal}, exit code ${code}`);
        });
    } else {
        res.end('ok');
    }
});
server.listen(3000, '127.0.0.1', () => {
    console.log('server started at http://127.0.0.1:3000');
});

fork_compute.js

For the computation example from the beginning of the article, we create a child process that performs the computation separately.

const computation = () => {
    let sum = 0;
    console.info('computation started');
    console.time('computation time');

    for (let i = 0; i < 1e10; i++) {
        sum += i;
    }

    console.info('computation finished');
    console.timeEnd('computation time');
    return sum;
};

process.on('message', msg => {
    console.log(msg, 'process.pid', process.pid); // child process ID
    const sum = computation();

    // If this Node.js process was spawned with an IPC channel, process.send() sends a message to the parent process
    process.send(sum);
});

cluster module

cluster child process demo

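const http = require('http');
const numCPUs = require('os').cpus().length;
const cluster = require('cluster');
if (cluster.isMaster) {
    console.log('Master process id is', process.pid);
    // fork workers
    for (let i = 0; i < numCPUs; i++) {
        cluster.fork();
    }
} else {
    // Workers share the listening port; 8000 is an illustrative choice
    http.createServer((req, res) => res.end('hello world')).listen(8000);
}

How cluster works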
[Figure: https://img.php.cn/upload/image/919/390/925/162070174810870An%20in-depth%20analysis%20of%20processes%20and%20threads%20in%20Node.js]

The cluster module creates child processes by calling fork, the same fork as in child_process. It follows the classic master-worker model: cluster creates one master and then forks as many child processes as you specify. The cluster.isMaster property tells whether the current process is the master or a worker. The master manages all the child processes; it does not handle the actual tasks and is mainly responsible for scheduling and management.
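The cluster module uses built-in load balancing to spread the load across worker processes, using the round-robin algorithm. Under the round-robin scheduling policy, the master accept()s all incoming connections and then hands each TCP connection to the selected worker process (still communicating over IPC).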
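The round-robin idea itself is easy to sketch. The following toy dispatcher is an illustration only: createRoundRobin and the worker names are hypothetical, and this is not how the cluster module is implemented internally. It just shows the rotation by which connections would be assigned.

```javascript
// Toy round-robin dispatcher: hands out workers in a fixed rotation.
// This models the scheduling idea only; it is not cluster's internal code.
function createRoundRobin(workers) {
  let next = 0;
  return function pick() {
    const worker = workers[next];
    next = (next + 1) % workers.length; // wrap around to the first worker
    return worker;
  };
}

const pick = createRoundRobin(['worker-1', 'worker-2', 'worker-3']);
const assigned = [];
for (let conn = 0; conn < 5; conn++) {
  assigned.push(pick());
}
console.log(assigned.join(', '));
// prints "worker-1, worker-2, worker-3, worker-1, worker-2"
```

Each incoming connection goes to the next worker in turn, so over time the workers receive roughly equal numbers of connections regardless of how busy each one is.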
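A question about ports with multiple processes: if several Node.js processes listen on the same port, the error Error: listen EADDRINUSE occurs, so why can the cluster module let multiple child processes listen on the same port? The reason is that the master process starts a TCP server internally, and only that server actually listens on the port. When a request from a client triggers the server's connection event, the master sends the corresponding socket handle to a child process.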
Summary of the child_process and cluster modules

Both the child_process module and the cluster module exist to solve the problem that a Node.js instance runs on a single thread and cannot use multiple CPU cores. The core idea is that the parent process (the master) listens on the port and, on receiving a new request, distributes it to the worker processes below it.

One drawback of the cluster module:

[Figure: https://img.php.cn/upload/image/984/507/923/162070176214860An%20in-depth%20analysis%20of%20processes%20and%20threads%20in%20Node.js]

[Figure: https://img.php.cn/upload/image/788/562/123/162070176626228An%20in-depth%20analysis%20of%20processes%20and%20threads%20in%20Node.js]

Building the TCP server implicitly inside cluster is indeed simpler and more transparent for the user, but it is less flexible than using child_process directly, because one master process can manage only one group of identical worker processes. When you create worker processes yourself with child_process, one master can control multiple groups of processes, since you can create multiple TCP servers when working with the child processes. Comparing the two figures above should make this clear.
Node.js process communication

Both child_process and cluster, as described above, require communication between the main process and the worker processes. After a child process is created with fork() or another API, an IPC channel has to exist between parent and child before they can exchange information through the message event and send().

IPC is a familiar term: whatever the programming language, any discussion of process communication mentions it. IPC stands for Inter-Process Communication. Its purpose is to let different processes access each other's resources and coordinate their work. Many techniques implement it, such as named pipes, anonymous pipes, sockets, semaphores, shared memory, and message queues. In Node.js the IPC channel is implemented by libuv: with named pipes on Windows and with Unix domain sockets on *nix systems. At the application layer, inter-process communication appears only as the simple message event and the send() method, a concise, message-oriented interface.

Diagram of IPC creation and implementation

[Figure: https://img.php.cn/upload/image/350/147/223/162070177651462An%20in-depth%20analysis%20of%20processes%20and%20threads%20in%20Node.js]

How the IPC channel is created

[Figure: https://img.php.cn/upload/image/156/288/885/162070178420660An%20in-depth%20analysis%20of%20processes%20and%20threads%20in%20Node.js]

Before actually creating the child process, the parent process creates the IPC channel and listens on it, and only then creates the child. During this step it also tells the child the file descriptor of the IPC channel through an environment variable (NODE_CHANNEL_FD). While starting up, the child connects to the existing IPC channel using that file descriptor, which completes the connection between parent and child.
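At the application layer, all of this reduces to send() and the message event. The sketch below is a self-contained illustration under some assumptions: the helper name askChild, the temp-directory prefix, and the throwaway child script are all hypothetical and exist only so the example fits in one file.

```javascript
const { fork } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Source of a throwaway child script (hypothetical, written to a temp file).
const childSource = `
process.on('message', (n) => {
  // process.send exists because fork() opened an IPC channel for this child.
  process.send({ pid: process.pid, doubled: n * 2 });
});
`;

// Fork the child, send it one number, and resolve with its reply.
function askChild(n) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'ipc-demo-'));
  const file = path.join(dir, 'child.js');
  fs.writeFileSync(file, childSource);

  return new Promise((resolve, reject) => {
    const child = fork(file);
    child.once('error', reject);
    child.once('message', (reply) => {
      child.disconnect(); // closing the channel lets the child exit on its own
      resolve(reply);
    });
    child.once('exit', () => fs.rmSync(dir, { recursive: true, force: true }));
    child.send(n);
  });
}

askChild(21).then((reply) => {
  console.log(`child ${reply.pid} answered ${reply.doubled}`);
});
```

The parent never touches pipes or sockets directly; fork() sets up the channel and both sides see only messages.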
Node.js handle passing

Before discussing handles, consider a question: when a handle is sent with send(), is the server object itself really sent to the child process?

Handle types that the child process object's send() method can send

  • net.Socket: a TCP socket
  • net.Server: a TCP server; any application-layer service built on TCP can benefit from it
  • net.Native: a TCP socket or IPC pipe at the C++ level
  • dgram.Socket: a UDP socket
  • dgram.Native: a UDP socket at the C++ level

How sending a handle with send() works

The diagram of sending and restoring a handle makes this easier to understand.

[Figure: https://img.php.cn/upload/image/603/327/625/162070181980449An%20in-depth%20analysis%20of%20processes%20and%20threads%20in%20Node.js]

Before writing to the IPC channel, send() actually assembles the message into two objects: one is the handle and the other is the message. The message looks like this:

{
    cmd: 'NODE_HANDLE',
    type: 'net.Server',
    msg: message
}

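What is actually sent into the IPC channel is the file descriptor of the handle. When the message object is written to the IPC channel it is serialized with JSON.stringify(), so everything that finally travels over the channel is a string. That send() can send messages and handles does not mean it can send arbitrary objects.

A child process connected to the IPC channel reads the messages sent by the parent, restores each string to an object with JSON.parse(), and only then fires the message event to pass it to the application layer. Along the way the message object is filtered: if message.cmd starts with the NODE_ prefix, an internal internalMessage event is emitted instead; and if message.cmd is NODE_HANDLE, message.type and the received file descriptor are used together to restore a corresponding object.

Taking a TCP server handle as an example, the restore step after the child receives the message looks like this: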

function (message, handle, emit) {
    // Rebuild a server object in the child and attach it to the received handle
    var server = new net.Server();
    server.listen(handle, function () {
        emit(server);
    });
}

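In this restore code, the child process creates the matching TCP server object according to message.type and has it listen on the file descriptor. Because these low-level details are hidden from the application layer, developers in the child process get the impression that the server object was passed directly from the parent process.

Node.js processes only pass messages to each other; objects are never truly transferred, and the impression that they are is a result of the abstraction. Node.js currently supports only the handle types listed above. Arbitrary handle types cannot be passed between processes unless a complete send-and-restore procedure exists for them.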
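The consequence of JSON serialization can be seen without any child process at all. The sketch below is an illustration with a hypothetical helper, roundTrip, that mimics what a message goes through under the JSON-based serialization described above: anything that is not JSON-compatible is lost or changed along the way.

```javascript
// Messages are serialized as JSON on the way through the channel,
// so only JSON-compatible data survives the trip.
function roundTrip(message) {
  return JSON.parse(JSON.stringify(message));
}

const original = {
  act: 'report',
  createdAt: new Date(0),        // becomes an ISO date string
  greet() { return 'hi'; },      // functions are dropped entirely
};
const received = roundTrip(original);

console.log(typeof received.createdAt); // prints "string"
console.log('greet' in received);       // prints "false"
```

This is why handles need the dedicated send-and-restore procedure described above: a server object cannot survive serialization, so the receiving side has to rebuild it from the type name and the file descriptor.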

Node.js multi-process architecture model

Let's implement a multi-process architecture with a daemon ourselves.

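Writing the main process

master.js handles the following logic:

  • Create a server and listen on port 3000.
  • Start multiple child processes according to the number of CPUs.
  • Send messages to each child process through the child process object's send method.
  • Watch the child processes from the main process and start a new worker when a suicide message is received.
  • When the main process receives an exit signal, stop the child processes first and then exit the main process.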

// master.js
const fork = require('child_process').fork;
const cpus = require('os').cpus();

const server = require('net').createServer();
server.listen(3000);
process.title = 'node-master';

const workers = {};
const createWorker = () => {
    const worker = fork('worker.js');
    worker.on('message', function (message) {
        if (message.act === 'suicide') {
            createWorker();
        }
    });
    worker.on('exit', function (code, signal) {
        console.log('worker process exited, code: %s signal: %s', code, signal);
        delete workers[worker.pid];
    });
    worker.send('server', server);
    workers[worker.pid] = worker;
    console.log('worker process created, pid: %s ppid: %s', worker.pid, process.pid);
};

for (let i = 0; i < cpus.length; i++) {
    createWorker();
}

process.once('SIGINT', close.bind(this, 'SIGINT')); // kill(2) Ctrl-C
process.once('SIGQUIT', close.bind(this, 'SIGQUIT')); // kill(3) Ctrl-\
process.once('SIGTERM', close.bind(this, 'SIGTERM')); // kill(15) default
process.once('exit', close.bind(this));

function close (code) {
    console.log('process exiting!', code);

    if (code !== 0) {
        for (let pid in workers) {
            console.log('master process exited, kill worker pid: ', pid);
            workers[pid].kill('SIGINT');
        }
    }

    process.exit(0);
}

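Worker process

worker.js handles the following logic:

  • Create a server object; note that it does not listen on port 3000 at first.
  • Receive the message sent by the main process's send method through the message event.
  • Listen for the uncaughtException event to catch unhandled exceptions, send a suicide message so the main process creates a replacement, and exit once the connections have closed.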

// worker.js
const http = require('http');
const server = http.createServer((req, res) => {
	res.writeHead(200, {
		'Content-Type': 'text/plain'
	});
	res.end('I am worker, pid: ' + process.pid + ', ppid: ' + process.ppid);
	throw new Error('worker process exception!'); // test exit and restart after an exception
});

let worker;
process.title = 'node-worker';
process.on('message', function (message, sendHandle) {
	if (message === 'server') {
		worker = sendHandle;
		worker.on('connection', function (socket) {
			server.emit('connection', socket);
		});
	}
});

process.on('uncaughtException', function (err) {
	console.log(err);
	process.send({act: 'suicide'});
	worker.close(function () {
		process.exit(1);
	});
});

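Node.js process daemons

What is a process daemon?

Every time you start a Node.js program you have to type node app.js in a terminal, and if you close that terminal the Node.js service stops immediately. In addition, if the Node.js service crashes unexpectedly, it is not restarted automatically. Neither behavior is what we want, so we need some way to guard the running process: after node app.js starts a service process, you can keep using the terminal for other things without interference, and the service restarts automatically when something goes wrong.

How to implement a process daemon

Here I only mention third-party process managers, pm2 and forever. Both can keep a process alive, and both are built on the child_process and cluster modules described above, so their internals are not covered here.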
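To give a feel for the restart-on-crash part of process guarding, here is a deliberately small sketch built on child_process.spawn. It is not how pm2 or forever are implemented; the function name supervise, its parameters, and the restart limit are assumptions made for this illustration.

```javascript
const { spawn } = require('child_process');

// Restart `command` whenever it exits with a failure, up to `maxRestarts` times.
// supervise is a hypothetical helper, not an API of pm2 or forever.
function supervise(command, args, maxRestarts, onDone) {
  let restarts = 0;
  let finished = false;
  const finish = (err) => {
    if (finished) return;
    finished = true;
    onDone(err, restarts);
  };

  const start = () => {
    const child = spawn(command, args, { stdio: 'inherit' });
    child.on('error', finish); // e.g. the command could not be started
    child.on('exit', (code, signal) => {
      if (finished) return;
      if (code === 0) return finish(null); // clean exit: nothing to guard
      if (restarts >= maxRestarts) {
        return finish(new Error(`gave up after ${restarts} restarts (code ${code}, signal ${signal})`));
      }
      restarts += 1;
      console.log(`child exited (code ${code}, signal ${signal}), restart #${restarts}`);
      start();
    });
  };
  start();
}

// A child that always crashes, to exercise the restart path.
supervise(process.execPath, ['-e', 'process.exit(1)'], 2, (err, restarts) => {
  console.log(`stopped after ${restarts} restarts:`, err ? err.message : 'clean exit');
});
```

A real process manager adds much more on top of this loop, such as detaching from the terminal, log handling, and backoff between restarts.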

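Use pm2 to start a node service named test for the production environment:

pm2 start app.js --env production --name test

Common pm2 commands

  • pm2 stop Name/processID: stop a service by service name or process ID

  • pm2 delete Name/processID: delete a service by service name or process ID

  • pm2 logs [Name]: view logs; with a service name, show only that service's logs, otherwise show all logs

  • pm2 start app.js -i 4: cluster mode. The -i option tells pm2 to run your app in cluster_mode (the alternative is fork_mode); the number is how many worker processes to start. If the number is 0, pm2 starts as many workers as there are CPU cores. Production environments generally use cluster_mode, while test or local environments generally use fork mode, which makes errors easier to track down.

  • pm2 reload Name, pm2 restart Name: when the application code has been updated, you can load the new code with either reload or restart. reload loads the new code with zero downtime, while restart stops and starts the process again; in production, reload is the usual way to deploy code updates.

  • pm2 show Name: view service details

  • pm2 list: list all applications managed by pm2

  • pm2 monit: open the real-time monitor to view resource usage

pm2 documentation:

pm2.keymetrics.io/docs/usage/…

forever needs no special explanation; its repository:

github.com/foreverjs/f…

Note: of the two, pm2 is the recommended one; this comparison shows why: www.jianshu.com/p/fdc12d82b…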

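Stopping a process on Linux

  • Find the PID of the process

    ps aux | grep server

    Example output:

    root     20158  0.0  5.0 1251592 95396 ?       Sl   May17   1:19 node /srv/mini-program-api/launch_pm2.js

    The line above is what the command prints on Linux; the second column is the PID of the process.

  • Kill the process

  1. End the process gracefully

    kill -1 PID

    The -1 option (SIGHUP) tells kill to end the process as if the user who started it had logged out. With this option, kill also tries to end the child processes left behind. It does not always succeed, though; you may still need to kill the child processes manually first and then the parent.

  2. Force-kill the process

    kill -9 PID

    -9 (SIGKILL) forces the process to stop immediately. This powerful and dangerous command terminates the process abruptly while it is running, so the process cannot clean up after itself. The risk is that system resources are not released properly, so it is generally not recommended unless nothing else works. After using it, check with ps -ef that no zombie processes are left. A zombie process can only be cleared through its parent, by terminating the parent if necessary. If a zombie's parent is init the situation is more serious, because killing init means shutting down the system; if such zombies hold many process-table entries, the machine eventually has to be rebooted to clear the process table.

  3. The killall command

    Kills all processes that match a given name, so you specify the process name instead of a PID.

    killall httpd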
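On the Node.js side, a process can cooperate with the graceful signals described above by handling them. The following is a minimal sketch: installGracefulShutdown and its injectable exit parameter are assumptions of this example, chosen so the function can be exercised without actually ending the process. SIGKILL (kill -9) cannot be caught, which is why it leaves no chance to clean up.

```javascript
// Let `kill PID` (SIGTERM) or Ctrl-C (SIGINT) close the server before exiting.
// SIGKILL (kill -9) cannot be caught, so no handler is possible for it.
// installGracefulShutdown is a hypothetical helper for illustration.
function installGracefulShutdown(server, exit = process.exit) {
  const shutdown = (signal) => {
    console.log(`received ${signal}, closing server`);
    // Stop accepting new connections, then exit once existing ones have ended.
    server.close(() => exit(0));
  };
  process.once('SIGTERM', () => shutdown('SIGTERM'));
  process.once('SIGINT', () => shutdown('SIGINT'));
}

// Usage, assuming `server` is an http.Server that is already listening:
// installGracefulShutdown(server);
```

With such a handler in place, the gentler signals give the service time to finish in-flight requests, so kill -9 can stay a last resort.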

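Node.js threads

The misconception about Node.js being single-threaded

const http = require('http');

const server = http.createServer();
server.listen(3000, () => {
    process.title = 'programmer-growth-guide-test-process';
    console.log('process id', process.pid);
});

Look again at the first piece of code in this article: it creates an HTTP service and starts one process. Node.js is said to be single-threaded, so the thread count after Node starts should be 1, yet why are 7 threads started? Isn't JavaScript single-threaded? You may have had the same question.

Here is the explanation:

The core of Node is the V8 engine. After Node starts, it creates a V8 instance, and that instance is multi-threaded.

  • Main thread: compiles and executes code.
  • Compilation/optimization threads: optimize code while the main thread is executing.
  • Profiler thread: records where the code spends its time, giving Crankshaft the data it needs to optimize execution.
  • Several garbage collection threads.

So when people say that Node is single-threaded, they mean that JavaScript execution is single-threaded (the code written by the developer runs in a single-threaded environment). The host environment of JavaScript, whether Node or the browser, is multi-threaded. In Node's case libuv also has a thread pool, which it uses to emulate asynchronous calls across different operating systems; this is invisible to the developer.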

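Some asynchronous I/O uses additional threads

Continuing the example above, let's read a file while a timer is running:

const fs = require('fs')
setInterval(() => {
    console.log(new Date().getTime())
}, 3000)

fs.readFile('./index.html', () => {})

The thread count becomes 11. This is because some I/O operations (DNS, FS) and some CPU-intensive computations (Zlib, Crypto) in Node use Node's thread pool, whose default size is 4, so the thread count becomes 11. We can change the default size of the thread pool manually:

process.env.UV_THREADPOOL_SIZE = 64

With one line of code the thread count becomes 71.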
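The thread pool can be exercised from JavaScript with any of the APIs mentioned above. The sketch below uses crypto.pbkdf2; the helper name deriveKeys and the password, salts, and iteration count are arbitrary values chosen for illustration. Note that the pool size is usually set through the UV_THREADPOOL_SIZE environment variable before Node.js starts, since it only takes effect if the pool has not been used yet.

```javascript
const crypto = require('crypto');

// Run `count` key derivations at once. Each crypto.pbkdf2 call is handed to
// libuv's thread pool, so the JavaScript thread stays free while they run.
// deriveKeys is a hypothetical helper for illustration.
function deriveKeys(count, done) {
  const keys = [];
  let failed = false;
  for (let i = 0; i < count; i++) {
    crypto.pbkdf2('secret', `salt-${i}`, 1000, 32, 'sha256', (err, key) => {
      if (failed) return;
      if (err) { failed = true; return done(err); }
      keys.push(key.toString('hex'));
      if (keys.length === count) done(null, keys);
    });
  }
}

deriveKeys(4, (err, keys) => {
  if (err) throw err;
  console.log(`derived ${keys.length} keys on the thread pool`);
});
```

With the default pool size of 4, up to four of these derivations can run in parallel; additional ones wait in a queue until a pool thread is free.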

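Libuv

Libuv is a cross-platform asynchronous I/O library. It combines the features of libev on UNIX and IOCP on Windows, was first developed by the author of Node, and was created specifically to give Node asynchronous I/O support across platforms. Libuv itself is written in C, and the non-blocking I/O and event loop underlying Node are both implemented by libuv.

[Figure: libuv architecture diagram]

On Windows, libuv uses Windows IOCP directly to implement asynchronous I/O. On non-Windows platforms, libuv uses multiple threads to emulate asynchronous I/O (for example, for file operations).

Note the following: Node's asynchronous calls are backed by libuv. In the file-reading example above, the actual system call that reads the file is made by libuv; Node is only responsible for calling libuv's interface and running the corresponding callback once the data comes back.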

Creating threads in Node.js

It was not until Node 10.5.0 that an official, experimental module, worker_threads, gave Node true multi-threading capability.

First, a simple demo:

const {
  isMainThread,
  parentPort,
  workerData,
  threadId,
  MessageChannel,
  MessagePort,
  Worker
} = require('worker_threads');

function mainThread() {
  for (let i = 0; i < 5; i++) {
    // Each worker runs this same file, with its index passed as workerData
    const worker = new Worker(__filename, { workerData: i });
    worker.on('exit', code => { console.log(`main: worker stopped with exit code ${code}`); });
    worker.on('message', msg => {
      console.log(`main: receive ${msg}`);
      worker.postMessage(msg + 1);
    });
  }
}

function workerThread() {
  console.log(`worker: workerData ${workerData}`);
  parentPort.on('message', msg => {
    console.log(`worker: receive ${msg}`);
  });
  parentPort.postMessage(workerData);
}

if (isMainThread) {
  mainThread();
} else {
  workerThread();
}

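The code above starts five worker threads from the main thread, and the main thread and the workers exchange simple messages.

At the time of writing worker_threads was still experimental, so the --experimental-worker flag was needed at startup (the module has been stable since Node.js 12 and no longer needs the flag). After running the code, Activity Monitor shows that 5 worker threads were started.

The worker_threads module

The core code of worker_threads is at https://github.com/nodejs/node/blob/master/lib/worker_threads.js. The module exposes 4 values and 3 classes, listed below; see the source above for details.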

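  • isMainThread: whether the current thread is the main thread; in the source this is determined by threadId === 0.
  • MessagePort: used for communication between threads; inherits from EventEmitter.
  • MessageChannel: creates an asynchronous, two-way communication channel.
  • threadId: the thread ID.
  • Worker: used to create worker threads from the main thread. The first argument is filename, the entry point the worker thread executes.
  • parentPort: inside a worker thread, a MessagePort object representing the parent thread; null in the main thread.
  • workerData: used to pass data (a copy of it) from the main thread to the worker thread.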
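Putting Worker, workerData, and parentPort together, the blocking computation from the start of the article can be moved off the main thread. This is a minimal sketch rather than the article's own code: the helper name sumInWorker is hypothetical, and the worker source is passed as a string with the eval option so the example stays in a single file.

```javascript
const { Worker } = require('worker_threads');

// Source of the worker, run via the `eval: true` option to keep this in one file.
const workerSource = `
const { parentPort, workerData } = require('worker_threads');
let sum = 0;
for (let i = 0; i < workerData.limit; i++) sum += i;
parentPort.postMessage(sum);
`;

// Compute 0 + 1 + ... + (limit - 1) in a worker thread.
// sumInWorker is a hypothetical helper for illustration.
function sumInWorker(limit) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { limit } });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker stopped with exit code ${code}`));
    });
  });
}

sumInWorker(1e6).then((sum) => console.log(`Sum is ${sum}`));
```

While the worker runs the loop, the main thread's event loop remains free to handle other requests, which is the same goal the fork demo achieved with a separate process.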

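Summary

Multi-process vs multi-threading

A comparison of multi-threading and multi-processing:

Attribute | Multi-process | Multi-threading | Comparison
Data | Sharing data is complex and requires IPC; data is separate, so synchronization is simple | Process data is shared, so sharing is simple but synchronization is complex | Each has its merits
CPU, memory | High memory usage, complex switching, low CPU utilization | Low memory usage, simple switching, high CPU utilization | Multi-threading is better
Creation, destruction, switching | Creating, destroying, and switching are complex and slow | Creating, destroying, and switching are simple and fast | Multi-threading is better
Coding | Simple to code, easy to debug | Complex to code and debug | Multi-process is better
Reliability | Processes run independently and do not affect each other | Threads share one fate: if one crashes, all go down | Multi-process is better
Distribution | Works across multiple machines and cores; easy to scale | Limited to multiple cores on one machine | Multi-process is better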

Statement:
This article is reproduced from juejin.cn. If there is any infringement, please contact admin@php.cn to have it removed.