Briefly understand the Buffer module in Node.js
This article introduces Buffer in Node.js: the structure of a Buffer, how its memory is allocated, how Buffers are spliced together, and how Buffer relates to performance.
JavaScript is very convenient for string operations. Buffer is an Array-like object that is mainly used to operate on bytes.
Buffer is a typical module that combines JavaScript and C++. The performance-critical parts are implemented in C++, and the remaining parts are implemented in JavaScript.
The memory occupied by Buffer is not allocated by V8; it is off-heap memory. Because V8 garbage collection affects performance, it makes sense to manage frequently used objects with a more efficient, dedicated memory allocation and reclamation strategy.
Buffer is already loaded when the Node process starts and is placed on the global object (global). Therefore, there is no need to require it before using it.
The elements of a Buffer object are two-digit hexadecimal numbers, that is, values from 0 to 255.
let buf01 = Buffer.alloc(8);
console.log(buf01); // <Buffer 00 00 00 00 00 00 00 00>
You can use fill to fill the buffer with a value (the default encoding is utf-8). If the value is longer than the buffer, the part that does not fit is not written. If the buffer is longer than the value, the value is repeated until the buffer is full. To clear previously filled content, call fill() directly.
buf01.fill('12345678910');
console.log(buf01); // <Buffer 31 32 33 34 35 36 37 38>
console.log(buf01.toString()); // 12345678
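The fill behaviours described above (repetition, truncation, clearing) can be checked with a short sketch. The variable names are illustrative only, and clearing is done here with an explicit 0.

```javascript
// Pattern shorter than the buffer: it is repeated until the buffer is full.
const repeated = Buffer.alloc(8).fill('abc');
console.log(repeated.toString()); // abcabcab

// Pattern longer than the buffer: the bytes that do not fit are dropped.
const truncated = Buffer.alloc(4).fill('12345678910');
console.log(truncated.toString()); // 1234

// Filling with 0 clears whatever was written before.
const cleared = Buffer.alloc(8).fill('abc').fill(0);
console.log(cleared); // <Buffer 00 00 00 00 00 00 00 00>
```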
If the filled-in content is Chinese, then under utf-8 each Chinese character occupies 3 elements, while letters and half-width punctuation marks occupy 1 element each.
let buf02 = Buffer.alloc(18, '开始我们的新路程', 'utf-8');
console.log(buf02.toString()); // 开始我们的新
Buffer is strongly influenced by the Array type. You can read the length property to get its length, access an element by index, and find the position of an element with indexOf.
console.log(buf02); // <Buffer e5 bc 80 e5 a7 8b e6 88 91 e4 bb ac e7 9a 84 e6 96 b0>
console.log(buf02.length); // 18 (bytes)
console.log(buf02[6]); // 230: 0xe6 is 230 in decimal
console.log(buf02.indexOf('我')); // 6: the character starts at the 7th byte
console.log(buf02.slice(6, 9).toString()); // 我: <Buffer e6 88 91> decodes to '我'
If the value assigned to an element is not an integer between 0 and 255, it is adjusted. If the value is less than 0, 256 is added repeatedly until the result is an integer between 0 and 255. If it is greater than 255, 256 is subtracted repeatedly until the result is in that range. If it has a fractional part, the fractional part is discarded (no rounding).
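A minimal sketch of these rules, using arbitrary sample values chosen for illustration:

```javascript
const bytes = Buffer.alloc(4);

bytes[0] = -100; // below 0: -100 + 256 = 156 (0x9c)
bytes[1] = 300;  // above 255: 300 - 256 = 44 (0x2c)
bytes[2] = 3.99; // fractional part is dropped, not rounded: 3
bytes[3] = 255;  // already in range, stored unchanged

console.log(bytes); // <Buffer 9c 2c 03 ff>
```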
The memory of a Buffer object is not allocated in the V8 heap; the memory is requested at the C++ level of Node. When processing a large amount of byte data, it is not practical to ask the operating system for a little memory every time a little memory is needed, because that would cause a large number of memory-request system calls. For this reason, Node requests memory at the C++ level and allocates it in JavaScript.
Node adopts the slab allocation mechanism. Slab is a dynamic memory management mechanism that is widely used in some *nix operating systems, such as Linux.
A slab is a fixed-size memory area that has already been requested. A slab has the following three states: full (completely allocated), partial (partially allocated), and empty (not allocated).
Node uses 8KB as the limit to distinguish whether a Buffer is a large object or a small object.
console.log(Buffer.poolSize); // 8192
This 8KB value is the size of each slab. At the JavaScript level, the slab is the unit of memory allocation.
If the specified Buffer size is less than 8KB, Node allocates it as a small object. Starting from a fresh slab, when a small buffer object of 1024 bytes is constructed, 1024 bytes of the current slab are occupied, and the position in the slab where this buffer starts is recorded.
When another buffer object of 3072 bytes is then created, the construction process checks whether the remaining space of the current slab is enough. If it is, the remaining space is used and the allocation status of the slab is updated. After the 3072 bytes are used, 4096 bytes of this slab remain.
If a buffer of 6144 bytes is created at this point, the current slab does not have enough space and a new slab is constructed. (This causes the remaining space of the original slab to be wasted.) For example:
Buffer.alloc(1);
Buffer.alloc(8192);
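The first slab will contain only the 1-byte buffer object, and the second buffer object will cause a new slab to be constructed to hold it.
Because one slab may be shared by several Buffer objects, the space of the slab is reclaimed only when all of these small buffer objects have gone out of scope and can be collected. Even if only a 1-byte buffer object was created, as long as it is not released, the full 8KB of memory is not released either.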
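Whether a Buffer is carved out of a shared slab can be observed through its underlying ArrayBuffer. The sketch below assumes a current Node.js release, where, according to the Buffer documentation, the pool serves Buffer.allocUnsafe() and Buffer.from() for sizes below half of Buffer.poolSize, while Buffer.alloc() never uses the pool. It also assumes Buffer.poolSize is left at its default. The variable names are illustrative.

```javascript
// Small unsafe allocation: a view onto the shared pool (the slab).
const small = Buffer.allocUnsafe(16);
console.log(small.length);            // 16
console.log(small.buffer.byteLength); // size of the memory block it is carved from

// Large allocation: gets dedicated memory of exactly the requested size.
const large = Buffer.allocUnsafe(Buffer.poolSize);
console.log(large.buffer.byteLength === large.length); // true

// Buffer.alloc() gets its own zero-filled memory, even for tiny sizes.
const zeroed = Buffer.alloc(16);
console.log(zeroed.buffer.byteLength === zeroed.length); // true

// As long as `small` is reachable, the whole block behind it stays alive.
const pinnedBytes = small.buffer.byteLength - small.length;
console.log(`extra bytes kept alive by a 16-byte Buffer: ${pinnedBytes}`);
```

If a small Buffer has to live for a long time, copying it into its own memory (for example with Buffer.alloc followed by copy) avoids keeping the rest of the slab alive.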
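Summary:
The actual memory is provided at the C++ level of Node; the JavaScript level only uses it. For small and frequent Buffer operations, the slab mechanism requests memory in advance and allocates it afterwards, so that JavaScript does not need to trigger many memory-request system calls to the operating system. For large buffers, the memory provided at the C++ level is used directly, without any fine-grained allocation.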
In practice, buffers are usually transmitted piece by piece.
const fs = require('fs');
let rs = fs.createReadStream('./静夜思.txt', { flags: 'r' });
let str = '';
rs.on('data', (chunk) => {
  str += chunk;
});
rs.on('end', () => {
  console.log(str);
});
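The above is an example of reading from a stream. The chunk object received in the data event is a buffer object.
However, when the input stream contains wide characters (one character occupying several bytes), a problem is exposed. The statement str += chunk hides a toString() operation; it is equivalent to str = str.toString() + chunk.toString().
Below, the length of the buffer for each read of the readable stream is limited to 11.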
fs.createReadStream('./静夜思.txt', { flags:'r', highWaterMark: 11});
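The output contains garbled characters. The buffer length was limited to 11 above, but for a buffer of any length a wide-character string may be cut in the middle of a character; the longer the buffer, the lower the probability.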
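The truncation can be reproduced without a file. The following is a minimal sketch that assumes the sample text '窗前明月光' and cuts it by hand at byte 11, which is where a read limited to 11 bytes would stop. It also shows that joining the bytes with Buffer.concat before decoding avoids the problem.

```javascript
const text = '窗前明月光'; // 5 characters, 15 bytes in UTF-8
const whole = Buffer.from(text, 'utf8');

// Cut at byte 11: the fourth character is split across the two chunks.
const first = whole.subarray(0, 11);
const second = whole.subarray(11);

// Decoding each chunk on its own breaks the character that straddles the cut.
const naive = first.toString() + second.toString();
console.log(naive === text);           // false
console.log(naive.includes('\uFFFD')); // true: U+FFFD marks undecodable bytes

// Joining the bytes first and decoding once gives the original text back.
const joined = Buffer.concat([first, second]).toString();
console.log(joined === text); // true
```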
However, if encoding is set to utf-8, this problem does not occur.
fs.createReadStream('./静夜思.txt', { flags:'r', highWaterMark: 11, encoding:'utf-8'});
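Reason: the data event fires the same number of times regardless of the encoding setting, but when setEncoding is called (setting the encoding option has the same effect), the readable stream creates a decoder object internally. On every data event, the buffer is decoded into a string by this decoder object before being passed to the caller.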
The string_decoder module provides an API for decoding Buffer objects into strings in a way that preserves encoded multi-byte UTF-8 and UTF-16 characters.
const { StringDecoder } = require('string_decoder');
let s1 = Buffer.from([0xe7, 0xaa, 0x97, 0xe5, 0x89, 0x8d, 0xe6, 0x98, 0x8e, 0xe6, 0x9c]);
let s2 = Buffer.from([0x88, 0xe5, 0x85, 0x89, 0xef, 0xbc, 0x8c, 0x0d, 0x0a, 0xe7, 0x96]);
// toString() decodes each Buffer on its own, so the character split across s1 and s2 is garbled
console.log(s1.toString());
console.log(s2.toString());
console.log('------------------');
// StringDecoder holds back incomplete bytes until the rest of the character arrives
const decoder = new StringDecoder('utf8');
console.log(decoder.write(s1));
console.log(decoder.write(s2));
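Once StringDecoder is given the encoding, it knows that wide characters are stored as 3 bytes under utf-8. So the first decoder.write only outputs the characters decoded from the first 9 bytes, and the last two bytes are kept inside the StringDecoder.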
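The carry-over behaviour described above can be wrapped in a small helper. This is only a sketch: decodeChunks is a hypothetical name, not part of the string_decoder module, and the sample bytes are chosen for illustration.

```javascript
const { StringDecoder } = require('string_decoder');

// Decode a list of chunks, carrying incomplete characters over to the next chunk.
function decodeChunks(chunks, encoding = 'utf8') {
  const decoder = new StringDecoder(encoding);
  let text = '';
  for (const chunk of chunks) {
    text += decoder.write(chunk);
  }
  // end() flushes whatever is still held back (a replacement character
  // if the input stopped in the middle of a character).
  return text + decoder.end();
}

const part1 = Buffer.from([0xe6, 0x98, 0x8e, 0xe6, 0x9c]); // '明' + first 2 bytes of '月'
const part2 = Buffer.from([0x88, 0xe5, 0x85, 0x89]);       // last byte of '月' + '光'

console.log(decodeChunks([part1, part2])); // 明月光
```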
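Buffer is widely used in file I/O and network I/O, and in network transmission in particular its performance matters a great deal. Applications usually work with strings, but once the data is sent over the network it has to be converted to a buffer so that it can be transmitted as binary data. In web applications, strings are converted to buffers all the time, so making this conversion more efficient can greatly improve network throughput.
Sending a plain string to the client performs worse than sending a buffer object, because a buffer object does not need to be converted on every response. Converting static content into buffer objects in advance effectively reduces repeated CPU work and saves server resources.
One option is to separate the dynamic and static content of a page and convert the static part into a buffer in advance, which improves performance.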
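As an illustration of converting static content in advance, here is a minimal sketch with the built-in http module. The page content, the handler name handleRequest, and the port number are assumptions made for this example.

```javascript
const http = require('http');

// Static part of the page, converted to a Buffer once at startup.
const staticBody = Buffer.from('<html><body>Hello, Buffer</body></html>', 'utf8');

// Hypothetical request handler: sends the prepared bytes as they are,
// so no string-to-bytes conversion happens per response.
function handleRequest(req, res) {
  res.writeHead(200, {
    'Content-Type': 'text/html; charset=utf-8',
    'Content-Length': staticBody.length, // length of a Buffer is its byte count
  });
  res.end(staticBody);
}

const server = http.createServer(handleRequest);
// server.listen(8001); // uncomment to serve; the port is an arbitrary choice
```

Any gain depends on the workload, so it is worth measuring with a load-testing tool before and after the change.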
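When reading files, the highWaterMark setting is critical to performance. Ideally, the length of each read is the highWaterMark specified by the user.
The size of highWaterMark affects performance in two ways: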