This article will give you an in-depth understanding of the Buffer class in Node

青灯夜游
2022-12-12 19:36:13

This article will give you an in-depth understanding of the Buffer class in Node. I hope it will be helpful to everyone!

Before TypedArray was introduced, JavaScript could not handle raw binary data well. This is because JavaScript started out mainly as a scripting language for browsers, where very few scenarios required processing raw binary data. After Node appeared, server-side applications needed to process large amounts of binary stream data, for example when reading and writing files or handling TCP connections, so Node defined a new data type, Buffer, outside of the JavaScript engine (V8). Since Buffer is used widely in Node applications, only by truly mastering it can you write better Node applications.

Binary Basics


Before formally introducing how to use Buffer, let's briefly review some binary basics.

As programmers, we should all be familiar with binary, because all of a computer's underlying data is stored in binary format. In other words, the files on your computer, whether plain text, pictures or videos, are stored on the hard drive as sequences of the two digits 0 and 1. In computer science, a single 0 or 1 is called a bit, and 8 bits form a byte. If the decimal number 16 is stored in 1 byte, the underlying representation is 00010000. We can see that 16 takes 6 more digits in binary than in decimal. The larger the number, the more binary digits it needs, which makes it inconvenient to read and write. For this reason, programmers generally prefer hexadecimal to raw binary when representing data. For example, when we write CSS, a color value is written in hexadecimal (e.g. #FFFFFF) rather than as a string of 0s and 1s.
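As a small illustration of the point above, the following sketch prints the same numbers in binary and hexadecimal using the standard Number.prototype.toString method. The variable names are only for illustration.

```javascript
// Show the same value in decimal, binary and hexadecimal.
const value = 16;
const binary = value.toString(2).padStart(8, '0'); // pad to one full byte
const hex = value.toString(16);

console.log(binary); // 00010000
console.log(hex);    // 10

// A larger value: 255 needs eight binary digits but only two hex digits.
console.log((255).toString(2));  // 11111111
console.log((255).toString(16)); // ff
```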

Character Encoding

Since all data is binary underneath, and data transmitted over the network is binary too, why does the article we are reading now appear as readable text rather than a string of 0s and 1s? This is where the concept of character encoding comes in. A character encoding is simply a mapping table that describes how characters (Chinese characters, English letters or other characters) correspond to binary numbers (made up of one or more bytes). For example, in the familiar ascii encoding, the binary representation of the English character a is 0b01100001 (0b is the prefix for binary numbers). So when the computer reads the binary data 0b01100001 from an ascii-encoded file, the character a is displayed on the screen. Likewise, the character a is stored on the computer or transmitted over the network as the binary data 0b01100001. Besides ascii, common character encodings include utf-8 and utf-16.
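The mapping described above can be checked directly in Node. This is a minimal sketch that uses the global Buffer, as the article's own examples do. Note that Node names the little-endian UTF-16 encoding 'utf16le'. The variable names are illustrative.

```javascript
// Look up the code of 'a' and show it in binary.
const code = 'a'.charCodeAt(0);
console.log(code);                                     // 97
console.log('0b' + code.toString(2).padStart(8, '0')); // 0b01100001

// The same character occupies a different number of bytes in other encodings.
const utf8Bytes = Buffer.byteLength('a', 'utf8');
const utf16Bytes = Buffer.byteLength('a', 'utf16le');
console.log(utf8Bytes, utf16Bytes); // 1 2
```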

Buffer


After mastering the basics of binary and the concept of character encoding, we can finally start learning Buffer properly. Let's first look at the official definition of Buffer:

The Buffer class in Node.js is designed to handle raw binary data. Each buffer corresponds to some raw memory allocated outside V8. Buffers act somewhat like arrays of integers, but aren't resizable and have a whole bunch of methods specifically for binary data. The integers in a buffer each represent a byte and so are limited to values from 0 to 255 inclusive. When using console.log() to print the Buffer instance, you'll get a chain of values in hexadecimal values.

Simply put, a Buffer is a fixed-size block of memory that Node allocates outside the V8 heap. When a Buffer is printed with console.log, its contents are shown byte by byte as a sequence of hexadecimal values.
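The two properties from the definition, byte-sized elements and a fixed length, can be observed with a short sketch. It relies on Buffer being a Uint8Array subclass, so out-of-range values wrap modulo 256. The name bytes is illustrative.

```javascript
const bytes = Buffer.alloc(3);

// Each element is one byte; values outside 0..255 wrap around modulo 256.
bytes[0] = 255;
bytes[1] = 256; // stored as 0
bytes[2] = 257; // stored as 1
console.log(bytes); // <Buffer ff 00 01>

// The size is fixed: writing past the end is silently ignored.
bytes[3] = 42;
console.log(bytes.length); // 3
console.log(bytes[3]);     // undefined
```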

Create Buffer

Now that we understand the basic concept of Buffer, let's create a Buffer object. There are many ways to create a Buffer; the common ones are Buffer.alloc, Buffer.allocUnsafe and Buffer.from.

Buffer.alloc(size[, fill[, encoding]])

This is the most common way to create a Buffer. You only need to pass in the size of the Buffer:

const buff = Buffer.alloc(5)

console.log(buff)
// Prints: <Buffer 00 00 00 00 00>

In the code above, I created a Buffer 5 bytes in size. console.log prints five consecutive hexadecimal numbers representing the content currently stored in the Buffer. We can see that the Buffer is filled with 0, which is Node's default behavior. The two remaining parameters, fill and encoding, let us specify different content to fill the Buffer with during initialization.
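A short sketch of the fill and encoding parameters mentioned above. The base64 string is simply "hello world" encoded, and the variable names are illustrative.

```javascript
// Fill with a single repeated character (utf8 by default).
const filled = Buffer.alloc(5, 'a');
console.log(filled); // <Buffer 61 61 61 61 61>

// Fill with a base64 string by passing the encoding as the third argument.
const decoded = Buffer.alloc(11, 'aGVsbG8gd29ybGQ=', 'base64');
console.log(decoded.toString()); // hello world
```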

It is worth mentioning that the code above uses Node's global Buffer object without explicitly importing it from the node:buffer package. This is purely for convenience of writing. In actual development, the explicit import should be used:

import { Buffer } from 'node:buffer'

Buffer.allocUnsafe(size)

The biggest difference between Buffer.allocUnsafe and Buffer.alloc is that memory obtained through allocUnsafe is not initialized. In other words, data from a previous use may still be there, which is a data security risk. The allocUnsafe function takes a size parameter as the size of the buffer:

const buff = Buffer.allocUnsafe(5)

console.log(buff)
// Prints five bytes of arbitrary data (actual contents will vary)

As the output shows, we cannot control the contents of a buffer allocated with Buffer.allocUnsafe. Precisely because the allocated memory is not initialized, this function allocates a Buffer faster than Buffer.alloc. In actual development, we should choose between them based on our needs.
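One common way to keep the speed of allocUnsafe while avoiding leftover data is to overwrite the buffer before anything reads it. The sketch below uses buf.fill for that; whether this fits depends on the use case, and the variable names are illustrative.

```javascript
const size = 5;

// Fast, but the contents are whatever was left in memory.
const unsafe = Buffer.allocUnsafe(size);

// Overwrite every byte before exposing the buffer to other code.
unsafe.fill(0);
console.log(unsafe); // <Buffer 00 00 00 00 00>

// After fill(0) it holds the same bytes as a buffer from Buffer.alloc.
console.log(unsafe.equals(Buffer.alloc(size))); // true
```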

Buffer.from

This is the function we use most often to create a Buffer. It has many overloads, meaning it behaves differently depending on the arguments passed in. Let's look at a few common ones:

Buffer.from(string[, encoding])

When the first argument is a string, Buffer.from generates the binary representation of that string according to the given encoding (the encoding parameter, utf8 by default). For example:

const buff = Buffer.from('你好世界')

console.log(buff)
// Prints: <Buffer e4 bd a0 e5 a5 bd e4 b8 96 e7 95 8c>
console.log(buff.toString())
// Prints: '你好世界'
console.log(buff.toString('ascii'))
// Prints: 'd= e%=d8\x16g\x15\f'

In the example above, I used the string '你好世界' to initialize the Buffer. Since I did not pass the second parameter, encoding, the default utf8 is used. Looking at the output of the first console.log, we find that although the string has only four characters, the resulting Buffer has 12 bytes. This is because each of these Chinese characters takes 3 bytes in utf8. Next we use buff.toString() to view the content of buff. Since toString also defaults to utf8, the second console.log correctly prints the content stored in buff. In the third console.log, however, we specify ascii as the character encoding, and the result is a string of garbled characters. Having seen this, you should have a deeper understanding of the character encoding concept mentioned earlier.
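To make the difference between characters and bytes concrete, here is a small sketch that also shows the encoding parameter with 'hex' and 'base64', two other encodings Node supports. The variable names are illustrative.

```javascript
const text = '你好世界';

console.log(text.length);                     // 4
console.log(Buffer.byteLength(text, 'utf8')); // 12

// The encoding argument selects how the input string is decoded into bytes.
const fromHex = Buffer.from('68656c6c6f', 'hex');
console.log(fromHex.toString()); // hello

// toString can re-encode the same bytes, for example as base64.
const b64 = Buffer.from(text).toString('base64');
console.log(Buffer.from(b64, 'base64').toString()); // 你好世界
```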

Buffer.from(buffer)

When Buffer.from receives a Buffer object, Node creates a new Buffer instance and copies the contents of the passed-in buffer into it.

const buf1 = Buffer.from('buffer')
const buf2 = Buffer.from(buf1)

console.log(buf1)
// Prints: <Buffer 62 75 66 66 65 72>
console.log(buf2)
// Prints: <Buffer 62 75 66 66 65 72>

buf1[0] = 0x61

console.log(buf1.toString())
// Prints: auffer
console.log(buf2.toString())
// Prints: buffer

In the example above, we first created a Buffer object buf1 holding the string "buffer", and then used it to initialize a new Buffer object buf2. We then changed the first byte of buf1 to 0x61 (the encoding of a). The output of buf1 became auffer while the content of buf2 did not change, which confirms that Buffer.from(buffer) copies the data.

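Note: When a Buffer holds a lot of data, copying it with Buffer.from performs poorly. It can cause CPU usage to spike and block the main thread, so when using this function you must know exactly what Buffer.from(buffer) does behind the scenes. The author ran into this pitfall in a real project, and it made an online service respond slowly!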
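When a copy is not actually needed, one possible alternative is a view that shares memory with the original, for example via buf.subarray. The sketch below contrasts the two. It is only an illustration of the trade-off (a view is cheap but sees later changes to the original), not a recommendation for every case, and the variable names are illustrative.

```javascript
const big = Buffer.alloc(1024 * 1024, 1); // 1 MiB filled with 0x01

// Copy: allocates another 1 MiB and copies every byte.
const copy = Buffer.from(big);

// View: shares memory with `big`, no bytes are copied.
const view = big.subarray(0, big.length);

big[0] = 0xff;
console.log(copy[0]); // 1
console.log(view[0]); // 255
```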

Buffer.from(arrayBuffer[, byteOffset[, length]])

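After the buffer parameter, let's look at the arrayBuffer parameter, which behaves very differently. ArrayBuffer is a data type defined by ECMAScript. Put simply, it is a piece of memory that you cannot use directly (or cannot use conveniently); you have to access it through a view such as a Uint16Array or another TypedArray object. For example, the .buffer property of a Uint16Array object is an ArrayBuffer. When Buffer.from receives an ArrayBuffer, Node creates a new Buffer object, but that Buffer points to the memory of the original ArrayBuffer, and no data is copied. Let's look at an example: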

const arr = new Uint16Array(2)

arr[0] = 5000
arr[1] = 4000

const buf = Buffer.from(arr.buffer)

console.log(buf)
// Prints: <Buffer 88 13 a0 0f>

// Change a number in the original array
arr[1] = 6000

console.log(buf)
// Prints: <Buffer 88 13 70 17>

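From the output of the example above we can see that arr and buf share the same memory, so when we change the data in the original array, the data in buf changes accordingly.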
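The optional byteOffset and length parameters from the signature restrict the view to part of the ArrayBuffer. A minimal sketch, with illustrative variable names:

```javascript
const ab = new ArrayBuffer(8);
const whole = Buffer.from(ab);      // view over all 8 bytes
const part = Buffer.from(ab, 2, 4); // view over bytes 2..5

// Writing through the partial view is visible through the full view.
part[0] = 0xaa;
console.log(part.length); // 4
console.log(whole[2]);    // 170
console.log(whole);       // <Buffer 00 00 aa 00 00 00 00 00>
```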

Other Buffer Operations

Now that we have seen several ways to create a Buffer, let's look at some other commonly used Buffer APIs and properties.

buf.length

This property returns the number of bytes the buffer occupies:

// Create a Buffer object 1234 bytes in size
const buf1 = Buffer.alloc(1234)
console.log(buf1.length)
// Prints: 1234

const buf2 = Buffer.from('Hello')
console.log(buf2.length)
// Prints: 5
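Note that buf.length counts bytes, not characters, and reflects the allocated size rather than how much has been written. A short sketch with illustrative names:

```javascript
const greeting = '你好';
const greetingBuf = Buffer.from(greeting);

console.log(greeting.length);    // 2 (characters)
console.log(greetingBuf.length); // 6 (bytes in utf8)

// alloc reserves the requested bytes even if fewer are written.
const reserved = Buffer.alloc(10);
const written = reserved.write('Hi');
console.log(written, reserved.length); // 2 10
```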

Buffer.poolSize

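This field indicates the size of the Buffer pool that Node pre-allocates for us. Its default value is 8192, that is, 8 KB. When Node starts, it pre-allocates an 8 KB memory pool, and when the user creates Buffer instances through certain APIs (for example Buffer.allocUnsafe, or Buffer.from with a string), this pool may be used to improve efficiency. Buffer.alloc is not one of them: the buffers it returns never use the shared pool. Here is a concrete example: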

const buf1 = Buffer.from('Hello')
console.log(buf1.length)
// Prints: 5

// The buffer property of buf1 points to the memory of its underlying ArrayBuffer object
console.log(buf1.buffer.byteLength)
// Prints: 8192

const buf2 = Buffer.from('World')
console.log(buf2.length)
// Prints: 5

// The buffer property of buf2 points to the memory of its underlying ArrayBuffer object
console.log(buf2.buffer.byteLength)
// Prints: 8192

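In the example above, buf1 and buf2 are both small, so they use the pre-allocated 8 KB memory pool directly.

[Figure: approximate memory layout of buf1 and buf2 inside the pre-allocated 8 KB pool]

It is worth noting that a new Buffer uses the current pool only when the requested size is less than 4 KB (half of 8 KB) and the pool still has enough free space. Otherwise Node creates a new 8 KB pool or allocates a separate region of memory directly (FastBuffer).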
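The pooling rule can be observed through the buffer and byteOffset properties. The sketch below assumes the default Buffer.poolSize and describes behavior documented for Buffer.allocUnsafe, Buffer.from(string) and Buffer.alloc; exact offsets depend on what has already been allocated in the process. The variable names are illustrative.

```javascript
console.log(Buffer.poolSize); // 8192 by default

// A small buffer from Buffer.from(string) is a slice of the shared pool.
const small = Buffer.from('Hello');
console.log(small.buffer.byteLength === Buffer.poolSize); // true
console.log(small.byteOffset); // position inside the pool, varies

// A request of more than half the pool size gets its own memory.
const large = Buffer.allocUnsafe(5000);
console.log(large.buffer.byteLength); // 5000

// Buffer.alloc never uses the pool, so the backing ArrayBuffer matches the request.
const zeroed = Buffer.alloc(10);
console.log(zeroed.buffer.byteLength); // 10
```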

buf.write(string[, offset[, length]][, encoding])

This function writes data of a given length (length) into a Buffer instance starting at a given offset (offset). Let's look at a concrete example:

const buf = Buffer.from('Hello')

console.log(buf.toString())
// Prints: "Hello"

// Write 'LLO' starting at the third position (offset 2)
buf.write('LLO', 2)
console.log(buf.toString())
// Prints: "HeLLO"

Note that when the string to be written is longer than what the buffer can hold (buf.length), the characters that do not fit are discarded:

const buf = Buffer.from('Hello')

buf.write('LLO!', 2)
console.log(buf.toString())
// Prints: "HeLLO"

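In addition, when the string exceeds the remaining space of the buffer and the last character that could be written would only fit partially, that character is not written at all: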

const buf = Buffer.from('Hello')

buf.write('LL你', 2)
console.log(buf.toString())
// Prints "HeLLo"

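In the example above, "你" is a Chinese character that takes three bytes, so it cannot fit into buf in its entirety. All three of its bytes are therefore discarded, and the last byte of buf remains "o".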
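buf.write returns the number of bytes it actually wrote, which gives a way to detect the truncation cases described above. A minimal sketch with an illustrative variable name:

```javascript
const target = Buffer.from('Hello');

// Everything fits: 3 bytes written.
console.log(target.write('LLO', 2));  // 3

// '!' does not fit, so only 3 of the 4 bytes are written.
console.log(target.write('LLO!', 2)); // 3

// '你' needs 3 bytes but only 1 is left, so just 'LL' is written.
console.log(target.write('LL你', 2)); // 2
console.log(target.toString());       // HeLLO
```

Comparing the return value with Buffer.byteLength of the input string is one way to check whether the whole string was written.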

Buffer.concat(list[, totalLength])

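This function concatenates multiple Buffer objects into a new buffer. Its first parameter is the array of Buffers to concatenate, and the second parameter (totalLength) is the length of the resulting buffer. Here is a simple example: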

const buf1 = Buffer.from('Hello')
const buf2 = Buffer.from('World')

const buf = Buffer.concat([buf1, buf2])
console.log(buf.toString())
// Prints "HelloWorld"

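In the example above, we did not specify the length of the resulting Buffer, so Node computes a default: totalLength = buf1.length + buf2.length. If we do specify totalLength and the value is smaller than buf1.length + buf2.length, Node truncates the resulting buffer. If the value is larger than buf1.length + buf2.length, the resulting buf still has length totalLength, and the extra bytes are filled with 0.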

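One more point worth noting: the Buffer produced by Buffer.concat is built by copying the original Buffers, so changing the originals afterwards does not affect the result. We do, however, still need to keep the cost of that copy in mind.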
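The truncation, zero-padding and copy behavior described above can be sketched as follows. The variable names are illustrative.

```javascript
const left = Buffer.from('Hello');
const right = Buffer.from('World');

// Smaller totalLength: the result is truncated.
const truncated = Buffer.concat([left, right], 7);
console.log(truncated.toString()); // HelloWo

// Larger totalLength: the extra bytes are zero-filled.
const padded = Buffer.concat([left, right], 12);
console.log(padded.length); // 12
console.log(padded);        // <Buffer 48 65 6c 6c 6f 57 6f 72 6c 64 00 00>

// The result is a copy: changing a source afterwards does not affect it.
left[0] = 0x4a; // 'J'
console.log(left.toString());      // Jello
console.log(truncated.toString()); // HelloWo
```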

Garbage collection of Buffer objects

At the beginning of the article, I said that the memory Node allocates for all Buffer objects is independent of the V8 heap; it is off-heap memory. Does this mean that Buffer objects are not affected by V8's garbage collection and that we need to manage the memory manually? Actually, no. Every time we use Node's API to create a new Buffer, that Buffer corresponds to an object in JavaScript space (a reference to the Buffer's memory), and this object is managed by V8's garbage collector. Node only needs to attach hooks that release the off-heap memory the Buffer points to when this reference is garbage collected. Simply put, we don't need to worry about the memory allocated for Buffers; V8's garbage collection will reclaim it for us once it is no longer used.
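One way to see that Buffer memory lives outside the V8 heap is to compare process.memoryUsage() before and after a large allocation. This is a rough sketch: it assumes a Node version whose process.memoryUsage() reports the arrayBuffers field, the exact numbers vary from run to run, and the variable names are illustrative.

```javascript
const before = process.memoryUsage();
const offHeap = Buffer.alloc(32 * 1024 * 1024); // 32 MiB requested
const after = process.memoryUsage();

// Convert a byte count to MiB for display.
const mib = (n) => (n / 1024 / 1024).toFixed(1);

// heapUsed tracks the V8 heap; arrayBuffers tracks memory backing Buffers and ArrayBuffers.
console.log('heapUsed grew by', mib(after.heapUsed - before.heapUsed), 'MiB');
console.log('arrayBuffers grew by', mib(after.arrayBuffers - before.arrayBuffers), 'MiB');
```

Once offHeap is no longer referenced, the memory becomes eligible for release when V8 collects the corresponding JavaScript object; no manual free call is involved.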

Summary

In this article I introduced some basic knowledge of Buffer, including its common APIs and properties. I hope this knowledge is helpful in your work.

Statement:
This article is reproduced from juejin.cn. In case of any infringement, please contact admin@php.cn to have it removed.