This article walks through encoding in the Node.js Buffer. I hope it is helpful!
The smallest unit in a computer is the bit, that is, 0 or 1, which correspond to high and low voltage levels in hardware. A single bit carries too little information, so 8 bits are grouped into one byte, and all kinds of information, such as numbers and strings, are stored in terms of bytes. How are characters
stored? Through encoding: each character maps to a code. When a character needs to be rendered, the font library is looked up by that code, and the glyph for the character is drawn.
Character set
The earliest character set (charset) was ASCII, which covers 128 characters such as abc, ABC, and 123, because the computer was first invented in the United States. Later, Europe developed its own character set standards (the ISO 8859 family), and China developed its own standard called GBK.
The International Organization for Standardization recognized that having a separate character set for each region was a problem: the same code could mean different characters in different character sets. So Unicode was proposed to cover most of the world's characters, giving each character a unique code point.
But ASCII needs only 1 byte per character, GBK needs 2, and some character sets need 3 or more. If everything were stored at a fixed large width, characters that fit in one byte would waste space. Hence different encoding schemes: UTF-8, UTF-16, UTF-32, and so on.
UTF-8, UTF-16, and UTF-32 are all encodings of Unicode; they differ in how code points are mapped to bytes.
To save space, UTF-8 uses a variable-length scheme of 1 to 4 bytes per character (the original design allowed up to 6). UTF-16 uses 2 bytes for most characters and 4 bytes for characters outside the Basic Multilingual Plane, while UTF-32 uses a fixed 4 bytes.
In the end, UTF-8 became the most widely used because it usually takes the least space.
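The variable-length behavior is easy to observe with Node.js's `Buffer.byteLength`, a minimal sketch (the byte counts shown are what current Node.js versions report):

```javascript
// Byte counts for the same characters under different encodings.
const ascii = 'a';    // ASCII letter
const cjk = '你';     // CJK character, inside the BMP
const emoji = '😀';   // outside the BMP

// UTF-8 is variable length: 1, 3, and 4 bytes respectively.
console.log(Buffer.byteLength(ascii, 'utf8'));  // 1
console.log(Buffer.byteLength(cjk, 'utf8'));    // 3
console.log(Buffer.byteLength(emoji, 'utf8'));  // 4

// UTF-16 stores BMP characters in 2 bytes and uses a
// surrogate pair (4 bytes) for characters like emoji.
console.log(Buffer.byteLength(cjk, 'utf16le'));   // 2
console.log(Buffer.byteLength(emoji, 'utf16le')); // 4
```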
Node.js Buffer encoding
Every language supports encoding and decoding for character sets, and Node.js is no exception.
Node.js uses Buffer to store binary data, and converting between binary data and strings requires specifying an encoding. Buffer methods such as from, byteLength, and lastIndexOf accept an encoding parameter:
The supported encodings are:
utf8, ucs2, utf16le, latin1, ascii, base64, hex
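As a quick sketch of what the encoding parameter does, the same string produces different bytes depending on the encoding passed to `Buffer.from`:

```javascript
const text = 'hi';

// UTF-8 and Latin-1 both use one byte per ASCII character.
console.log(Buffer.from(text, 'utf8'));    // <Buffer 68 69>
console.log(Buffer.from(text, 'latin1'));  // <Buffer 68 69>

// UTF-16LE uses two bytes per character, low byte first.
console.log(Buffer.from(text, 'utf16le')); // <Buffer 68 00 69 00>

// byteLength follows the same rules without allocating a buffer.
console.log(Buffer.byteLength(text, 'utf16le')); // 4
```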
Some readers may notice that base64 and hex are not character sets, so why are they in this list?
Right: besides character sets, byte-to-text encoding schemes also include base64, which converts bytes to printable characters, and hex, which converts bytes to hexadecimal.
This is why Node.js calls the parameter encoding rather than charset: the supported encoding and decoding schemes are not limited to character sets.
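A small sketch of the difference: base64 and hex describe the bytes themselves rather than mapping characters to bytes, and each transform is reversible:

```javascript
const buf = Buffer.from('hello world', 'utf8');

// Encode the raw bytes as text.
const asHex = buf.toString('hex');
const asBase64 = buf.toString('base64');
console.log(asHex);    // 68656c6c6f20776f726c64
console.log(asBase64); // aGVsbG8gd29ybGQ=

// Decoding reverses the transform.
console.log(Buffer.from(asHex, 'hex').toString('utf8'));       // hello world
console.log(Buffer.from(asBase64, 'base64').toString('utf8')); // hello world
```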
If no encoding is specified, utf8 is used by default.

```javascript
const buf = Buffer.alloc(11, 'aGVsbG8gd29ybGQ=', 'base64');
console.log(buf.toString()); // hello world
```
Encoding source code
I dug into the Node.js source code related to encoding.
This section implements it: https://github.com/nodejs/node/blob/master/lib/buffer.js#L587-L726
You can see that each encoding implements the same set of APIs: encoding, encodingVal, byteLength, write, slice, and indexOf. Because these APIs behave differently under each encoding scheme, Node.js returns a different operations object depending on the encoding passed in. This is a polymorphic design.
```javascript
const encodingOps = {
  utf8: {
    encoding: 'utf8',
    encodingVal: encodingsMap.utf8,
    byteLength: byteLengthUtf8,
    write: (buf, string, offset, len) => buf.utf8Write(string, offset, len),
    slice: (buf, start, end) => buf.utf8Slice(start, end),
    indexOf: (buf, val, byteOffset, dir) =>
      indexOfString(buf, val, byteOffset, encodingsMap.utf8, dir)
  },
  ucs2: {
    encoding: 'ucs2',
    encodingVal: encodingsMap.utf16le,
    byteLength: (string) => string.length * 2,
    write: (buf, string, offset, len) => buf.ucs2Write(string, offset, len),
    slice: (buf, start, end) => buf.ucs2Slice(start, end),
    indexOf: (buf, val, byteOffset, dir) =>
      indexOfString(buf, val, byteOffset, encodingsMap.utf16le, dir)
  },
  utf16le: {
    encoding: 'utf16le',
    encodingVal: encodingsMap.utf16le,
    byteLength: (string) => string.length * 2,
    write: (buf, string, offset, len) => buf.ucs2Write(string, offset, len),
    slice: (buf, start, end) => buf.ucs2Slice(start, end),
    indexOf: (buf, val, byteOffset, dir) =>
      indexOfString(buf, val, byteOffset, encodingsMap.utf16le, dir)
  },
  latin1: {
    encoding: 'latin1',
    encodingVal: encodingsMap.latin1,
    byteLength: (string) => string.length,
    write: (buf, string, offset, len) => buf.latin1Write(string, offset, len),
    slice: (buf, start, end) => buf.latin1Slice(start, end),
    indexOf: (buf, val, byteOffset, dir) =>
      indexOfString(buf, val, byteOffset, encodingsMap.latin1, dir)
  },
  ascii: {
    encoding: 'ascii',
    encodingVal: encodingsMap.ascii,
    byteLength: (string) => string.length,
    write: (buf, string, offset, len) => buf.asciiWrite(string, offset, len),
    slice: (buf, start, end) => buf.asciiSlice(start, end),
    indexOf: (buf, val, byteOffset, dir) =>
      indexOfBuffer(buf, fromStringFast(val, encodingOps.ascii),
                    byteOffset, encodingsMap.ascii, dir)
  },
  base64: {
    encoding: 'base64',
    encodingVal: encodingsMap.base64,
    byteLength: (string) => base64ByteLength(string, string.length),
    write: (buf, string, offset, len) => buf.base64Write(string, offset, len),
    slice: (buf, start, end) => buf.base64Slice(start, end),
    indexOf: (buf, val, byteOffset, dir) =>
      indexOfBuffer(buf, fromStringFast(val, encodingOps.base64),
                    byteOffset, encodingsMap.base64, dir)
  },
  hex: {
    encoding: 'hex',
    encodingVal: encodingsMap.hex,
    byteLength: (string) => string.length >>> 1,
    write: (buf, string, offset, len) => buf.hexWrite(string, offset, len),
    slice: (buf, start, end) => buf.hexSlice(start, end),
    indexOf: (buf, val, byteOffset, dir) =>
      indexOfBuffer(buf, fromStringFast(val, encodingOps.hex),
                    byteOffset, encodingsMap.hex, dir)
  }
};

function getEncodingOps(encoding) {
  encoding += '';
  switch (encoding.length) {
    case 4:
      if (encoding === 'utf8') return encodingOps.utf8;
      if (encoding === 'ucs2') return encodingOps.ucs2;
      encoding = StringPrototypeToLowerCase(encoding);
      if (encoding === 'utf8') return encodingOps.utf8;
      if (encoding === 'ucs2') return encodingOps.ucs2;
      break;
    case 5:
      if (encoding === 'utf-8') return encodingOps.utf8;
      if (encoding === 'ascii') return encodingOps.ascii;
      if (encoding === 'ucs-2') return encodingOps.ucs2;
      encoding = StringPrototypeToLowerCase(encoding);
      if (encoding === 'utf-8') return encodingOps.utf8;
      if (encoding === 'ascii') return encodingOps.ascii;
      if (encoding === 'ucs-2') return encodingOps.ucs2;
      break;
    case 7:
      if (encoding === 'utf16le' ||
          StringPrototypeToLowerCase(encoding) === 'utf16le')
        return encodingOps.utf16le;
      break;
    case 8:
      if (encoding === 'utf-16le' ||
          StringPrototypeToLowerCase(encoding) === 'utf-16le')
        return encodingOps.utf16le;
      break;
    case 6:
      if (encoding === 'latin1' || encoding === 'binary')
        return encodingOps.latin1;
      if (encoding === 'base64') return encodingOps.base64;
      encoding = StringPrototypeToLowerCase(encoding);
      if (encoding === 'latin1' || encoding === 'binary')
        return encodingOps.latin1;
      if (encoding === 'base64') return encodingOps.base64;
      break;
    case 3:
      if (encoding === 'hex' ||
          StringPrototypeToLowerCase(encoding) === 'hex')
        return encodingOps.hex;
      break;
  }
}
```
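One observable consequence of this lookup logic is that Buffer accepts several aliases for the same scheme: names are case-insensitive, hyphenated variants like 'utf-8' are accepted, and 'binary' is an alias for latin1. A quick sketch using the public API:

```javascript
// 'UTF-8' is normalized to the same operations object as 'utf8'.
const a = Buffer.from('abc', 'UTF-8');
console.log(a.equals(Buffer.from('abc', 'utf8'))); // true

// 'binary' is just latin1 under another name.
console.log(Buffer.from('abc', 'binary').toString('latin1')); // abc

// Buffer.isEncoding reports which names the lookup recognizes.
console.log(Buffer.isEncoding('utf-16le')); // true
console.log(Buffer.isEncoding('utf32'));    // false
```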
Summary
The smallest unit of data storage in a computer is the bit, but the smallest unit of information is the byte. On top of bytes, encodings establish the mapping between characters and numbers. Various character sets emerged, including ASCII, the ISO 8859 family, and GBK, and the International Organization for Standardization proposed Unicode to cover all characters. Unicode has several implementation schemes: UTF-8, UTF-16, and UTF-32, which use different numbers of bytes to store characters. Among them, UTF-8 is variable-length and usually the most compact, so it is the most widely used.
Node.js stores binary data in Buffer, and converting it to a string requires specifying an encoding scheme. These schemes include not only character sets (charsets) but also hex and base64:
utf8, ucs2, utf16le, latin1, ascii, base64, hex
We looked at the Node.js source code for encoding and found that each encoding scheme implements the same series of APIs. This is a polymorphic design.
Encoding is a concept you encounter constantly when learning Node.js, and the encodings in Node.js go beyond charsets. I hope this article helps you understand encoding and character sets.
The above is the detailed content of Let's talk about encoding in Node.js Buffer. For more information, please follow other related articles on the PHP Chinese website!