search
HomeWeb Front-endJS TutorialLet's talk about encoding in Node.js Buffer

Let's talk about encoding in Node.js Buffer

Aug 31, 2021 am 10:28 AM
bufferencodingnode.js

This article will take you to understand the encoding in Node.js Buffer, I hope it will be helpful to everyone!

Let's talk about encoding in Node.js Buffer

The smallest unit of a computer is a bit, that is, 0 and 1, which are corresponded to high and low levels on the hardware. However, only one bit represents too little information, so 8 bits are specified as one byte. After that, various information such as numbers and strings are stored based on bytes. [Recommended study: "nodejs Tutorial"] How to store

characters? It relies on encoding. Different characters correspond to different encodings. Then when rendering is needed, the font library is checked according to the corresponding encoding, and then the graphics of the corresponding characters are rendered.

Character set

The character set (charset) was originally ASCII code, which is abc ABC 123 and other 128 characters, because the computer was first invented in the United States. Later, Europe also developed a set of character set standards called ISO, and later China also developed a set of character set standards called GBK.

The International Organization for Standardization felt that we couldn’t each have one, otherwise the same code would have different meanings in different character sets, so we proposed unicode coding to include most of the world’s codes, so that each character Only unique encoding.

But ASCII code only requires 1 byte to store, while GBK requires 2 bytes, and some character sets require 3 bytes, etc. Some only need one byte to store but store 2 Bytes, which is a waste of space. So there are different encoding schemes such as utf-8, utf-16, utf-24, etc.

utf-8, utf-16, and utf-24 are all unicode encodings, but the specific implementation plans are different.

UTF-8 In order to save space, a variable-length storage scheme from 1 to 6 bytes is designed. UTF-16 is fixed at 2 bytes, and UTF-24 is fixed at 4 bytes.

Lets talk about encoding in Node.js Buffer

Finally, UTF-8 is widely used because it takes up the least space.

Node.js Buffer encoding

Each language supports character set encoding and decoding, and Node.js does the same.

Node.js can use Buffer to store binary data, and when converting binary data to a string, you need to specify the character set. Buffer's from, byteLength, lastIndexOf and other methods support specifying encoding:

The specifically supported encodings are:

utf8, ucs2, utf16le, latin1, ascii, base64, hex

Some students may find that: base64 and hex are not character sets Ah, why are you here?

Yes, in addition to the character set, the byte-to-character encoding scheme also includes base64 for converting to plaintext characters, and hex for converting to hexadecimal.

This is why Node.js calls it encoding instead of charset, because the supported encoding and decoding schemes are not just character sets.

If encoding is not specified, the default is utf8.

const buf = Buffer.alloc(11, 'aGVsbG8gd29ybGQ=', 'base64');

console.log(buf.toString());// hello world

Encoding source code

I went through the Node.js source code about encoding:

This section implements encoding: https: //github.com/nodejs/node/blob/master/lib/buffer.js#L587-L726

You can see that each encoding implements encoding, encodingVal, byteLength, write, slice, indexOf. Several APIs, because these APIs use different encoding schemes, will have different results. Node.js will return different objects according to the incoming encoding. This is a polymorphic idea.

const encodingOps = {
  utf8: {
    encoding: 'utf8',
    encodingVal: encodingsMap.utf8,
    byteLength: byteLengthUtf8,
    write: (buf, string, offset, len) => buf.utf8Write(string, offset, len),
    slice: (buf, start, end) => buf.utf8Slice(start, end),
    indexOf: (buf, val, byteOffset, dir) =>
      indexOfString(buf, val, byteOffset, encodingsMap.utf8, dir)
  },
  ucs2: {
    encoding: 'ucs2',
    encodingVal: encodingsMap.utf16le,
    byteLength: (string) => string.length * 2,
    write: (buf, string, offset, len) => buf.ucs2Write(string, offset, len),
    slice: (buf, start, end) => buf.ucs2Slice(start, end),
    indexOf: (buf, val, byteOffset, dir) =>
      indexOfString(buf, val, byteOffset, encodingsMap.utf16le, dir)
  },
  utf16le: {
    encoding: 'utf16le',
    encodingVal: encodingsMap.utf16le,
    byteLength: (string) => string.length * 2,
    write: (buf, string, offset, len) => buf.ucs2Write(string, offset, len),
    slice: (buf, start, end) => buf.ucs2Slice(start, end),
    indexOf: (buf, val, byteOffset, dir) =>
      indexOfString(buf, val, byteOffset, encodingsMap.utf16le, dir)
  },
  latin1: {
    encoding: 'latin1',
    encodingVal: encodingsMap.latin1,
    byteLength: (string) => string.length,
    write: (buf, string, offset, len) => buf.latin1Write(string, offset, len),
    slice: (buf, start, end) => buf.latin1Slice(start, end),
    indexOf: (buf, val, byteOffset, dir) =>
      indexOfString(buf, val, byteOffset, encodingsMap.latin1, dir)
  },
  ascii: {
    encoding: 'ascii',
    encodingVal: encodingsMap.ascii,
    byteLength: (string) => string.length,
    write: (buf, string, offset, len) => buf.asciiWrite(string, offset, len),
    slice: (buf, start, end) => buf.asciiSlice(start, end),
    indexOf: (buf, val, byteOffset, dir) =>
      indexOfBuffer(buf,
                    fromStringFast(val, encodingOps.ascii),
                    byteOffset,
                    encodingsMap.ascii,
                    dir)
  },
  base64: {
    encoding: 'base64',
    encodingVal: encodingsMap.base64,
    byteLength: (string) => base64ByteLength(string, string.length),
    write: (buf, string, offset, len) => buf.base64Write(string, offset, len),
    slice: (buf, start, end) => buf.base64Slice(start, end),
    indexOf: (buf, val, byteOffset, dir) =>
      indexOfBuffer(buf,
                    fromStringFast(val, encodingOps.base64),
                    byteOffset,
                    encodingsMap.base64,
                    dir)
  },
  hex: {
    encoding: 'hex',
    encodingVal: encodingsMap.hex,
    byteLength: (string) => string.length >>> 1,
    write: (buf, string, offset, len) => buf.hexWrite(string, offset, len),
    slice: (buf, start, end) => buf.hexSlice(start, end),
    indexOf: (buf, val, byteOffset, dir) =>
      indexOfBuffer(buf,
                    fromStringFast(val, encodingOps.hex),
                    byteOffset,
                    encodingsMap.hex,
                    dir)
  }
};
function getEncodingOps(encoding) {
  encoding += '';
  switch (encoding.length) {
    case 4:
      if (encoding === 'utf8') return encodingOps.utf8;
      if (encoding === 'ucs2') return encodingOps.ucs2;
      encoding = StringPrototypeToLowerCase(encoding);
      if (encoding === 'utf8') return encodingOps.utf8;
      if (encoding === 'ucs2') return encodingOps.ucs2;
      break;
    case 5:
      if (encoding === 'utf-8') return encodingOps.utf8;
      if (encoding === 'ascii') return encodingOps.ascii;
      if (encoding === 'ucs-2') return encodingOps.ucs2;
      encoding = StringPrototypeToLowerCase(encoding);
      if (encoding === 'utf-8') return encodingOps.utf8;
      if (encoding === 'ascii') return encodingOps.ascii;
      if (encoding === 'ucs-2') return encodingOps.ucs2;
      break;
    case 7:
      if (encoding === 'utf16le' ||
          StringPrototypeToLowerCase(encoding) === 'utf16le')
        return encodingOps.utf16le;
      break;
    case 8:
      if (encoding === 'utf-16le' ||
          StringPrototypeToLowerCase(encoding) === 'utf-16le')
        return encodingOps.utf16le;
      break;
    case 6:
      if (encoding === 'latin1' || encoding === 'binary')
        return encodingOps.latin1;
      if (encoding === 'base64') return encodingOps.base64;
      encoding = StringPrototypeToLowerCase(encoding);
      if (encoding === 'latin1' || encoding === 'binary')
        return encodingOps.latin1;
      if (encoding === 'base64') return encodingOps.base64;
      break;
    case 3:
      if (encoding === 'hex' || StringPrototypeToLowerCase(encoding) === 'hex')
        return encodingOps.hex;
      break;
  }
}

Summary

The smallest unit for storing data in a computer is a bit, but the smallest unit for storing information is a byte. The mapping relationship based on encoding and characters is realized again. Various character sets, including ascii, iso, gbk, etc., and the International Organization for Standardization proposed unicode to include all characters. There are several unicode implementation solutions: utf-8, utf-16, utf-24, and they use different characters respectively. Number of sections to store characters. Among them, utf-8 is variable length and has the smallest storage volume, so it is widely used.

Node.js stores binary data through Buffer, and when converting it to a string, you need to specify an encoding scheme. This encoding scheme not only includes character sets (charset), but also supports hex and base64 schemes, including:

utf8, ucs2, utf16le, latin1, ascii, base64, hex

We looked at the Node.js source code of encoding and found that each encoding scheme will be used to implement a series of APIs. This is a Polymorphic thoughts.

Encoding is a concept that is frequently encountered when learning Node.js, and the encoding of Node.js does not only include charset. I hope this article can help everyone understand encoding and character sets.

For more programming-related knowledge, please visit: Introduction to Programming! !

The above is the detailed content of Let's talk about encoding in Node.js Buffer. For more information, please follow other related articles on the PHP Chinese website!

Statement
This article is reproduced at:掘金社区. If there is any infringement, please contact admin@php.cn delete
Python and JavaScript: Understanding the Strengths of EachPython and JavaScript: Understanding the Strengths of EachMay 06, 2025 am 12:15 AM

Python and JavaScript each have their own advantages, and the choice depends on project needs and personal preferences. 1. Python is easy to learn, with concise syntax, suitable for data science and back-end development, but has a slow execution speed. 2. JavaScript is everywhere in front-end development and has strong asynchronous programming capabilities. Node.js makes it suitable for full-stack development, but the syntax may be complex and error-prone.

JavaScript's Core: Is It Built on C or C  ?JavaScript's Core: Is It Built on C or C ?May 05, 2025 am 12:07 AM

JavaScriptisnotbuiltonCorC ;it'saninterpretedlanguagethatrunsonenginesoftenwritteninC .1)JavaScriptwasdesignedasalightweight,interpretedlanguageforwebbrowsers.2)EnginesevolvedfromsimpleinterpreterstoJITcompilers,typicallyinC ,improvingperformance.

JavaScript Applications: From Front-End to Back-EndJavaScript Applications: From Front-End to Back-EndMay 04, 2025 am 12:12 AM

JavaScript can be used for front-end and back-end development. The front-end enhances the user experience through DOM operations, and the back-end handles server tasks through Node.js. 1. Front-end example: Change the content of the web page text. 2. Backend example: Create a Node.js server.

Python vs. JavaScript: Which Language Should You Learn?Python vs. JavaScript: Which Language Should You Learn?May 03, 2025 am 12:10 AM

Choosing Python or JavaScript should be based on career development, learning curve and ecosystem: 1) Career development: Python is suitable for data science and back-end development, while JavaScript is suitable for front-end and full-stack development. 2) Learning curve: Python syntax is concise and suitable for beginners; JavaScript syntax is flexible. 3) Ecosystem: Python has rich scientific computing libraries, and JavaScript has a powerful front-end framework.

JavaScript Frameworks: Powering Modern Web DevelopmentJavaScript Frameworks: Powering Modern Web DevelopmentMay 02, 2025 am 12:04 AM

The power of the JavaScript framework lies in simplifying development, improving user experience and application performance. When choosing a framework, consider: 1. Project size and complexity, 2. Team experience, 3. Ecosystem and community support.

The Relationship Between JavaScript, C  , and BrowsersThe Relationship Between JavaScript, C , and BrowsersMay 01, 2025 am 12:06 AM

Introduction I know you may find it strange, what exactly does JavaScript, C and browser have to do? They seem to be unrelated, but in fact, they play a very important role in modern web development. Today we will discuss the close connection between these three. Through this article, you will learn how JavaScript runs in the browser, the role of C in the browser engine, and how they work together to drive rendering and interaction of web pages. We all know the relationship between JavaScript and browser. JavaScript is the core language of front-end development. It runs directly in the browser, making web pages vivid and interesting. Have you ever wondered why JavaScr

Node.js Streams with TypeScriptNode.js Streams with TypeScriptApr 30, 2025 am 08:22 AM

Node.js excels at efficient I/O, largely thanks to streams. Streams process data incrementally, avoiding memory overload—ideal for large files, network tasks, and real-time applications. Combining streams with TypeScript's type safety creates a powe

Python vs. JavaScript: Performance and Efficiency ConsiderationsPython vs. JavaScript: Performance and Efficiency ConsiderationsApr 30, 2025 am 12:08 AM

The differences in performance and efficiency between Python and JavaScript are mainly reflected in: 1) As an interpreted language, Python runs slowly but has high development efficiency and is suitable for rapid prototype development; 2) JavaScript is limited to single thread in the browser, but multi-threading and asynchronous I/O can be used to improve performance in Node.js, and both have advantages in actual projects.

See all articles

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

SublimeText3 English version

SublimeText3 English version

Recommended: Win version, supports code prompts!

EditPlus Chinese cracked version

EditPlus Chinese cracked version

Small size, syntax highlighting, does not support code prompt function

Safe Exam Browser

Safe Exam Browser

Safe Exam Browser is a secure browser environment for taking online exams securely. This software turns any computer into a secure workstation. It controls access to any utility and prevents students from using unauthorized resources.

MantisBT

MantisBT

Mantis is an easy-to-deploy web-based defect tracking tool designed to aid in product defect tracking. It requires PHP, MySQL and a web server. Check out our demo and hosting services.

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment