How to Decode Base64 Strings in JavaScript While Handling UTF-8 Encoding?

Barbara Streisand
2024-11-01 13:10:02

Decode Base64 Using JavaScript atob Function: Handling UTF-8

JavaScript's atob() function decodes base64-encoded strings. It returns a binary string in which each character stands for one decoded byte, so when the encoded data is UTF-8 text, every multi-byte character comes back as several garbled single-byte characters instead of the character it represents.

Challenge: Understanding the Unicode Problem

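Base64 operates on binary data, and btoa() and atob() represent that data as a binary string: a string in which every character has a code point between 0 and 255 and therefore fits in one byte. Passing btoa() a string that contains any character outside that range throws an InvalidCharacterError, and atob() only ever returns characters inside that range.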
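Both halves of the problem can be reproduced in a few lines. The following is a minimal sketch that assumes an environment where atob() and btoa() are globals (browsers and recent Node.js releases); the variable names are only illustrative.

```javascript
// "é" is the two bytes C3 A9 in UTF-8, and those bytes base64-encode to "w6k=".
const decoded = atob('w6k=');

// atob returns one character per byte, so the result is two characters, not "é".
const isGarbled = decoded === '\u00C3\u00A9'; // the two characters "Ã©"

// btoa rejects any character whose code point is above 0xFF.
let btoaThrew = false;
try {
  btoa('✓');
} catch (err) {
  btoaThrew = true; // reported as an InvalidCharacterError in browsers
}

console.log(decoded.length, isGarbled, btoaThrew);
```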

Solution 1: Binary Interoperability

The recommended fix is to convert the string to a binary string before encoding, and to convert it back after decoding:

Encoding UTF-8 to Binary

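function toBinary(string) {
  // Copy each UTF-16 code unit of the string into a 16-bit array.
  const codeUnits = new Uint16Array(string.length);
  for (let i = 0; i < codeUnits.length; i++) {
    codeUnits[i] = string.charCodeAt(i);
  }
  // View the same buffer as bytes so every character given to btoa fits in one byte.
  return btoa(String.fromCharCode(...new Uint8Array(codeUnits.buffer)));
}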

Decoding Binary to UTF-8

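function fromBinary(encoded) {
  // atob yields one character per byte; copy those bytes into a byte array.
  const binary = atob(encoded);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  // View the bytes as 16-bit code units and rebuild the original string.
  return String.fromCharCode(...new Uint16Array(bytes.buffer));
}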

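This solution encodes the string's UTF-16 code units, JavaScript's native string representation, rather than UTF-8 bytes. The round trip is lossless within JavaScript, but the resulting base64 is not what a system expecting UTF-8 would produce or accept, and the byte order follows the platform's endianness.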
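A short round trip makes the trade-off concrete. The sketch below repeats toBinary and fromBinary so it runs on its own, and assumes atob() and btoa() are available as globals; the sample strings are arbitrary.

```javascript
// Same approach as toBinary/fromBinary above, repeated so the sketch is self-contained.
function toBinary(string) {
  const codeUnits = new Uint16Array(string.length);
  for (let i = 0; i < codeUnits.length; i++) {
    codeUnits[i] = string.charCodeAt(i);
  }
  return btoa(String.fromCharCode(...new Uint8Array(codeUnits.buffer)));
}

function fromBinary(encoded) {
  const binary = atob(encoded);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return String.fromCharCode(...new Uint16Array(bytes.buffer));
}

// The round trip preserves the text, including characters stored as surrogate pairs.
const sample = 'é ✓ 😀';
const roundTrip = fromBinary(toBinary(sample));
console.log(roundTrip === sample);

// The output holds UTF-16 code units, so it differs from the UTF-8 base64 of "é" ("w6k=").
// On little-endian hardware the value is expected to be "6QA=".
const utf16Base64 = toBinary('é');
console.log(utf16Base64);
```

Because the encoded bytes are UTF-16, this form is best kept to cases where the same JavaScript code performs both the encoding and the decoding.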

Solution 2: ASCII Base64 Interoperability

An alternative that favors interoperability is to base64-encode the string's UTF-8 bytes, so the output can be decoded by any system that expects UTF-8:

Encoding UTF-8 to Base64

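function b64EncodeUnicode(str) {
  // encodeURIComponent percent-encodes the UTF-8 bytes of str; turn each
  // %XX escape back into the single character with that byte value.
  return btoa(encodeURIComponent(str).replace(/%([0-9A-F]{2})/g,
    function toSolidBytes(match, p1) {
      return String.fromCharCode(parseInt(p1, 16));
  }));
}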

Decoding Base64 to UTF-8

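function b64DecodeUnicode(str) {
  // Percent-encode every decoded byte, then let decodeURIComponent
  // interpret the byte sequence as UTF-8.
  return decodeURIComponent(atob(str).split('').map(function(c) {
    return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);
  }).join(''));
}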

This solution encodes the string's UTF-8 bytes, so the base64 output matches what other UTF-8-aware tools and languages produce for the same text.

TypeScript Support

// Encoding UTF-8 ⇢ base64

function b64EncodeUnicode(str: string): string {
    return btoa(encodeURIComponent(str).replace(/%([0-9A-F]{2})/g, function(match, p1) {
        return String.fromCharCode(parseInt(p1, 16))
    }))
}

// Decoding base64 ⇢ UTF-8

function b64DecodeUnicode(str: string): string {
    return decodeURIComponent(Array.prototype.map.call(atob(str), function(c: string) {
        return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2)
    }).join(''))
}

Historical Solution (Deprecated)

function utf8_to_b64( str ) {
  return window.btoa(unescape(encodeURIComponent( str )));
}

function b64_to_utf8( str ) {
  return decodeURIComponent(escape(window.atob( str )));
}

While still functional, this approach relies on escape() and unescape(), which are deprecated, so it should be avoided in new code.
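Where the standard TextEncoder and TextDecoder APIs are available, the same UTF-8 round trip can be written without escape() or unescape(). This is a sketch of a different technique from the functions above, not a drop-in from the original solutions; the helper names utf8ToBase64 and base64ToUtf8 are hypothetical.

```javascript
// Hypothetical helper names; TextEncoder, TextDecoder, btoa and atob are standard globals.
function utf8ToBase64(str) {
  const bytes = new TextEncoder().encode(str); // string -> UTF-8 bytes
  let binary = '';
  for (const byte of bytes) {
    binary += String.fromCharCode(byte); // one character per byte
  }
  return btoa(binary);
}

function base64ToUtf8(encoded) {
  const binary = atob(encoded); // one character per decoded byte
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes); // UTF-8 bytes -> string
}

console.log(utf8ToBase64('✓ à la mode')); // "4pyTIMOgIGxhIG1vZGU="
console.log(base64ToUtf8('4pyTIMOgIGxhIG1vZGU=')); // "✓ à la mode"
```

Building the binary string in a loop, rather than spreading the byte array into String.fromCharCode, avoids exceeding the engine's argument limit on long inputs.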