How do you decode UTF-8 base64 strings in JavaScript using `atob` while avoiding encoding errors?

Mary-Kate Olsen
2024-10-31 21:08:29

Using atob to decode base64 from common text sources

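When you use atob to decode base64 strings from an API that produces UTF-8 text, the result may come out garbled, and passing text outside the Latin-1 range to btoa throws an error. Both problems follow from how JavaScript's base64 functions work: they operate on "binary strings" in which every character stands for one byte (a code unit from 0x00 to 0xFF). For example, btoa rejects a character outside that range: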

<code class="js">const notOK = "✓";
console.log(btoa(notOK)); // throws InvalidCharacterError: "✓" is outside the 0x00-0xFF range</code>
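The decoding side fails more quietly. As a minimal sketch, assume a service sent the base64 of the UTF-8 bytes for "✓ à la mode" (the variable names here are illustrative): atob succeeds, but it returns one character per byte rather than the original text.

```javascript
// Base64 of the UTF-8 bytes for "✓ à la mode", as a UTF-8 service would send it.
const fromService = "4pyTIMOgIGxhIG1vZGU=";

// atob returns a "binary string": one character per decoded byte.
const raw = atob(fromService);

console.log(raw.length);            // 14 (bytes), although the text has 11 characters
console.log(raw === "✓ à la mode"); // false: the UTF-8 bytes were not decoded as text
```

No exception is raised here, which is why this failure usually shows up as garbled characters in the output rather than as an error.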

The Unicode Problem

This is the "Unicode Problem": JavaScript strings are sequences of 16-bit code units, but btoa treats each code unit as a single byte, so it throws as soon as a code unit exceeds 0xFF. Many Unicode characters need more than one byte in any encoding, so they have to be converted to bytes explicitly before base64 encoding and converted back after decoding.

Source: MDN (2021)

<code class="js">const ok = "a";
console.log(ok.codePointAt(0).toString(16)); // "61": fits in 1 byte

const notOK = "✓";
console.log(notOK.codePointAt(0).toString(16)); // "2713": needs more than 1 byte (2 in UTF-16, 3 in UTF-8)</code>
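The byte counts depend on the encoding, which matters when the other side of an API speaks UTF-8. A small sketch using the standard TextEncoder (which always produces UTF-8) shows the difference; the variable names are illustrative:

```javascript
const encoder = new TextEncoder(); // always encodes to UTF-8

const check = "✓";
console.log(check.length);                 // 1 UTF-16 code unit
console.log(encoder.encode(check).length); // 3 UTF-8 bytes: e2 9c 93
console.log(encoder.encode("a").length);   // 1 UTF-8 byte
```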

Solution with binary interoperability

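This solution round-trips any JavaScript string, but the base64 it produces holds UTF-16 code units rather than UTF-8 bytes, so use it only when the same code both encodes and decodes. To decode base64 that another system produced from UTF-8 text, use the ASCII base64 solution further down.

The approach converts the string's 16-bit code units into a binary string of bytes before encoding, and reverses the conversion after decoding.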

Encoding UTF-8 ⇢ binary

<code class="js">function toBinary(string) {
  // Copy each 16-bit code unit of the string into a typed array
  const codeUnits = new Uint16Array(string.length);
  for (let i = 0; i < codeUnits.length; i++) {
    codeUnits[i] = string.charCodeAt(i);
  }
  // Reinterpret the buffer as bytes so every character passed to btoa is <= 0xFF
  return btoa(String.fromCharCode(...new Uint8Array(codeUnits.buffer)));
}
const encoded = toBinary("✓ à la mode"); // "EycgAOAAIABsAGEAIABtAG8AZABlAA=="</code>

Decoding binary ⇢ UTF-8

<code class="js">function fromBinary(encoded) {
  // atob yields one character per byte; copy those bytes into a typed array
  const binary = atob(encoded);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  // Reinterpret the bytes as 16-bit code units and rebuild the string
  return String.fromCharCode(...new Uint16Array(bytes.buffer));
}
const decoded = fromBinary(encoded); // "✓ à la mode"</code>
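One caveat with the functions above: spreading a whole typed array into String.fromCharCode passes every element as a separate argument, and engines limit how many arguments a call can take, so very long strings may throw a RangeError. The following is a sketch of a chunked variant, not a tested drop-in replacement. The names codesToString, toBinaryChunked, and fromBinaryChunked are hypothetical, and the chunk size is an assumed safe value rather than a documented limit.

```javascript
// Hypothetical chunked variants of toBinary/fromBinary for long inputs.
const CHUNK = 0x8000; // elements per String.fromCharCode call (assumed safe size)

// Build a string from a typed array of char codes, a slice at a time.
function codesToString(codes) {
  let out = "";
  for (let i = 0; i < codes.length; i += CHUNK) {
    out += String.fromCharCode(...codes.subarray(i, i + CHUNK));
  }
  return out;
}

function toBinaryChunked(string) {
  const codeUnits = new Uint16Array(string.length);
  for (let i = 0; i < string.length; i++) {
    codeUnits[i] = string.charCodeAt(i);
  }
  // View the 16-bit code units as bytes, then base64-encode them
  return btoa(codesToString(new Uint8Array(codeUnits.buffer)));
}

function fromBinaryChunked(encoded) {
  const binary = atob(encoded);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  // View the bytes as 16-bit code units and rebuild the string
  return codesToString(new Uint16Array(bytes.buffer));
}
```

Both variants, like the originals, use the platform's byte order when reinterpreting the buffer, so the exact base64 output assumes a little-endian machine.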

Solution with ASCII base64 interoperability

If the base64 has to interoperate with other systems that encode text as UTF-8, as most APIs do, convert the string to its UTF-8 bytes before encoding and interpret the decoded bytes as UTF-8 afterwards. This avoids the "Unicode Problem" while staying compatible with ordinary text-based base64 strings.

Encoding UTF-8 ⇢ ASCII base64

<code class="js">function b64EncodeUnicode(str) {
    // encodeURIComponent percent-encodes the UTF-8 bytes of non-ASCII characters;
    // turn each %XX back into one character so btoa sees one character per byte
    return btoa(encodeURIComponent(str).replace(/%([0-9A-F]{2})/g,
        function(match, p1) {
            return String.fromCharCode(parseInt(p1, 16));
    }));
}
b64EncodeUnicode('✓ à la mode'); // "4pyTIMOgIGxhIG1vZGU="</code>

Decoding ASCII base64 ⇢ UTF-8

<code class="js">function b64DecodeUnicode(str) {
    // Convert byte array to percent-encoding, then decode
    return decodeURIComponent(atob(str).split('').map(function(c) {
        return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);
    }).join(''));
}
b64DecodeUnicode('4pyTIMOgIGxhIG1vZGU='); // "✓ à la mode"</code>
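Where TextEncoder and TextDecoder are available, the same UTF-8 conversion can be done without the percent-encoding detour. This is a sketch of that alternative, not part of the solution above; the helper names utf8ToBase64 and base64ToUtf8 are hypothetical, while TextEncoder, TextDecoder, btoa, and atob are standard.

```javascript
// Hypothetical helper names; the APIs they call are standard.
function utf8ToBase64(text) {
  const bytes = new TextEncoder().encode(text); // string -> UTF-8 bytes
  let binary = "";
  for (const byte of bytes) {
    binary += String.fromCharCode(byte); // one character per byte, all <= 0xFF
  }
  return btoa(binary);
}

function base64ToUtf8(base64) {
  const binary = atob(base64); // one character per decoded byte
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  // fatal: true throws a TypeError on malformed UTF-8 instead of inserting U+FFFD
  return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
}

console.log(utf8ToBase64("✓ à la mode"));          // "4pyTIMOgIGxhIG1vZGU="
console.log(base64ToUtf8("4pyTIMOgIGxhIG1vZGU=")); // "✓ à la mode"
```

The output matches b64EncodeUnicode and b64DecodeUnicode for this input, so the two pairs should be interchangeable for well-formed text.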

TypeScript Support

<code class="ts">function b64EncodeUnicode(str: string): string {
    return btoa(encodeURIComponent(str).replace(/%([0-9A-F]{2})/g, function(match: string, p1: string) {
        return String.fromCharCode(parseInt(p1, 16))
    }))
}
function b64DecodeUnicode(str: string): string {
    return decodeURIComponent(Array.prototype.map.call(atob(str), function(c: string) {
        return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2)
    }).join(''))
}</code>

Additional Notes

  • Base64 from sources like the GitHub API can contain whitespace; on Safari you may need to strip it before calling atob, for example with str.replace(/\s/g, '').
  • Libraries like js-base64 and base64-js also provide reliable solutions.
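The whitespace note can be combined with UTF-8 decoding in one step. The sketch below assumes a payload that wraps its base64 across lines; the helper name decodeWrappedBase64 and the sample string are illustrative, not taken from any particular API response.

```javascript
// Hypothetical helper: decode line-wrapped base64 that carries UTF-8 text.
function decodeWrappedBase64(wrapped) {
  const compact = wrapped.replace(/\s/g, ""); // drop newlines and spaces
  const bytes = Uint8Array.from(atob(compact), (ch) => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes); // interpret the bytes as UTF-8
}

// Sample input wrapped with newlines, in the style of a line-wrapped payload.
console.log(decodeWrappedBase64("4pyTIMOg\nIGxhIG1v\nZGU=\n")); // "✓ à la mode"
```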
