
Calculating the number of bytes a string occupies in localStorage with JavaScript

WBOY | Original | 2016-05-16 15:35:57 | 1318 views

A recent project needed JavaScript to calculate how much space a string takes up when it is written to localStorage. JavaScript strings are Unicode text, and Unicode has several encoding forms, of which UTF-8 and UTF-16 are the most widely used, so this article discusses only these two.

The following definition is taken from Wikipedia (http://zh.wikipedia.org/zh-cn/UTF-8), with some deletions.

UTF-8 (8-bit Unicode Transformation Format) is a variable-length character encoding for Unicode. It can represent any character in the Unicode standard, its single-byte encodings are identical to ASCII, and it uses one to four bytes per character.

The encoding rules are as follows:

000000 – 00007F one byte;
000080 – 0007FF two bytes;
000800 – 00D7FF and 00E000 – 00FFFF three bytes (note: Unicode has no characters in the range D800 – DFFF);
010000 – 10FFFF four bytes.
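
The ranges above translate directly into a comparison on the code point. The following is a minimal sketch of that mapping; `utf8BytesForCodePoint` is a hypothetical helper name used only for illustration and does not appear in the original implementation.

```javascript
// Hypothetical helper: number of UTF-8 bytes needed for one Unicode code point.
function utf8BytesForCodePoint(codePoint) {
  if (codePoint <= 0x7f) return 1;    // 000000 - 00007F
  if (codePoint <= 0x7ff) return 2;   // 000080 - 0007FF
  if (codePoint <= 0xffff) return 3;  // 000800 - 00FFFF (D800 - DFFF holds no characters)
  return 4;                           // 010000 - 10FFFF
}

console.log(utf8BytesForCodePoint('A'.codePointAt(0)));         // 1 (U+0041)
console.log(utf8BytesForCodePoint('\u00e9'.codePointAt(0)));    // 2 (U+00E9)
console.log(utf8BytesForCodePoint('\u4e2d'.codePointAt(0)));    // 3 (U+4E2D)
console.log(utf8BytesForCodePoint('\u{1F600}'.codePointAt(0))); // 4 (U+1F600)
```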

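UTF-16 is also a variable-length encoding, although it is often mistaken for a fixed-length one. Most characters are encoded with two bytes, and characters whose code exceeds 65535 (0xFFFF) are encoded with four bytes as a surrogate pair, as follows: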

000000 – 00FFFF two bytes;
010000 – 10FFFF four bytes.
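
JavaScript strings expose UTF-16 code units, so the two-byte and four-byte cases can be observed directly from `length` and `charCodeAt`. The sketch below illustrates this; `utf16BytesForCodePoint` is a hypothetical helper name introduced for the example.

```javascript
// Hypothetical helper: number of UTF-16 bytes needed for one Unicode code point.
function utf16BytesForCodePoint(codePoint) {
  return codePoint <= 0xffff ? 2 : 4;
}

var bmp = '\u4e2d';       // U+4E2D, at or below 0xFFFF
var astral = '\u{1F600}'; // U+1F600, above 0xFFFF

console.log(bmp.length);    // 1 code unit  -> 2 bytes
console.log(astral.length); // 2 code units -> 4 bytes (a surrogate pair)

console.log(astral.charCodeAt(0).toString(16));  // d83d (high surrogate)
console.log(astral.charCodeAt(1).toString(16));  // de00 (low surrogate)
console.log(astral.codePointAt(0).toString(16)); // 1f600 (the full code point)
```

Note that `charCodeAt` never returns a value above 0xFFFF; only `codePointAt` combines a surrogate pair into the full code point.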

At first I assumed that, because the page is encoded in UTF-8, strings stored in localStorage would be encoded in UTF-8 as well. Testing showed otherwise: a string whose calculated UTF-8 size was under 5MB still threw an exception when it was saved to localStorage. On reflection this makes sense. A page's encoding can be changed, so if localStorage stored strings in the page's encoding the stored data would become a mess. JavaScript strings are sequences of UTF-16 code units, so browsers presumably measure localStorage usage in UTF-16. The tests agreed: a string calculated as 5MB in UTF-16 was written successfully, and anything larger failed.
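
The experiment described above can be sketched roughly as follows. Everything here is illustrative: `estimateEntryBytes`, `trySetItem` and `createFakeStorage` are hypothetical names, the fake storage only stands in for `window.localStorage` so the sketch runs outside a browser, and the 5MB limit and the idea that the key counts toward it are assumptions, since actual quotas are browser-specific.

```javascript
// Estimate the UTF-16 size of an entry: each string index is one 2-byte code unit.
// Assumption: both the key and the value count toward the quota.
function estimateEntryBytes(key, value) {
  return (key.length + value.length) * 2;
}

// Attempt the write and report failure instead of letting the exception propagate.
function trySetItem(storage, key, value) {
  try {
    storage.setItem(key, value);
    return true;
  } catch (e) {
    return false;
  }
}

// Simplified stand-in for localStorage: it checks a single entry against the
// quota and does not track the cumulative size of all entries.
function createFakeStorage(quotaBytes) {
  var data = {};
  return {
    setItem: function (key, value) {
      if (estimateEntryBytes(key, String(value)) > quotaBytes) {
        throw new Error('QuotaExceededError (simulated)');
      }
      data[key] = String(value);
    },
    getItem: function (key) {
      return Object.prototype.hasOwnProperty.call(data, key) ? data[key] : null;
    }
  };
}

var QUOTA = 5 * 1024 * 1024; // assumed 5MB limit, as in the test above
var fakeStorage = createFakeStorage(QUOTA);

console.log(trySetItem(fakeStorage, 'k', 'a'.repeat(QUOTA / 2 - 1))); // true: exactly at the limit
console.log(trySetItem(fakeStorage, 'k', 'a'.repeat(QUOTA / 2)));     // false: two bytes over
```

In a browser, `trySetItem(window.localStorage, key, value)` could be used the same way to probe the real limit.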

Here is the implementation, following the rules written above. For speed, the charset is checked once and each encoding gets its own for loop.

/**
   * Calculates the number of bytes a string occupies. UTF-8 is used by default; UTF-16 can also be specified.
   * UTF-8 is a variable-length Unicode encoding that uses one to four bytes per character.
   *
   * 000000 - 00007F (128 codes)     0zzzzzzz(00-7F)                             one byte
   * 000080 - 0007FF (1920 codes)    110yyyyy(C0-DF) 10zzzzzz(80-BF)             two bytes
   * 000800 - 00D7FF
   * 00E000 - 00FFFF (61440 codes)   1110xxxx(E0-EF) 10yyyyyy 10zzzzzz           three bytes
   * 010000 - 10FFFF (1048576 codes) 11110www(F0-F7) 10xxxxxx 10yyyyyy 10zzzzzz  four bytes
   *
   * Note: Unicode has no characters in the range D800-DFFF.
   * {@link http://zh.wikipedia.org/wiki/UTF-8}
   *
   * UTF-16 uses two bytes for most characters and four bytes for codes above 65535.
   * 000000 - 00FFFF two bytes
   * 010000 - 10FFFF four bytes
   *
   * {@link http://zh.wikipedia.org/wiki/UTF-16}
   * @param {String} str
   * @param {String} charset utf-8, utf-16
   * @return {Number}
   */
  var sizeof = function(str, charset){
    var total = 0,
      charCode,
      i,
      len;
    charset = charset ? charset.toLowerCase() : '';
    if(charset === 'utf-16' || charset === 'utf16'){
      for(i = 0, len = str.length; i < len; i++){
        // codePointAt (unlike charCodeAt) returns the full code point for a
        // surrogate pair, so values above 0xffff can actually be detected.
        charCode = str.codePointAt(i);
        if(charCode <= 0xffff){
          total += 2;
        }else{
          total += 4;
          i++; // the code point used two code units; skip the low surrogate
        }
      }
    }else{
      for(i = 0, len = str.length; i < len; i++){
        charCode = str.codePointAt(i);
        if(charCode <= 0x007f) {
          total += 1;
        }else if(charCode <= 0x07ff){
          total += 2;
        }else if(charCode <= 0xffff){
          total += 3;
        }else{
          total += 4;
          i++; // skip the low surrogate of the pair
        }
      }
    }
    return total;
  };
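
As a cross-check for a function like the one above, the same numbers can be obtained from built-in facilities. The sketch below is an alternative, not the article's method: it uses the standard `TextEncoder` (available in modern browsers and Node.js) for UTF-8 and the string length for UTF-16. The helper names `utf8ByteLength` and `utf16ByteLength` are hypothetical.

```javascript
// UTF-8 size via the built-in encoder, which always produces UTF-8.
function utf8ByteLength(str) {
  return new TextEncoder().encode(str).length;
}

// UTF-16 size: each string index is one 16-bit code unit.
function utf16ByteLength(str) {
  return str.length * 2;
}

// 'abc' + U+4E2D + U+1F600
var sample = 'abc\u4e2d\u{1F600}';
console.log(utf8ByteLength(sample));  // 10 = 3 * 1 + 3 + 4
console.log(utf16ByteLength(sample)); // 12 = (3 + 1 + 2) * 2
```

A string containing characters above 0xFFFF, such as the emoji in `sample`, is a useful test case because it exercises the four-byte branch in both encodings.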
