node.js - When crawling, I occasionally get content like &#22825;&#22530;&#21521;&#24038;. How do I convert it to Chinese?

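I would like to learn more about character encoding. Are there any good books you can recommend? Thanks!

Thanks to both of you for the answers. Following your hints, I wrote a test program and it works. Are there any other cases I should handle?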

var code10, code16, zh;

code10 = '&#22825;&#22530;&#21521;&#24038;,&#28145;&#22323;&#21521;&#21491;';

// Decimal entities -> characters; parse the digits explicitly as base 10.
zh = code10.replace(/&#(\d+);/g, function($, $1) {return String.fromCodePoint(parseInt($1, 10));});

console.log(zh);

// Non-Latin-1 characters -> hexadecimal entities; the u flag keeps surrogate pairs together.
code16 = zh.replace(/[^\u0000-\u00ff]/gu, function($) {return '&#x' + $.codePointAt(0).toString(16) + ';';});

console.log(code16);

// Hexadecimal entities -> characters; match hex digits only instead of \w.
zh = code16.replace(/&#x([0-9a-fA-F]+);/g, function($, $1) {return String.fromCodePoint(parseInt($1, 16));});

console.log(zh);
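
One further case worth checking, offered as a minimal sketch rather than a complete answer: code points above U+FFFF (emoji, rare CJK characters). `String.fromCharCode` keeps only the low 16 bits of its argument, while `String.fromCodePoint` produces the full character. The helper names `decodeWithCharCode` and `decodeWithCodePoint` are made up for this illustration; named entities such as `&amp;` are a separate case that this sketch does not handle.

```javascript
// Hypothetical helpers for comparison; only the String methods are standard.
function decodeWithCharCode(text) {
  return text.replace(/&#(\d+);/g, function (match, digits) {
    return String.fromCharCode(parseInt(digits, 10));
  });
}

function decodeWithCodePoint(text) {
  return text.replace(/&#(\d+);/g, function (match, digits) {
    return String.fromCodePoint(parseInt(digits, 10));
  });
}

var bmp = '&#22825;';     // U+5929, inside the Basic Multilingual Plane
var astral = '&#128512;'; // U+1F600, outside it

// Both helpers agree for BMP characters...
console.log(decodeWithCharCode(bmp) === decodeWithCodePoint(bmp)); // true
// ...but differ above U+FFFF, where fromCharCode truncates to 16 bits.
console.log(decodeWithCharCode(astral) === decodeWithCodePoint(astral)); // false
```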
PHP中文网 2863 days ago 615

  • 怪我咯

    怪我咯 2017-04-17 13:25:21

    An entity that begins with &# is a decimal numeric character reference. When converting it to Chinese, note that each Chinese character is written as a multi-digit number, so the whole number has to be parsed before conversion. You can use the JavaScript functions

    String.fromCharCode(parseInt(str.substr(2), 10))
    

    Write a small tool with a loop and process the text on the front end before crawling. For example, `String.fromCharCode(parseInt("&#22825;".substr(2), 10))`
    returns "天".
    I happened to write a small tool today:

    https://github.com/hunnble/JavaScript_learning/blob/master/change-radix.html

    Open it in a browser, enter the characters you want to convert, set the base to 10, and then decode.

  • PHPz

    PHPz 2017-04-17 13:25:21

    In an entity such as &#12345;, the number 12345 is the Unicode code point expressed in decimal, and the entity can be converted into the corresponding Unicode character.

    If it is &#x12ab;, then 12ab is the Unicode code point expressed in hexadecimal.
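
    The two forms described above can be handled in a single pass. This is a minimal sketch, not a full HTML entity decoder: the function name `decodeNumericEntities` is hypothetical, and the choice to leave out-of-range numbers untouched is an assumption made for the example.

    ```javascript
    // Hypothetical helper: decodes &#NNN; (decimal) and &#xHHH; (hexadecimal).
    function decodeNumericEntities(text) {
      return text.replace(/&#(?:[xX]([0-9a-fA-F]+)|(\d+));/g, function (match, hex, dec) {
        // Exactly one of the two capture groups is defined for each match.
        var codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
        // Assumption: leave the entity as-is when the number exceeds U+10FFFF.
        return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
      });
    }

    console.log(decodeNumericEntities('&#12345; &#x12ab;'));
    console.log(decodeNumericEntities('&#22825;&#x5802;'));
    ```

    The range check matters because `String.fromCodePoint` throws a `RangeError` for values above 0x10FFFF, which would otherwise abort the whole replacement on one malformed entity.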
