GBK to Pinyin in PHP (note: this is not the GB2312-to-Pinyin code found all over the Internet)

WBOY
2016-07-25 09:07:58
A web search for "PHP to Pinyin" turns up plenty of code, but nearly all of it supports only GB2312.
That code cannot handle even characters such as "骞", "霜", and "Hui".
For converting Chinese names to Pinyin, it is far from enough.
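
To make the difference in coverage concrete, here is a minimal sketch of byte-range checks for the two encodings. The function names are hypothetical and not part of the class in this article. The ranges are the commonly documented ones: GB2312 Chinese characters use lead bytes 0xB0-0xF7 with trail bytes 0xA1-0xFE, while GBK double-byte codes use lead bytes 0x81-0xFE with trail bytes 0x40-0xFE (excluding 0x7F).

```php
<?php
// Illustrative helpers (hypothetical names, not from the original class).

// True if the byte pair falls in the GB2312 Chinese-character block.
function isGb2312Hanzi(int $lead, int $trail): bool
{
    return $lead >= 0xB0 && $lead <= 0xF7
        && $trail >= 0xA1 && $trail <= 0xFE;
}

// True if the byte pair falls anywhere in the GBK double-byte range.
function isGbkDoubleByte(int $lead, int $trail): bool
{
    return $lead >= 0x81 && $lead <= 0xFE
        && $trail >= 0x40 && $trail <= 0xFE && $trail !== 0x7F;
}

// 0xD6 0xD0 lies inside the GB2312 block; 0x81 0x40 is valid GBK only.
$inBoth  = isGb2312Hanzi(0xD6, 0xD0) && isGbkDoubleByte(0xD6, 0xD0);
$gbkOnly = !isGb2312Hanzi(0x81, 0x40) && isGbkDoubleByte(0x81, 0x40);
```

Any byte pair for which only the second check passes is a character that a GB2312-only lookup table has no entry for.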

Here is working GBK-to-Pinyin code. Many thanks to the original author: Ma Minglian (!hightman). Home page: http://php.twomice.net
The original demo only looked up a single character. I modified it to accept a whole string that may mix Chinese and English. Non-Chinese characters are returned unchanged, and Chinese characters are converted to pinyin with tone numbers.
Characters with multiple readings are not supported for now. For example, "Zeng" is converted to ceng2, and the reading zeng1 is not available.
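
If tone numbers are not wanted (for names, for instance), they can be removed afterwards. The following is only a sketch under assumptions: `stripToneDigits` is a hypothetical helper, and it assumes each pinyin token is made of ASCII letters followed by a single tone digit from 1 to 5, as in `ceng2`. It works token by token, so a standalone digit from the original text is left alone.

```php
<?php
// Hypothetical post-processing helper, not part of the original class.
// Assumes tokens look like "ceng2": ASCII letters plus one tone digit (1-5).
function stripToneDigits(string $pinyin, string $separator = ' '): string
{
    $tokens = explode($separator, $pinyin);
    foreach ($tokens as $k => $token) {
        // Only touch tokens that are letters followed by exactly one tone digit.
        if (preg_match('/^([a-z]+)[1-5]$/i', $token, $m)) {
            $tokens[$k] = $m[1];
        }
    }
    return implode($separator, $tokens);
}

$plain = stripToneDigits('ceng2 a 1');
```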

(There is also a PHP project called yii-chinese on the Internet, but unfortunately its pinyin conversion is GB2312-only as well.)


How to use:
$str = 'Chinese characters abc123-=+';
$py = pinyin::instance(); //Single instance, if you like, $py = new pinyin(); also works.
echo $py->get($str);


Please download the entire code at the end of the page.


<?php
class pinyin {
    // File handle for the pinyin lookup table (py.dat)
    var $_fp = false;

    function __construct() {
        $_dat = DISCUZ_ROOT."./source/include/table/py.dat";
        if (!$this->_fp) {
            $this->_fp = fopen($_dat, 'rb');
        }
    }

    /**
     * Return a shared instance
     * @return pinyin
     */
    public static function instance() {
        static $object;
        if (empty($object)) {
            $object = new self();
        }
        return $object;
    }

    // Detect the input encoding and convert it to GBK when necessary
    function anystring2gbk($str) {
        $encode = mb_detect_encoding($str, "ASCII,UNICODE, UTF-8,GBK,CP936,EUC-CN,BIG-5,EUC-TW");
        return ($encode != 'CP936' && $encode != 'ASCII' && $encode != 'GBK' ? iconv($encode, 'GBK', $str) : $str);
    }

    function get($str, $separator = ' ') {
        //$str = iconv('UTF-8','GBK', $str);
        $str = $this->anystring2gbk($str); // If you know the input encoding, use the previous line instead
        $len = strlen($str);
        $i = 0;
        $result = array();
        while ($i < $len) {
            $s = ord($str[$i]);
            if ($s > 160 && $i + 1 < $len) {
                // Lead byte of a double-byte character: consume the trail byte too
                $word = $this->word2py($s, ord($str[++$i]));
            } else {
                // Single-byte character: return it unchanged
                $word = $str[$i];
            }
            $result[] = $word;
            ++$i;
        }
        return implode($separator, $result);
    }

    private function word2py($h, $l) {
        $high = $h - 0x81;
        $low = $l - 0x40;
        // Calculate the record index: 0xC0 (192) records per lead byte
        $off = ($high << 8) + $low - ($high * 0x40);
        // Outside the table: return the original bytes
        if ($off < 0) {
            return chr($h).chr($l);
        }
        // Each record is 8 bytes long
        fseek($this->_fp, $off * 8, SEEK_SET);
        $ret = fread($this->_fp, 8);
        $ret = unpack('a8py', $ret);
        // Strip NUL or space padding, if the record has any
        return rtrim($ret['py'], "\0 ");
    }

    function __destruct() {
        if ($this->_fp) {
            fclose($this->_fp);
        }
    }
}
?>
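
As a worked example of the offset arithmetic in word2py(), the sketch below repeats the calculation as a standalone function and reads fixed-width 8-byte records from a small in-memory stream. The names `gbkTableOffset` and `lookupRecord` are hypothetical, the three records are made-up placeholder values rather than the real contents of py.dat, and NUL padding of the records is an assumption.

```php
<?php
// Hypothetical helper mirroring the arithmetic in word2py().
function gbkTableOffset(int $lead, int $trail): int
{
    $high = $lead - 0x81;  // rows start at lead byte 0x81
    $low  = $trail - 0x40; // columns start at trail byte 0x40
    // Equivalent to ($high << 8) + $low - ($high * 0x40): 192 records per row.
    return $high * 0xC0 + $low;
}

// Read one fixed-width record and strip its padding (padding is an assumption).
function lookupRecord($fp, int $index, int $recordSize = 8): string
{
    fseek($fp, $index * $recordSize, SEEK_SET);
    $raw = fread($fp, $recordSize);
    return rtrim($raw, "\0 ");
}

// A tiny in-memory stand-in for py.dat with three placeholder records.
$fp = fopen('php://memory', 'w+b');
fwrite($fp, pack('a8', 'a1'));     // record 0
fwrite($fp, pack('a8', 'zhong1')); // record 1
fwrite($fp, pack('a8', 'ceng2'));  // record 2

// GBK bytes 0xD6 0xD0: high = 0x55 (85), low = 0x90 (144),
// index = 85 * 192 + 144 = 16464, byte position = 16464 * 8 = 131712.
$index = gbkTableOffset(0xD6, 0xD0);
$second = lookupRecord($fp, 1);
```

Because every record has the same width, a lookup is a single fseek() plus one fread(), with no need to load the whole table into memory.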

