Home  >  Article  >  Backend Development  >  PHP string hash function algorithm implementation code

PHP string hash function algorithm implementation code

WBOY
WBOYOriginal
2016-07-25 08:54:301347browse
  1. function DJBHash($str) // 0.22
  2. {
  3. $hash = 0;
  4. $n = strlen($str);
  5. for ($i = 0; $i <$n; $i++)
  6. {
  7. $hash += ($hash <<5 ) + ord($str[$i]);
  8. }
  9. return $hash % 701819;
  10. }
  11. function ELFHash($str) // 0.35
  12. {
  13. $hash = $x = 0;
  14. $n = strlen($str);
  15. for ($i = 0; $i <$n; $i++)
  16. {
  17. $hash = ($hash <<4) + ord($str[$i]);
  18. if(($x = $hash & 0xf0000000) != 0)
  19. {
  20. $hash ^= ($x>> 24);
  21. $hash &= ~$x;
  22. }
  23. }
  24. return $hash % 701819;
  25. }
  26. function JSHash($str) // 0.23
  27. {
  28. $hash = 0;
  29. $n = strlen($str);
  30. for ($i = 0; $i <$n; $i++)
  31. {
  32. $hash ^= (($hash <<5) + ord($str[$i]) + ($hash>> 2));
  33. }
  34. return $hash % 701819;
  35. }
  36. function SDBMHash($str) // 0.23
  37. {
  38. $hash = 0 ;
  39. $n = strlen($str);
  40. for ($i = 0; $i <$n; $i++)
  41. {
  42. $hash = ord($str[$i]) + ($hash <<6 ) + ($hash <<16 ) - $hash;
  43. }
  44. return $hash % 701819;
  45. }
  46. function APHash($str) // 0.30
  47. {
  48. $hash = 0 ;
  49. $n = strlen($str);
  50. for ($i = 0; $i <$n; $i++)
  51. {
  52. if (($i & 1 ) == 0 )
  53. {
  54. $hash ^= (($hash <<7 ) ^ ord($str[$i]) ^ ($hash>> 3 ));
  55. }
  56. else
  57. {
  58. $hash ^= ( ~ (($hash <<11 ) ^ ord($str[$i]) ^ ($hash>> 5)));
  59. }
  60. }
  61. return $hash % 701819;
  62. }
  63. function DEKHash($str) // 0.23
  64. {
  65. $n = strlen($str);
  66. $hash = $n;
  67. for ($i = 0; $i <$n; $i++)
  68. {
  69. $hash = (($hash <<5) ^ ($hash>> 27)) ^ ord($str[$i]);
  70. }
  71. return $hash % 701819;
  72. }
  73. function FNVHash($str) // 0.31
  74. {
  75. $hash = 0;
  76. $n = strlen($str);
  77. for ($i = 0; $i <$n; $i++)
  78. {
  79. $hash *= 0x811C9DC5;
  80. $hash ^= ord($str[$i]);
  81. }
  82. return $hash % 701819;
  83. }
  84. function PJWHash($str) // 0.33
  85. {
  86. $hash = $test = 0;
  87. $n = strlen($str);
  88. for ($i = 0; $i <$n; $i++)
  89. {
  90. $hash = ($hash <<4) + ord($str[$i]);
  91. if(($test = $hash & -268435456) != 0)
  92. {
  93. $hash = (( $hash ^ ($test>> 24)) & (~-268435456));
  94. }
  95. }
  96. return $hash % 701819;
  97. }
  98. function PHPHash($str) // 0.34
  99. {
  100. $hash = 0;
  101. $n = strlen($str);
  102. for ($i = 0; $i <$n; $i++)
  103. {
  104. $hash = ($hash <<4) + ord($str[$i]);
  105. if (($g = ($hash & 0xF0000000)))
  106. {
  107. $hash = $hash ^ ($g>> 24);
  108. $hash = $hash ^ $g;
  109. }
  110. }
  111. return $hash % 701819;
  112. }
  113. function OpenSSLHash($str) // 0.22
  114. {
  115. $hash = 0;
  116. $n = strlen($str);
  117. for ($i = 0; $i <$n; $i++)
  118. {
  119. $hash ^= (ord($str[$i]) <<($i & 0x0f));
  120. }
  121. return $hash % 701819;
  122. }
  123. function MD5Hash($str) // 0.050
  124. {
  125. $hash = md5($str);
  126. $hash = $hash[0] | ($hash[1] <<8 ) | ($hash[2] <<16) | ($hash[3] <<24) | ($hash[4] <<32) | ($hash[5] <<40) | ($hash[6] <<48) | ($hash[7] <<56);
  127. return $hash % 701819;
  128. }
复制代码

Algorithm description: The comment behind the function is the execution speed (unit: s) of 1000 times in my local test. It can be seen that MD5Hash is the fastest, and it is much faster than other functions... But it can also be seen from the algorithm of this function. , it only relies on the first 7 characters of the string after md5. That is to say, if the first 7 characters are the same, the hash value obtained is exactly the same, so in fact, its distribution is not very trustworthy. ....If calculated based on 32 characters, the speed will be far slower than other algorithms...

Except for MD5Hash, other algorithms will be affected by the length of the string. The longer it is, the slower it is. I used 10 characters in English for the test. The final return of each function $hash % 701819; 701819 represents the maximum capacity of the hash, which means that the final number range obtained by these hash functions is 0~701819. This number can be changed. It is generally considered to use a large one. The distribution of prime number results will be relatively even. Several suggested values ​​near 701819 are: 175447, 350899, 1403641, 2807303, 5614657.

What can this be used for...

Why do we need to organize and test these hash algorithms? I am writing a multi-user blog. Well... I also mentioned it in the previous blog. Multi-user blogs generally have a function, which is to use a user name that is a combination of English and numbers. as the blog address (second-level domain name or directory). Then there is a question, how to get the user's ID based on the user name, is there one more query? With the hash function, there is no need. Use the hash function to process the user name, get a number, and then do certain processing on the number (I divided it into hierarchical directories based on 2 digits. The purpose is to prevent too many files in one directory. file (which affects the disk retrieval speed), and then a path is formed, and the corresponding ID is saved in the file under this path (I personally recommend the user name as the file name), so that the user's ID can be obtained directly based on the user name. , no query is required, the user name is used as the file name, so even if the final results are the same, they are in different files, so there is no need to worry about collisions.

Of course...if your system operates entirely based on user names, then I didn't say this before = =b, I quietly criticize SELECT because numbers are faster than strings.

I chose the DJB algorithm. If the MD5 distribution test is acceptable after it goes online, I will consider switching to it.

It can also be seen from here that hashing is actually very useful for distribution. Haha, it can be used for caching, static or other things that require distributed storage.



Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn