How to parse the link?url= parameter in Baidu search results

WBOY (original)
2016-07-25 09:08:16
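<?php
// Query a Baidu link?url= link for the real link.

/**
 * getrealurl: get the URL reached after a 301 or 302 redirect (by enenba.com).
 *
 * @param string $url URL to query
 * @return string The real URL that $url redirects to, or $url itself if there is no redirect
 */
function getrealurl($url) {
    $header = get_headers($url, 1);
    // get_headers() returns false on failure; fall back to the input URL.
    if ($header === false || !isset($header['Location'])) {
        return $url;
    }
    // strpos() returns an offset, so compare against false explicitly.
    if (strpos($header[0], '301') !== false || strpos($header[0], '302') !== false) {
        if (is_array($header['Location'])) {
            // Several redirects were followed: the last Location is the final target.
            return $header['Location'][count($header['Location']) - 1];
        } else {
            return $header['Location'];
        }
    } else {
        return $url;
    }
}

// HTML for the input form (empty here).
$input = '
    ';
$url = isset($_GET['url']) ? $_GET['url'] : '';
if (empty($url)) exit($input);
$urlreal = getrealurl($url);
echo 'The real url is:' . $urlreal;
// ltrim() takes a character mask, not a prefix, so strip the scheme with a regex.
$urlreal = preg_replace('#^https?://#i', '', $urlreal);
$search = '/ebac5573358cc3c0659257bfcf54([0-9a-f]+)/i';
// Keep the hex string that follows the fixed prefix, or '' if the prefix is absent.
$url_encode = preg_match($search, $url, $r) ? $r[1] : '';
unset($r);
echo '
    The ciphertext part is: ' . $url_encode . '
    ';
$urlreal_arr = str_split($urlreal);           // one plaintext character per element
$url_encode_arr = str_split($url_encode, 2);  // one two-digit hex code per element
echo '
    ';
echo $input;
?>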
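One detail worth checking when the scheme is stripped from the real URL: `ltrim()` treats its second argument as a set of characters, not as a prefix, so it can remove more than the leading `http://`. The sketch below shows the difference; `strip_scheme` is a hypothetical helper name, not something from the original script.

```php
<?php
// Hypothetical helper: removes a leading "http://" or "https://" and nothing else.
function strip_scheme(string $url): string
{
    // preg_replace() returns null only on a regex error; fall back to the input.
    return preg_replace('#^https?://#i', '', $url) ?? $url;
}

// ltrim() strips any leading characters found in the mask "htp:/",
// so it also eats the "php" at the start of the host name.
var_dump(ltrim('http://php.net/', 'http://'));  // string(5) ".net/"
var_dump(strip_scheme('http://php.net/'));      // string(8) "php.net/"
```

This matters for the analysis because a damaged host name would shift every character against its ciphertext code.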

Disclaimer: The articles on cnbeta were not published by me. My analysis is based only on my own ideas and research. It is just a process; as for whether there are results, I have my own conclusions. Please don't rant.

After looking carefully at the long code in a Baidu result URL, I found that the ciphertext consists only of digits and the letters a to f, so it is hexadecimal. The hexadecimal digits run 0->1->2->3->4->5->6->7->8->9->a->b->c->d->e->f.

I collected a series of URLs and tallied the first code, the XX in ebac5573358cc3c0659257bfcf54XX...... The URL character corresponding to each XX code is as follows (each pair is code, then character):

    33 0    23 @    13 P    03 `    73 p    63 !
    32 1    22 A    12 Q    02 a    72 q    62 "
    31 2    21 B    11 R    01 b    71 r    61 #
    30 3    20 C    10 S    00 c    70 s    60 $
    37 4    27 D    17 T    07 d    77 t    67 %
    36 5    26 E    16 U    06 e    76 u    66 &
    35 6    25 F    15 V    05 f    75 v    65 '
    34 7    24 G    14 W    04 g    74 w    64 (
    3b 8    2b H    1b X    0b h    7b x    6b )
    3a 9    2a I    1a Y    0a i    7a y    6a *
    39 :    29 J    19 Z    09 j    79 z    69 +
    38 ;    28 K    18 [    08 k    78 {    68 ,
    3f      2d N    1d ^    0d n    7d ~    6d /
    3c ?    2c O    1c _    0c o    7c      6c
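The tally described above could be automated along the following lines. This is only a sketch under the assumption the analysis already makes, namely that the n-th two-digit code corresponds to the n-th character of the real URL. `build_position_tables` and the shape of `$samples` are hypothetical, and the sample values are taken from the code/character pairs listed in this article.

```php
<?php
// Hypothetical helper. $samples is a list of [real URL without scheme, ciphertext hex]
// pairs. The result maps position => code => list of distinct characters seen there.
function build_position_tables(array $samples): array
{
    $table = [];
    foreach ($samples as [$plain, $cipherHex]) {
        if ($plain === '' || $cipherHex === '') {
            continue; // nothing to pair up
        }
        $chars = str_split($plain);                     // one character per element
        $codes = str_split(strtolower($cipherHex), 2);  // one two-digit code per element
        // Stop at the shorter side so a length mismatch is simply ignored.
        $n = min(count($chars), count($codes));
        for ($pos = 0; $pos < $n; $pos++) {
            $code = $codes[$pos];
            $char = $chars[$pos];
            if (!isset($table[$pos][$code]) || !in_array($char, $table[$pos][$code], true)) {
                $table[$pos][$code][] = $char;
            }
        }
    }
    return $table;
}

// Two tiny samples: "ab" and "ba" with their first and second codes.
$samples = [['ab', '0242'], ['ba', '0141']];
print_r(build_position_tables($samples)[0]);
```

If any code at a given position ends up with more than one character, the one-code-per-character assumption does not hold for that position.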

I found that these should be characters from the ASCII table, but with the order scrambled. Within a single hex digit, however, the order is always the same: 3->2->1->0->7->6->5->4->b->a->9->8->f->e->d->c. Each group of four digits is in descending order, and it can be seen that the overall order is decreasing. What is puzzling is that _ and ` are adjacent in ASCII, yet the corresponding 0c and 73 jump. I can't see the pattern.

Let's look at the second code, the YY in ebac5573358cc3c0659257bfcf54XXYY...... The URL character corresponding to each YY code is as follows:

    70 0    60 @    50 P    40 `    30 p    20 !
    71 1    61 A    51 Q    41 a    31 q    21 "
    72 2    62 B    52 R    42 b    32 r    22 #
    73 3    63 C    53 S    43 c    33 s    23 $
    74 4    64 D    54 T    44 d    34 t    24 %
    75 5    65 E    55 U    45 e    35 u    25 &
    76 6    66 F    56 V    46 f    36 v    26 '
    77 7    67 G    57 W    47 g    37 w    27 (
    78 8    68 H    58 X    48 h    38 x    28 )
    79 9    69 I    59 Y    49 i    39 y    29 *
    7a :    6a J    5a Z    4a j    3a z    2a +
    7b ;    6b K    5b [    4b k    3b {    2b ,
    7c      6e N    5e ^    4e n    3e ~    2e /
    7f ?    6f O    5f _    4f o    3f      2f

The ciphertext of the second group follows the increasing order of hexadecimal: 0->1->2->3->4->5->6->7->8->9->a->b->c->d->e->f. Overall it is decreasing.

Let's look at the third group, the ZZ in ebac5573358cc3c0659257bfcf54XXYYZZ...... The URL character corresponding to each ZZ code is as follows:

    84 0    94 @    a4 P    b4 `    c4 p    d4 !
    85 1    95 A    a5 Q    b5 a    c5 q    d5 "
    86 2    96 B    a6 R    b6 b    c6 r    d6 #
    87 3    97 C    a7 S    b7 c    c7 s    d7 $
    80 4    90 D    a0 T    b0 d    c0 t    d0 %
    81 5    91 E    a1 U    b1 e    c1 u    d1 &
    82 6    92 F    a2 V    b2 f    c2 v    d2 '
    83 7    93 G    a3 W    b3 g    c3 w    d3 (
    8c 8    9c H    ac X    bc h    cc x    dc )
    8b 9    9b I    ab Y    bb i    cd y    dd *
    8e :    9e J    ae Z    be j    ce z    de +
    8f ;    9f K    af [    bf k    cf {    df ,
    88      9a N    aa ^    ba n    ca ~    da /
    8b ?    9b O    ab _    bb o    cb      db

I won't explain the order: 4->5->6->7->0->1->2->3->c->b->e->f->8->9->a->b. Overall it is increasing.

I haven't looked at the later positions, but I can roughly tell that the scheme shuffles the hexadecimal digits in groups of four. Whether a given position is increasing or decreasing needs a certain amount of data to judge. Next time I will collect data for 1,000 URLs to decide.
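One way to test a digit order like the ones listed above is to ask whether it is simply the digits 0 to f XOR-ed with a constant. The sketch below does that check; `xor_constants_matching` is a hypothetical helper. It only describes the order of the low digit as listed in this article, and it makes no claim about how the high digit is chosen or about Baidu's actual scheme.

```php
<?php
// Hypothetical helper: returns every constant k (0..15) for which
// "index XOR k" reproduces the observed order of hex digits.
function xor_constants_matching(array $observed): array
{
    $matches = [];
    for ($k = 0; $k < 16; $k++) {
        $ok = true;
        foreach ($observed as $i => $digit) {
            if (($i ^ $k) !== hexdec($digit)) {
                $ok = false;
                break;
            }
        }
        if ($ok) {
            $matches[] = $k;
        }
    }
    return $matches;
}

// Low-digit order listed for the first code: 3,2,1,0,7,6,5,4,b,a,9,8,f,e,d,c.
print_r(xor_constants_matching(str_split('32107654ba98fedc')));  // matches k = 3
// Low-digit order listed for the second code: plain 0..f.
print_r(xor_constants_matching(str_split('0123456789abcdef')));  // matches k = 0
```

An empty result for some position would mean that no single XOR constant explains the listed order, which is where a larger sample of URLs would help.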


