Crawling web page data with curl

Suppose the URL is https://mbd.baidu.com/newspage/data/landingsuper?context={"nid":"news_4480296238548479181"}&n_type=0&p_from=1

How can I crawl the information on this page?
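
One detail worth checking before fetching: the `context` parameter in this URL carries raw JSON, and braces and double quotes are not valid unescaped characters in a query string. A minimal sketch of building the same URL with the value percent-encoded follows. The helper name `buildLandingUrl` is hypothetical, and whether the server accepts the encoded form is an assumption, not something confirmed in this thread.

```php
<?php
// Hypothetical helper: rebuilds the URL from the question with the
// JSON "context" value percent-encoded.
function buildLandingUrl(string $nid): string
{
    $base = 'https://mbd.baidu.com/newspage/data/landingsuper';
    $query = http_build_query([
        'context' => json_encode(['nid' => $nid]), // {"nid":"..."}
        'n_type'  => 0,
        'p_from'  => 1,
    ], '', '&', PHP_QUERY_RFC3986);
    return $base . '?' . $query;
}

echo buildLandingUrl('news_4480296238548479181'), PHP_EOL;
```

The resulting string can be passed to whichever fetch function you end up using.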

phpcn_u68041 · 2593 days ago · 1234

All replies (5)

  • phpcn_u68041

    phpcn_u68041 · 2017-12-07 16:41:30

    Use curl to crawl the site, and make sure the request handles HTTPS.

  • ringa_lee

    ringa_lee · 2017-12-07 14:20:17

    Yes, the first reply already covers it. Page content is usually fetched in one of two ways: file_get_contents or a curl request.

  • NULL

    NULL · 2017-12-07 13:23:13

    You can use file_get_contents or curl. Here is a curl version:

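    // Fetch a page over HTTP or HTTPS; returns the body, or false on failure.
    function getHTTPS($url) {
      $ch = curl_init();
      // Skips certificate verification: convenient for testing, unsafe in production.
      curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
      curl_setopt($ch, CURLOPT_HEADER, false);         // leave response headers out of the result
      curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
      curl_setopt($ch, CURLOPT_URL, $url);
      curl_setopt($ch, CURLOPT_REFERER, $url);         // send the target URL as the Referer
      curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
      $result = curl_exec($ch);
      curl_close($ch);
      return $result;
    }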

  • phpcn_u68041

    The default installation of curl does not support the https protocol. Do I need to add these two options? curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);

    phpcn_u68041 · 2017-12-07 16:44:56
    NULL

    I don't know much about curl myself. This is a snippet I copied from the Internet. I tested it, confirmed it can fetch HTTPS pages, and posted it for you.

    NULL · 2017-12-12 10:18:17
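
On the HTTPS question in the comments above: the two options quoted there control certificate verification, not whether curl can speak HTTPS at all. Protocol support depends on how the curl library was built, and `curl_version()` reports it. A minimal sketch, assuming the PHP curl extension is loaded; `curlSupportsHttps` and `getHTTPSVerified` are hypothetical names, and the second function is a variant that leaves verification switched on rather than the code posted in the thread.

```php
<?php
// Returns true when the installed curl library lists "https" as a protocol.
function curlSupportsHttps(): bool
{
    $info = curl_version();
    return in_array('https', $info['protocols'], true);
}

// Hypothetical variant of getHTTPS() that keeps certificate checks enabled.
function getHTTPSVerified(string $url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);  // verify the certificate chain
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);     // verify the host name matches
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body as a string
    $result = curl_exec($ch);
    if ($result === false) {
        // Log the reason, e.g. a failed certificate check.
        error_log('curl error: ' . curl_error($ch));
    }
    return $result; // string on success, false on failure
}

var_dump(curlSupportsHttps());
```

If the verified variant fails with a certificate error, that usually points to a missing CA bundle on the machine rather than missing HTTPS support.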
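
The replies mention file_get_contents as the other common option but only show curl. A rough sketch of the file_get_contents route follows, assuming `allow_url_fopen` is enabled and the openssl extension is available for https:// URLs. The helper name `fetchWithFileGetContents`, the timeout, and the User-Agent value are all illustrative assumptions.

```php
<?php
// Hypothetical helper: fetches a URL with file_get_contents().
// Returns the body as a string, or false on failure.
function fetchWithFileGetContents(string $url, int $timeout = 10)
{
    $context = stream_context_create([
        'http' => [
            'method'  => 'GET',
            'timeout' => $timeout,                       // seconds
            'header'  => "User-Agent: Mozilla/5.0\r\n",  // placeholder value
        ],
    ]);
    // @ suppresses the warning so callers can check for false instead.
    return @file_get_contents($url, false, $context);
}
```

A call such as `$html = fetchWithFileGetContents($url);` followed by a check for `false` is usually enough for simple pages. curl tends to be the better fit once you need redirects, cookies, or finer control over headers.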