PHP implements methods and error handling for crawling HTTPS content

高洛峰
2016-10-20 14:41:07

I recently ran into an HTTPS problem while experimenting with the Hacker News API. All Hacker News API endpoints are served over encrypted HTTPS rather than plain HTTP, and when I used PHP's file_get_contents() function to fetch the data the API provides, the call failed. The code was as follows:

<?php
$data = file_get_contents("https://www.liqingbo.cn/son?print=pretty");
......

When running the above code, the following error message is encountered:

PHP Warning:  file_get_contents(): Unable to find the wrapper "https" - did you forget to enable it when you configured PHP?

Why does such an error occur?

A search showed that many people have run into this error. The cause is straightforward: the OpenSSL extension, which provides the https wrapper, is not enabled in the PHP configuration file. On my local machine that file is /apache/bin/php.ini. Find the line ;extension=php_openssl.dll and remove the leading semicolon. You can use the following script to check the configuration of your PHP environment:

// List the stream wrappers this PHP build has registered.
$w = stream_get_wrappers();
echo 'openssl: ', extension_loaded('openssl') ? 'yes' : 'no', "\n";
echo 'http wrapper: ', in_array('http', $w) ? 'yes' : 'no', "\n";
echo 'https wrapper: ', in_array('https', $w) ? 'yes' : 'no', "\n";
echo 'wrappers: ', var_dump($w);

Running the script on my machine gives the following result:

openssl: no
http wrapper: yes
https wrapper: no
wrappers: array(10) {
  [0]=>
  string(3) "php"
  [1]=>
  string(4) "file"
  [2]=>
  string(4) "glob"
  [3]=>
  string(4) "data"
  [4]=>
  string(4) "http"
  [5]=>
  string(3) "ftp"
  [6]=>
  string(3) "zip"
  [7]=>
  string(13) "compress.zlib"
  [8]=>
  string(14) "compress.bzip2"
  [9]=>
  string(4) "phar"
}
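
The same check can be made at runtime, so that a script fails with a clear message instead of triggering the warning. The following is only a minimal sketch: the function name fetchViaWrapper() and the exception messages are made up for illustration, and the demo call reads a data:// URL so that it runs without network access.

```php
<?php
// Hypothetical helper: refuse early, with a clear message, when PHP has no
// stream wrapper registered for the URL's scheme.
function fetchViaWrapper(string $url): string
{
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    if ($scheme === '' || !in_array($scheme, stream_get_wrappers(), true)) {
        throw new RuntimeException("No stream wrapper available for scheme '$scheme'");
    }

    // file_get_contents() returns false (and emits a warning) on failure.
    $data = file_get_contents($url);
    if ($data === false) {
        throw new RuntimeException("Request to $url failed");
    }
    return $data;
}

try {
    // Demo input only; an https:// URL is handled the same way.
    echo fetchViaWrapper('data://text/plain,hello'), "\n";
} catch (RuntimeException $e) {
    echo 'error: ', $e->getMessage(), "\n";
}
```

On a machine like the one above, where the https wrapper is missing, calling this helper with an https:// URL would throw the first exception rather than emit the PHP warning.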

Alternative

Finding an error and fixing it is simple. The hard part is finding the error and then being unable to fix it. I originally wanted to run this script on a remote host, but I could not modify that host's PHP configuration, so the fix above was not an option. There is no point in staying stuck on one approach, though. If this road is closed, is there another way?

Another tool I often use to fetch content in PHP is cURL. It is more powerful than file_get_contents() and offers many optional parameters. For accessing HTTPS content, the cURL option we need is:

curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);

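As the name suggests, this option skips verification of the server's SSL certificate. That is not a good idea in general, because the connection is then no longer protected against man-in-the-middle attacks, but it is enough for simple cases such as reading public data.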
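
If skipping verification is not acceptable, one alternative is to leave it on and tell cURL where to find a bundle of trusted CA certificates. This is a minimal sketch and not the approach used in this article: the function name verifiedSslOptions() and the bundle path are placeholders, and it assumes the curl extension is loaded.

```php
<?php
// Hypothetical helper: build cURL options that keep certificate checks on.
// $caBundle is an assumed path to a PEM file of trusted CA certificates.
function verifiedSslOptions(string $caBundle): array
{
    return [
        CURLOPT_SSL_VERIFYPEER => true,      // verify the certificate chain
        CURLOPT_SSL_VERIFYHOST => 2,         // verify the certificate matches the host name
        CURLOPT_CAINFO         => $caBundle, // file with trusted CA certificates
        CURLOPT_RETURNTRANSFER => true,      // return the response as a string
    ];
}

// Usage (the URL and path are placeholders):
// $ch = curl_init('https://example.com/');
// curl_setopt_array($ch, verifiedSslOptions('/path/to/cacert.pem'));
// $body = curl_exec($ch);
```

Returning the options as an array keeps them in one place, so they can be applied with curl_setopt_array() wherever a handle is created.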

The following function wraps cURL to fetch HTTPS content:

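function getHTTPS($url) {
  $ch = curl_init();
  // Skip verification of the server's certificate.
  curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
  // Return only the body, without the response headers.
  curl_setopt($ch, CURLOPT_HEADER, false);
  // Follow HTTP redirects.
  curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
  curl_setopt($ch, CURLOPT_URL, $url);
  // Send the requested URL itself as the Referer header.
  curl_setopt($ch, CURLOPT_REFERER, $url);
  // Make curl_exec() return the response as a string instead of printing it.
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
  // $result is the response body on success, false on failure.
  $result = curl_exec($ch);
  curl_close($ch);
  return $result;
}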
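
getHTTPS() returns false on failure without saying why. The following is a sketch of a variant that also reports the cURL error message and the HTTP status code. The function name getHTTPSChecked(), the keys of the returned array, and the timeout values are all assumptions made for illustration.

```php
<?php
// Hypothetical variant of getHTTPS() that reports failures.
// Returns an array with the keys ok, status, body and error.
function getHTTPSChecked(string $url): array
{
    if (!function_exists('curl_init')) {
        return ['ok' => false, 'status' => 0, 'body' => null,
                'error' => 'curl extension not loaded'];
    }

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // skips certificate checks, as in getHTTPS()
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);     // seconds; arbitrary choice
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);           // seconds; arbitrary choice

    $body   = curl_exec($ch);
    $error  = $body === false ? curl_error($ch) : null;
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    // The handle is released when $ch goes out of scope.

    return [
        'ok'     => $body !== false && $status >= 200 && $status < 300,
        'status' => $status,
        'body'   => $body === false ? null : $body,
        'error'  => $error,
    ];
}

$r = getHTTPSChecked('https://www.liqingbo.cn/son?print=pretty');
echo $r['ok'] ? $r['body'] : "request failed ({$r['status']}): {$r['error']}", "\n";
```

A transport failure (DNS, connection, TLS) leaves body as null and fills in error, while an HTTP error such as 404 keeps the body but sets ok to false.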

That is the whole process of fetching HTTPS content in PHP. It is simple and practical, and I recommend it to anyone with the same need in a project.
