What should you pay attention to when getting web content in PHP?

Notes on obtaining web page content with PHP

1. Network errors will occur, and any kind is possible: the machine is down, the network cable is unplugged, the domain name is wrong, the connection times out, the page is gone, the site redirects, the service blocks you, the host is overloaded...

2. The server may restrict access so that only common browsers are allowed.

3. The server may have anti-hotlinking restrictions.

4. Some websites do not care whether your HTTP request has an Accept-Encoding header, or what it contains; they always send gzip-compressed content.

5. URLs come in all kinds of odd forms, including ones with Chinese characters, and some even contain carriage returns and line feeds.

6. Some websites set a Content-Type in the HTTP header and several more inside the page itself. Worse, each one may be different. Worst of all, none of them may match the encoding actually used in the body, which results in garbled characters.

7. Network connections can be very slow. Multiply that by the thousands of pages you need to analyze, and you have time for a good meal.
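Notes 4 to 6 can be handled defensively once the response has been downloaded. The following is a minimal sketch rather than a definitive implementation: the helper names maybe_gunzip, clean_url and to_utf8 are hypothetical, the code assumes the zlib and mbstring extensions are loaded, and the GB18030 fallback encoding is an assumption chosen because the example pages are Chinese.

```php
<?php
// Hypothetical helpers for notes 4-6; the names are illustrative only.

// Note 4: decompress the body when it starts with the gzip magic bytes,
// regardless of what the request or response headers said. Assumes ext-zlib.
function maybe_gunzip(string $body): string
{
    if (strncmp($body, "\x1f\x8b", 2) === 0) {
        $decoded = @gzdecode($body);
        if ($decoded !== false) {
            return $decoded;
        }
    }
    return $body;
}

// Note 5: drop CR/LF/tab, trim, then percent-encode spaces, control
// characters and non-ASCII bytes one byte at a time.
function clean_url(string $url): string
{
    $url = trim(str_replace(["\r", "\n", "\t"], '', $url));
    return (string) preg_replace_callback(
        '/[^\x21-\x7e]/',
        function (array $m): string {
            return rawurlencode($m[0]);
        },
        $url
    );
}

// Note 6: trust the bytes, not the declared Content-Type. If the body is not
// valid UTF-8, convert it from an assumed fallback encoding. Assumes ext-mbstring.
function to_utf8(string $html, string $fallback = 'GB18030'): string
{
    if (mb_check_encoding($html, 'UTF-8')) {
        return $html;
    }
    return mb_convert_encoding($html, 'UTF-8', $fallback);
}
```

A typical order would be maybe_gunzip first, then to_utf8, since the encoding check only makes sense on the decompressed bytes.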

Methods for getting web page content in PHP

Method 1. Use file_get_contents

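$url = "http://news.sina.com.cn/c/nd/2016-10-23/doc-ifxwztru6951143.shtml";
// file_get_contents returns false on failure, so check before using the result
$html = file_get_contents($url);
if ($html === false) {
    exit("Failed to fetch " . $url);
}
// If Chinese text comes out garbled, convert the encoding first:
// $html = iconv("gb2312", "utf-8", $html);
// Escape the page so its own markup cannot break out of the textarea
echo "<textarea style='width:800px;height:600px;'>" . htmlspecialchars($html, ENT_QUOTES | ENT_SUBSTITUTE) . "</textarea>";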
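file_get_contents can also be given a timeout and browser-like headers through a stream context, which addresses notes 1 to 3 to some degree. This is a hedged sketch: build_http_options and fetch_page are hypothetical names, and the User-Agent string and the limit of 5 redirects are illustrative values, not anything a particular site requires.

```php
<?php
// Hypothetical helper: builds stream context options for the http wrapper
// (the same 'http' key is also used for https URLs).
function build_http_options(string $referer, float $timeout = 10.0): array
{
    return [
        'http' => [
            'method'          => 'GET',
            'timeout'         => $timeout, // read timeout in seconds
            'follow_location' => 1,        // follow redirects
            'max_redirects'   => 5,
            'user_agent'      => 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)',
            'header'          => "Referer: " . $referer . "\r\n",
        ],
    ];
}

// Hypothetical helper: returns the body, or null when the request fails.
function fetch_page(string $url, array $options): ?string
{
    $context = stream_context_create($options);
    $html = @file_get_contents($url, false, $context);
    return $html === false ? null : $html;
}
```

For example, fetch_page($url, build_http_options('http://news.sina.com.cn/')) would return the page body, or null if the request failed or timed out.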

Method 2. Use cURL

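$url = "http://news.sina.com.cn/c/nd/2016-10-23/doc-ifxwztru6951143.shtml";

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);   // return the body instead of printing it
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);  // give up connecting after 10 seconds
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);   // follow redirects to the final page
$html = curl_exec($ch);
if ($html === false) {
    // curl_exec returns false on failure; read the error before closing the handle
    $error = curl_error($ch);
    curl_close($ch);
    exit("Failed to fetch " . $url . ": " . $error);
}
curl_close($ch);

// Escape the page so its own markup cannot break out of the textarea
echo "<textarea style='width:800px;height:600px;'>" . htmlspecialchars($html, ENT_QUOTES | ENT_SUBSTITUTE) . "</textarea>";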
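A more defensive variant of the cURL method might also cap the total transfer time, send a browser-like User-Agent, and let cURL decode compressed responses. The sketch below assumes the curl extension is loaded; curl_fetch is a hypothetical name, and the timeouts, redirect limit and User-Agent string are illustrative values.

```php
<?php
// Hypothetical helper: fetches $url with cURL and reports failures
// instead of hiding them. Assumes ext-curl is loaded.
function curl_fetch(string $url, int $timeout = 30): array
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_CONNECTTIMEOUT => 10,
        CURLOPT_TIMEOUT        => $timeout, // cap on the whole transfer, for slow links
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_ENCODING       => '',       // accept and decode any supported compression
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)',
    ]);
    $body = curl_exec($ch);

    // Collect everything a caller needs to decide what went wrong.
    return [
        'ok'     => $body !== false,
        'status' => (int) curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'url'    => (string) curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
        'error'  => curl_error($ch),
        'body'   => $body === false ? '' : $body,
    ];
}
```

Returning the status code and effective URL alongside the body lets the caller tell a network failure apart from an error page or a redirect to somewhere unexpected.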
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);

The CURLOPT_FOLLOWLOCATION line above tells cURL to follow redirects, so you get the final page. Without it, a redirected request returns something like the following:

<head><title>Object moved</title></head>
<body><h1 id="Object-nbsp-Moved">Object Moved</h1>This object may be found <a href="some link." rel="external nofoll
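When redirects are not followed, an "Object moved" response like this normally comes with a 3xx status and a Location header naming the target. As a rough sketch, assuming you have the raw response headers as a string (for example by enabling CURLOPT_HEADER), the target can be read out as shown here; extract_location is a hypothetical helper name.

```php
<?php
// Hypothetical helper: returns the Location header value from a raw HTTP
// header block, or null when there is none.
function extract_location(string $rawHeaders): ?string
{
    if (preg_match('/^Location:[ \t]*(\S+)/mi', $rawHeaders, $m)) {
        return $m[1];
    }
    return null;
}
```

The returned value may be a relative URL, in which case it has to be resolved against the URL that was requested.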

