What should you pay attention to when getting web content in PHP?
Notes on obtaining web page content with PHP
1. Network errors will occur, and any kind of error is possible: the machine is down, the network cable is unplugged, the domain name is wrong, the connection times out, the page no longer exists, the site redirects, access is forbidden, the host is overloaded...
2. The server may restrict access so that only common browsers are allowed (typically by checking the User-Agent header).
3. The server may have anti-hotlinking restrictions (typically by checking the Referer header).
4. Some websites do not care whether your HTTP request has an Accept-Encoding header, or what it contains. They always send gzip-compressed content.
5. URLs come in all kinds of odd forms, including ones that contain Chinese characters, and some even contain carriage returns and line feeds.
6. Some websites send a Content-Type in the HTTP header and also declare several Content-Types in the page itself. Worse, each of them may be different. Worst of all, none of them may match the encoding the text actually uses, which results in garbled characters.
7. The network may be very slow. Multiply that by the thousands of pages you need to analyze, and you may as well go and have a good meal while you wait.
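Notes 4 to 6 can be dealt with after the raw bytes have been downloaded. The following is a minimal sketch, not a definitive implementation: the function names clean_url, maybe_gunzip and to_utf8 are hypothetical, and the GB18030 fallback is an assumption aimed at Chinese-language pages. It needs the mbstring and zlib extensions.

```php
<?php
// Hypothetical helpers for notes 4, 5 and 6; none of these names come from a library.

// Note 5: drop CR/LF, trim, and percent-encode spaces and non-ASCII bytes.
function clean_url(string $url): string
{
    $url = trim(str_replace(["\r", "\n"], '', $url));
    // Without the /u flag the pattern works byte by byte, so each byte of a
    // multi-byte character becomes its own %XX escape.
    $encoded = preg_replace_callback(
        '/[^\x21-\x7E]/',
        function (array $m): string {
            return rawurlencode($m[0]);
        },
        $url
    );
    return $encoded ?? $url;
}

// Note 4: decompress the body if it is gzip, whatever the headers claimed.
function maybe_gunzip(string $body): string
{
    // gzip streams start with the magic bytes 0x1f 0x8b.
    if (strncmp($body, "\x1f\x8b", 2) === 0) {
        $decoded = @gzdecode($body);
        if ($decoded !== false) {
            return $decoded;
        }
    }
    return $body;
}

// Note 6: convert the page to UTF-8 without trusting any single declaration.
function to_utf8(string $html, string $contentType = ''): string
{
    // Bytes that are already valid UTF-8 are kept, whatever the page declares.
    if (mb_check_encoding($html, 'UTF-8')) {
        return $html;
    }
    // Collect declared charsets: the HTTP header first, then the first one in the HTML.
    $candidates = [];
    foreach ([$contentType, $html] as $haystack) {
        if (preg_match('/charset\s*=\s*["\']?([\w-]+)/i', $haystack, $m)) {
            $candidates[] = $m[1];
        }
    }
    $candidates[] = 'GB18030'; // assumed fallback, a superset of GB2312 and GBK
    foreach ($candidates as $charset) {
        try {
            $out = @mb_convert_encoding($html, 'UTF-8', $charset);
        } catch (\ValueError $e) {
            $out = false; // PHP 8 throws on an unknown encoding name
        }
        if (is_string($out) && $out !== '') {
            return $out;
        }
    }
    return $html;
}
```

A typical order would be clean_url before the request, then maybe_gunzip and to_utf8 on the response body. A declared charset that is valid but wrong still produces garbled text, so this only narrows the problem described in note 6.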
Methods for getting web page content in PHP
Method 1. Use file_get_contents
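$url = "http://news.sina.com.cn/c/nd/2016-10-23/doc-ifxwztru6951143.shtml";
$html = file_get_contents($url);
if ($html === false) {
    die("Request failed");
}
// If Chinese characters come out garbled, uncomment the following line
//$html = iconv("gb2312", "utf-8", $html);
// Escape the markup so the page source is shown as text inside the textarea
echo "<textarea style='width:800px;height:600px;'>" . htmlspecialchars($html, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8') . "</textarea>";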
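file_get_contents also accepts a stream context as its third argument, which is one way to address notes 1 to 3 (timeouts, browser-only access, anti-hotlinking). The sketch below assumes hypothetical helper names (build_http_context, fetch_with_context), an illustrative User-Agent string and a 10-second timeout. None of these come from the original example.

```php
<?php
// Hypothetical helpers; the User-Agent string and limits are illustrative assumptions.

// Build an HTTP stream context with a timeout, a browser-like User-Agent
// and an optional Referer header.
function build_http_context(string $referer = '', int $timeout = 10)
{
    $headers = ['User-Agent: Mozilla/5.0 (compatible; ExampleFetcher/1.0)'];
    if ($referer !== '') {
        $headers[] = 'Referer: ' . $referer;
    }
    return stream_context_create([
        'http' => [
            'method'          => 'GET',
            'timeout'         => $timeout, // read timeout in seconds
            'follow_location' => 1,        // follow redirects
            'max_redirects'   => 5,
            'header'          => implode("\r\n", $headers),
        ],
    ]);
}

// Return the body as a string, or null if the request failed for any reason.
function fetch_with_context(string $url, string $referer = ''): ?string
{
    // @ silences the warning; failure is reported through the return value.
    $html = @file_get_contents($url, false, build_http_context($referer));
    return $html === false ? null : $html;
}
```

Calling fetch_with_context($url, "http://news.sina.com.cn/") would then send a Referer from the same site. Whether a given server accepts these headers is something to verify case by case.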
Method 2. Use cURL
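$url = "http://news.sina.com.cn/c/nd/2016-10-23/doc-ifxwztru6951143.shtml";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);  // return the body instead of printing it
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10); // give up connecting after 10 seconds
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);  // follow redirects
$html = curl_exec($ch);
if ($html === false) {
    die("Request failed: " . curl_error($ch));
}
curl_close($ch);
// Escape the markup so the page source is shown as text inside the textarea
echo "<textarea style='width:800px;height:600px;'>" . htmlspecialchars($html, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8') . "</textarea>";

The line curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); tells cURL to follow redirects, so that you get the final page. Without it, a redirected request returns only the redirect notice, for example: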
<head><title>Object moved</title></head> <body><h1 id="Object-nbsp-Moved">Object Moved</h1>This object may be found <a href="some link." rel="external nofoll
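The cURL request can be extended to cover more of the notes above: an overall timeout and error reporting (note 1), a browser-like User-Agent (note 2), a Referer (note 3) and transparent gzip decoding (note 4). This is a minimal sketch under stated assumptions: the function name fetch_with_curl, the User-Agent string, the limits and the shape of the returned array are all illustrative choices, not part of the original example.

```php
<?php
// Hypothetical helper; the name, User-Agent string and limits are illustrative assumptions.
// Returns ['body' => string|null, 'status' => int, 'error' => string].
function fetch_with_curl(string $url, string $referer = ''): array
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects to the final page
        CURLOPT_MAXREDIRS      => 5,    // but give up after 5 hops
        CURLOPT_CONNECTTIMEOUT => 10,   // seconds allowed for connecting
        CURLOPT_TIMEOUT        => 30,   // seconds allowed for the whole transfer
        CURLOPT_ENCODING       => '',   // accept and decode every encoding libcurl supports
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)',
    ]);
    if ($referer !== '') {
        curl_setopt($ch, CURLOPT_REFERER, $referer);
    }

    $body   = curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);

    $error = '';
    if ($body === false) {
        // Transport-level failure: DNS, refused connection, timeout, and so on.
        $error = curl_error($ch) ?: 'unknown cURL error';
    } elseif ($status >= 400) {
        // The server answered, but with an error status.
        $error = 'HTTP status ' . $status;
    }

    // The handle is released when $ch goes out of scope.
    return [
        'body'   => $error === '' ? $body : null,
        'status' => $status,
        'error'  => $error,
    ];
}
```

A caller would check $result['error'] before using $result['body']. Retrying or logging on failure is left out here to keep the sketch short.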