It is often found that the data to be used is all on the same website, and the data presentation format is the same. For example, there are thousands of products on Taobao or Amazon. If the information is entered manually, the workload will be too much. At this time, we can write a collection program to directly collect and display it. The server supports file_get_contents and curl
First add a text box and submit button to the page. The text box is used to enter the collection page address.
Collection needs to use the regular interception function
function preg_substr($start, $end, $str) // 正则截取函数 { $temp = preg_split($start, $str); $content = preg_split($end, $temp[1]); return $content[0]; }
Collection needs to use the string interception function
function str_substr($start, $end, $str) // 字符串截取函数 { $temp = explode($start, $str, 2); $content = explode($end, $temp[1], 2); return $content[0]; }
There is also a function to save the collected content:
function writelog($str) { @unlink("log.txt"); $open=fopen("log.txt","a" ); fwrite($open,$str); fclose($open); }
Sometimes the collected content is inconsistent with the content we view through the browser, causing us to not be able to find the correct regular expression, here you can Open the saved txt file and find the correct intercepted string in it.
If you need to collect even pictures, you need to use the picture function:
function getImage($url, $filename='', $dirName, $fileType, $type=0) { if($url == ''){return false;} //获取文件原文件名 $defaultFileName = basename($url); //获取文件类型 $suffix = substr(strrchr($url,'.'), 1); if(!in_array($suffix, $fileType)){ return false; } //设置保存后的文件名 $filename = $filename == '' ? time().rand(0,9).'.'.$suffix : $defaultFileName; //获取远程文件资源 if($type){ $ch = curl_init(); $timeout = 5; curl_setopt($ch, CURLOPT_URL, $url); curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); $file = curl_exec($ch); curl_close($ch); }else{ ob_start(); readfile($url); $file = ob_get_contents(); ob_end_clean(); } //设置文件保存路径 $dirName = $dirName.'/'.date('Y', time()).'/'.date('m', time()).'/'.date('d',time()).'/'; if(!file_exists($dirName)){ mkdir($dirName, 0777, true); } //保存文件 $res = fopen($dirName.$filename,'a'); fwrite($res,$file); fclose($res); return $dirName.$filename; }
Add the collection code. Since the collection code is added here to prevent submission, you can directly Above;
Let’s take a product page on Amazon as an example: enter a product link:
Look at the collection results as shown below. Only the content is displayed here. It is relatively simple to add it to the database. There will be time to introduce the collection of automatically entering lower-level links or automatically turning pages.
The above is the detailed content of Use file_get_contents and curl to write collection. For more information, please follow other related articles on the PHP Chinese website!

PHPidentifiesauser'ssessionusingsessioncookiesandsessionIDs.1)Whensession_start()iscalled,PHPgeneratesauniquesessionIDstoredinacookienamedPHPSESSIDontheuser'sbrowser.2)ThisIDallowsPHPtoretrievesessiondatafromtheserver.

The security of PHP sessions can be achieved through the following measures: 1. Use session_regenerate_id() to regenerate the session ID when the user logs in or is an important operation. 2. Encrypt the transmission session ID through the HTTPS protocol. 3. Use session_save_path() to specify the secure directory to store session data and set permissions correctly.

PHPsessionfilesarestoredinthedirectoryspecifiedbysession.save_path,typically/tmponUnix-likesystemsorC:\Windows\TemponWindows.Tocustomizethis:1)Usesession_save_path()tosetacustomdirectory,ensuringit'swritable;2)Verifythecustomdirectoryexistsandiswrita

ToretrievedatafromaPHPsession,startthesessionwithsession_start()andaccessvariablesinthe$_SESSIONarray.Forexample:1)Startthesession:session_start().2)Retrievedata:$username=$_SESSION['username'];echo"Welcome,".$username;.Sessionsareserver-si

The steps to build an efficient shopping cart system using sessions include: 1) Understand the definition and function of the session. The session is a server-side storage mechanism used to maintain user status across requests; 2) Implement basic session management, such as adding products to the shopping cart; 3) Expand to advanced usage, supporting product quantity management and deletion; 4) Optimize performance and security, by persisting session data and using secure session identifiers.

The article explains how to create, implement, and use interfaces in PHP, focusing on their benefits for code organization and maintainability.

The article discusses the differences between crypt() and password_hash() in PHP for password hashing, focusing on their implementation, security, and suitability for modern web applications.

Article discusses preventing Cross-Site Scripting (XSS) in PHP through input validation, output encoding, and using tools like OWASP ESAPI and HTML Purifier.


Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

SublimeText3 English version
Recommended: Win version, supports code prompts!

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Mac version
God-level code editing software (SublimeText3)

SecLists
SecLists is the ultimate security tester's companion. It is a collection of various types of lists that are frequently used during security assessments, all in one place. SecLists helps make security testing more efficient and productive by conveniently providing all the lists a security tester might need. List types include usernames, passwords, URLs, fuzzing payloads, sensitive data patterns, web shells, and more. The tester can simply pull this repository onto a new test machine and he will have access to every type of list he needs.

SAP NetWeaver Server Adapter for Eclipse
Integrate Eclipse with SAP NetWeaver application server.
