


Detailed explanation of the steps to crawl Tmall and Taobao product data with PHP (with code)
This time I will bring you a detailed explanation of the steps for crawling Tmall and Taobao product data with php (with code). What are the precautions for php to crawl Tmall and Taobao product data. The following is the actual combat. Let’s take a look at the case.
1. Ideas
I recently made a website that crawled product information from Tmall and Taobao from the URL. I first looked at the mobile web page and found that it was used. React, I don’t know much about it and I can’t do it, so I considered crawling the data from the PC portal. However, when crawling the URL to obtain the data, the price, inventory, etc. information was not obtained. After careful study, I found that another interface was requested asynchronously. However, the interface needs to use refer to obtain data, so I wrote a simple crawler in the following way to crawl the product preview and the price, inventory, etc. of the first category of the product.
2. The code to implement
is as follows:
function crawlUrl($url){ import('PhpQuery.Curl'); $curl=new \Curl(); $result = $curl->read($url); $content = mb_convert_encoding( $result['content'], 'UTF-8', 'UTF-8,GBK,GB2312,BIG5' ); $myres=array(); if(strrpos($url,'taobao.com')!=false) { //匹配是否下架 if(strpos($content,'此宝贝已下架')!==false){ return false; } preg_match("|itemId : '(.*)'|isU", $content, $match); $item_id=$match[1]; preg_match("|sellerId : '(.*)'|isU", $content, $match); $sellet_id=$match[1]; preg_match("|<title>(.*)</title>|isU",$content,$match); $title=$match[1]; //价格库存信息 $ch = curl_init(); curl_setopt ($ch, CURLOPT_URL, 'https://detailskip.taobao.com/service/getData/1/p1/item/detail/sib.htm?itemId='.$item_id.'&sellerId='.$sellet_id.'&modules=dynStock,qrcode,viewer,price,duty,xmpPromotion,delivery,upp,activity,fqg,zjys,amountRestriction,couponActivity,soldQuantity,originalPrice,tradeContract&callback=onSibRequestSuccess'); $opt[CURLOPT_HEADER]=false; $opt[CURLOPT_CONNECTTIMEOUT]=15; $opt[CURLOPT_TIMEOUT]=300; $opt[CURLOPT_AUTOREFERER]=true; $opt[CURLOPT_USERAGENT]='Mozilla/5.0 (Windows NT 6.1) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.47 Safari/536.11'; curl_setopt_array($ch,$opt); curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1); curl_setopt ($ch,CURLOPT_REFERER,$url); curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); $out_put=curl_exec ($ch); curl_close ($ch); $res=str_replace('onSibRequestSuccess(',"",$out_put); $res=rtrim($res,');1'); $result=json_decode($res,true); //查询出图片信息 preg_match('|
- (.*)

- (.*)

The above code uses the phpquery library, but it is actually useless. Just use Curl directly. The specific crawled data can be viewed through parameters. The method does not distinguish between Taobao and Tmall links, but the prerequisite is that it must be a PC-side link. In addition, the regular rules are not standardized, so you can rewrite the regular rules yourself to match the data.
I believe you have mastered the method after reading the case in this article. For more exciting information, please pay attention to other related articles on the php Chinese website!
Recommended reading:
Detailed explanation of the steps for php to generate images with QR codes and force download
LaravelS is accelerated through Swoole Detailed explanation of Laravel/Lumen steps
The above is the detailed content of Detailed explanation of the steps to crawl Tmall and Taobao product data with PHP (with code). For more information, please follow other related articles on the PHP Chinese website!

APHPDependencyInjectionContainerisatoolthatmanagesclassdependencies,enhancingcodemodularity,testability,andmaintainability.Itactsasacentralhubforcreatingandinjectingdependencies,thusreducingtightcouplingandeasingunittesting.

Select DependencyInjection (DI) for large applications, ServiceLocator is suitable for small projects or prototypes. 1) DI improves the testability and modularity of the code through constructor injection. 2) ServiceLocator obtains services through center registration, which is convenient but may lead to an increase in code coupling.

PHPapplicationscanbeoptimizedforspeedandefficiencyby:1)enablingopcacheinphp.ini,2)usingpreparedstatementswithPDOfordatabasequeries,3)replacingloopswitharray_filterandarray_mapfordataprocessing,4)configuringNginxasareverseproxy,5)implementingcachingwi

PHPemailvalidationinvolvesthreesteps:1)Formatvalidationusingregularexpressionstochecktheemailformat;2)DNSvalidationtoensurethedomainhasavalidMXrecord;3)SMTPvalidation,themostthoroughmethod,whichchecksifthemailboxexistsbyconnectingtotheSMTPserver.Impl

TomakePHPapplicationsfaster,followthesesteps:1)UseOpcodeCachinglikeOPcachetostoreprecompiledscriptbytecode.2)MinimizeDatabaseQueriesbyusingquerycachingandefficientindexing.3)LeveragePHP7 Featuresforbettercodeefficiency.4)ImplementCachingStrategiessuc

ToimprovePHPapplicationspeed,followthesesteps:1)EnableopcodecachingwithAPCutoreducescriptexecutiontime.2)ImplementdatabasequerycachingusingPDOtominimizedatabasehits.3)UseHTTP/2tomultiplexrequestsandreduceconnectionoverhead.4)Limitsessionusagebyclosin

Dependency injection (DI) significantly improves the testability of PHP code by explicitly transitive dependencies. 1) DI decoupling classes and specific implementations make testing and maintenance more flexible. 2) Among the three types, the constructor injects explicit expression dependencies to keep the state consistent. 3) Use DI containers to manage complex dependencies to improve code quality and development efficiency.

DatabasequeryoptimizationinPHPinvolvesseveralstrategiestoenhanceperformance.1)Selectonlynecessarycolumnstoreducedatatransfer.2)Useindexingtospeedupdataretrieval.3)Implementquerycachingtostoreresultsoffrequentqueries.4)Utilizepreparedstatementsforeffi


Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

SecLists
SecLists is the ultimate security tester's companion. It is a collection of various types of lists that are frequently used during security assessments, all in one place. SecLists helps make security testing more efficient and productive by conveniently providing all the lists a security tester might need. List types include usernames, passwords, URLs, fuzzing payloads, sensitive data patterns, web shells, and more. The tester can simply pull this repository onto a new test machine and he will have access to every type of list he needs.

SublimeText3 English version
Recommended: Win version, supports code prompts!

Safe Exam Browser
Safe Exam Browser is a secure browser environment for taking online exams securely. This software turns any computer into a secure workstation. It controls access to any utility and prevents students from using unauthorized resources.

Dreamweaver CS6
Visual web development tools

Atom editor mac version download
The most popular open source editor
