Scraping single-product information from Taobao with PHP

WBOY (Original)
2016-07-13 10:10:04, 1214 views

To work with Taobao data you can use the API that Taobao provides. If you only need public information such as product images and names for your own website, PHP's file_get_contents function is enough.

Approach:

file_get_contents(url) takes a URL such as http://www.baidu.com and returns the page content (its source code) as one single string. You can then match that string with regular expression functions such as preg_match and preg_replace to extract a specific div, img or other element from the page. This only works if the structure of a Taobao single-product page is fixed. For example, the img tag of the 500px main product image always has the id J_ImgBooth.
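The fetch-then-match idea can be sketched on a small string before touching a real page. The helper name extractTextById and the sample fragment below are hypothetical, made up for illustration. The id J_StrPrice is the one the article uses later for the price. The sketch only handles elements whose content is plain text with no nested tags.

```php
<?php
/**
 * Return the text inside the first element carrying the given id, or null.
 * Only works when the element contains plain text (no nested tags).
 */
function extractTextById(string $html, string $id): ?string
{
    // <tag ... id="ID" ...>text</tag>; \1 forces the closing tag to match the opening tag.
    $pattern = '/<([a-z][a-z0-9]*)\b[^>]*\bid="' . preg_quote($id, '/') . '"[^>]*>([^<]*)<\/\1>/is';
    if (preg_match($pattern, $html, $m) !== 1) {
        return null;
    }
    return trim($m[2]);
}

// Hypothetical fragment standing in for a fetched page.
$sample = '<div><strong id="J_StrPrice">128.00</strong></div>';
echo extractTextById($sample, 'J_StrPrice'), PHP_EOL; // prints 128.00
```

Returning null when nothing matches lets the caller tell "element not found" apart from "element is empty", which matters once the page structure changes.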

Implementation (fetching the main image, name, price, attributes and product description):

The code is as follows:

$text = file_get_contents("http://item.taobao.com/item.htm?id=2380347279"); // Store the page source of this URL in $text
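The call above assumes the request always succeeds. A slightly more defensive version might look like the sketch below. The function name fetchPage, the timeout and the user agent string are assumptions for illustration and are not part of the original article. Fetching URLs this way also requires allow_url_fopen to be enabled in the PHP configuration.

```php
<?php
/**
 * Fetch a URL and return its body, or null when the request fails.
 */
function fetchPage(string $url, int $timeoutSeconds = 10): ?string
{
    $context = stream_context_create([
        'http' => [
            'timeout'    => $timeoutSeconds, // give up on slow responses
            'user_agent' => 'Mozilla/5.0 (compatible; example-fetcher)', // assumed value
        ],
    ]);
    // file_get_contents() returns false on failure and emits a warning; @ silences the warning.
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}
```

With this helper, $text = fetchPage("http://item.taobao.com/item.htm?id=2380347279"); gives either the page source or null, so the matching steps can be skipped when the fetch fails.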

A. Get the main image:

The code is as follows:

preg_match('/<img[^>]*id="J_ImgBooth"[^r]*rc="([^"]*)"[^>]*>/', $text, $img);
// Match the img tag whose id is J_ImgBooth. $img[0] is the whole img tag of the main image and $img[1] is the image URL.
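As a sketch, the image step can be wrapped in a function and tried on a small fragment. The pattern below is a variation on the article's, not the author's exact expression: it uses a lookahead so that the id and src attributes may appear in either order. The function name and the sample markup are hypothetical.

```php
<?php
/**
 * Return the src of the <img> whose id is J_ImgBooth, or null if it is absent.
 */
function extractMainImageUrl(string $html): ?string
{
    // The lookahead checks for the id anywhere inside the tag,
    // then the lazy [^>]*? walks to the first whitespace-preceded src attribute.
    $pattern = '/<img(?=[^>]*\bid="J_ImgBooth")[^>]*?\ssrc="([^"]*)"[^>]*>/i';
    if (preg_match($pattern, $html, $m) !== 1) {
        return null;
    }
    return $m[1];
}

// Hypothetical fragment; a real product page is far larger.
$sample = '<div><img id="J_ImgBooth" src="http://img.example.com/a.jpg" alt="x"></div>';
echo extractMainImageUrl($sample), PHP_EOL;
```

Requiring whitespace before src keeps the pattern from matching an attribute such as data-src by accident.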

B. Get the name:

The code is as follows:

preg_match('/<title>([^<>]*)<\/title>/', $text, $title);
// The product name tag has no distinctive class or id, which makes it hard to match directly, so the content of the <title> tag is captured instead. The title is usually the product name (with some differences). $title[0] is the entire title tag and $title[1] is its content.
$title = iconv('GBK', 'UTF-8', $title[1]);
// If your website is UTF-8 encoded, the text must be converted (Taobao pages are GBK encoded).

C. Get the price:

The code is as follows:

preg_match('/<([a-z]+)[^i]*id="J_StrPrice"[^>]*>([^<]*)<\/\1>/is', $text, $price);
// In the same way, $price[2] is the content of the tag with the id J_StrPrice, $price[0] is the entire tag, and $price[1] is the tag name (strong).
$price = floatval($price[2]); // Convert to a number before storing it in the database

D. Get the attributes:

Everything so far sits inside a single tag and can be captured with one regular expression. It is different when the target looks like this:

The code is as follows:

…
<div id="xxx">
…
<ul>
…
</ul>
<div>…
<div>…
</div>
</div>
</div>
…

Here a specific div contains an unknown number of nested tags, which makes it very hard to capture. The closest pattern I found online is /<([a-z]+)[^>]*>([^<>]|(?R))*<\/\1>/, which uses recursion to match tag pairs. It cannot target a specific tag, though, so it does not help to grab the div with class="attributes".

Taobao pages have one useful property: their structure is basically fixed. The closing </div> of the attributes div is followed either by <div id="description"> or by <div class="box J_TBox">, so the content of the attributes div can be obtained with a workaround.

The code is as follows:

preg_match('/<(div)[^c]*class="attributes"[^>]*>.*<\/\1>/is', $text, $text0);
// Because .* is greedy, this matches from <div class="attributes" to the last </div> of the entire page. The attributes div is at the start of that match.

$text1 = preg_replace('/<\/div>[^<]*<(div)[^c]*id="description"[^>]*>.*<\/\1>/is', '', $text0[0]);
// Match from </div><div id="description"> to the final </div> and replace it with '' (that is, delete it). If the attributes div is followed by the description div, this already gives the result.

$attributes = preg_replace('/<\/div>[^<]*<(div)[^c]*class="box J_TBox"[^>]*>.*<\/\1>/is', '', $text1);
// If the attributes div is followed by the box J_TBox div instead, the same step removes everything from that div onward. If it was followed by the description div, this step matches nothing and changes nothing.

E. Get the description:

After the steps above you might think that any tag on a Taobao page can be captured easily (I thought so too). But when you use this method on the description, all you get is "Description loading". The description is not in the page source. It is loaded dynamically by JavaScript after the page has opened.

We can imitate that by including the page's JavaScript ourselves. Not sure which scripts load the description? Include all of them. Not sure which divs have to be present? Take a copy of the page source, delete divs step by step and test. You will find that the following divs are required:

The code is as follows:

<div id="detail"> </div>
<div id="description">
<div id="J_DivItemDesc">Description loading</div>
</div>

These divs are necessary for the description to load, so the code is:

The code is as follows:

preg_match_all('/<script[^>]*>[^<]*<\/script>/is', $text, $content); // All script tags of the page
$content = $content[0];
$description = '<div id="detail"> </div>
<div id="description">
<div id="J_DivItemDesc">Description loading</div>
</div>';
foreach ($content as $v) { $description .= iconv('GBK', 'UTF-8', $v); }
// Put $description into your page and the description loads automatically. If several product descriptions are placed on the same page, only one of them will load.

Source: http://www.bkjia.com/PHPjc/939398.html
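Section D notes that a regular expression cannot easily capture a div containing an unknown number of nested tags. An alternative, which is not the author's method, is to let an HTML parser handle the nesting. The following is a minimal sketch using PHP's DOM extension. It assumes the extension is available, that the class name is a simple word without quotes, and it ignores the GBK encoding question by using an ASCII sample. The function name and sample markup are hypothetical.

```php
<?php
/**
 * Return the outer HTML of the first <div> carrying the given class, or null.
 */
function extractDivByClass(string $html, string $class): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    // Real pages are rarely valid markup; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match the class as a whole word inside the space-separated class attribute.
    $query = sprintf(
        '//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    // saveHTML($node) serialises that node together with all of its nested children.
    return $doc->saveHTML($nodes->item(0));
}

// Hypothetical fragment with nested divs inside the target div.
$sample = '<div class="attributes"><ul><li>Color: red</li></ul><div><div>nested</div></div></div>'
        . '<div id="description">Description loading</div>';
echo extractDivByClass($sample, 'attributes'), PHP_EOL;
```

Because the parser tracks opening and closing tags itself, this does not depend on which div happens to follow the attributes div.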