Scraping with a regex: help needed

WBOY · Original · 2016-06-23 13:41:09 · 1120 views

<h4 class="cat-hd fst-cat-hd ">
    <i class="cat-icon fst-cat-icon  active-trigger"></i>
    <a class="cat-name fst-cat-name"
       href="http://bosidengny.tmall.com/category-907362758.htm?search=y&catName=%D0%C2%C6%B7%D7%A8%C7%F8"
    >新品专区</a>
</h4>
</li>
<li class="cat fst-cat">
    <h4 class="cat-hd fst-cat-hd has-children">
        <i class="cat-icon fst-cat-icon  active-trigger"></i>
        <a class="cat-name fst-cat-name"
           href="http://bosidengny.tmall.com/category-907362759.htm?search=y&catName=%B1%A3%C5%AF%C9%CF%D7%B0"
        >保暖上装</a>
    </h4>
    <div class="snd-pop">
        <div class="snd-pop-inner">
            <ul class="fst-cat-bd">
                <li class="cat snd-cat">
                    <h4 class="cat-hd snd-cat-hd">
                        <i class="cat-icon snd-cat-icon"></i>
                        <a class="cat-name snd-cat-name"
                           href="http://bosidengny.tmall.com/category-907362760.htm?search=y&parentCatId=907362759&parentCatName=%B1%A3%C5%AF%C9%CF%D7%B0&catName=%BC%D9%C1%BD%BC%FE%A3%A8%B3%C4%C9%C0%C1%EC%A3%A9"
                        >
                            假两件(衬衫领)
                        </a>
                    </h4>
                </li>
                <li class="cat snd-cat">
                    <h4 class="cat-hd snd-cat-hd">
                        <i class="cat-icon snd-cat-icon"></i>
                        <a class="cat-name snd-cat-name"
                           href="http://bosidengny.tmall.com/category-907362761.htm?search=y&parentCatId=907362759&parentCatName=%B1%A3%C5%AF%C9%CF%D7%B0&catName=V%C1%EC%C9%CF%D7%B0"
                        >
                            V领上装
                        </a>
                    </h4>
                </li>


The markup above is the category menu. This is the pattern I use to match the first-level categories:
$dafht='#

(.*)

#iUs';
preg_match_all($dafht, $fenlei, $dafenlei);

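But it does not work. Can anyone help?


For first-level categories, I mainly need to capture the numeric ID in category-907362761.htm and the name that follows it.

For second-level categories, I need to capture the two numeric IDs in category-907362761.htm and parentCatId=907362759, plus the name that follows.

How should I write this? I have been stuck on it for half a day. Any help is appreciated.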

Replies (solutions)

First-level categories: /

.*?/isU (untested, so I am not sure it will match).

I can't quite make out the second-level categories.
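
For the regex route, here is a minimal sketch of patterns that capture the IDs and names the question asks for. It reuses `$fenlei` and `$dafenlei` from the question; `$xiaofenlei` is a hypothetical name introduced here. It assumes `class` always precedes `href` inside the `<a>` tag, as in the posted markup, and the patterns would need adjusting if the site emits the attributes differently. It deliberately avoids the `U` modifier, which inverts greediness so that `.*?` becomes greedy.

```php
<?php
// Trimmed copy of the markup from the question: one first-level and one second-level link.
$fenlei = <<<'HTML'
<h4 class="cat-hd fst-cat-hd ">
    <a class="cat-name fst-cat-name"
       href="http://bosidengny.tmall.com/category-907362758.htm?search=y&catName=%D0%C2%C6%B7%D7%A8%C7%F8"
    >新品专区</a>
</h4>
<li class="cat snd-cat">
    <h4 class="cat-hd snd-cat-hd">
        <a class="cat-name snd-cat-name"
           href="http://bosidengny.tmall.com/category-907362760.htm?search=y&parentCatId=907362759&parentCatName=%B1%A3%C5%AF%C9%CF%D7%B0&catName=%BC%D9%C1%BD%BC%FE%A3%A8%B3%C4%C9%C0%C1%EC%A3%A9"
        >
            假两件(衬衫领)
        </a>
    </h4>
</li>
HTML;

// First level: capture the category ID and the link text.
// [^"]* and [^<]* also match newlines, so neither the s nor the U modifier is needed.
$firstLevel = '#<a\s+class="cat-name fst-cat-name"\s+href="[^"]*/category-(\d+)\.htm[^"]*"\s*>([^<]*)</a>#i';

// Second level: additionally capture parentCatId from the query string.
$secondLevel = '#<a\s+class="cat-name snd-cat-name"\s+href="[^"]*/category-(\d+)\.htm\?[^"]*?parentCatId=(\d+)[^"]*"\s*>([^<]*)</a>#i';

preg_match_all($firstLevel, $fenlei, $m1, PREG_SET_ORDER);
preg_match_all($secondLevel, $fenlei, $m2, PREG_SET_ORDER);

$dafenlei = [];   // first-level rows: id, name
foreach ($m1 as $m) {
    $dafenlei[] = ['id' => $m[1], 'name' => trim($m[2])];
}

$xiaofenlei = []; // second-level rows: id, parentId, name (hypothetical variable name)
foreach ($m2 as $m) {
    $xiaofenlei[] = ['id' => $m[1], 'parentId' => $m[2], 'name' => trim($m[3])];
}

print_r($dafenlei);
print_r($xiaofenlei);
```

Matching on the `fst-cat-name` and `snd-cat-name` classes keeps the two levels apart, and trimming in PHP rather than in the pattern keeps the expressions short.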

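Why go to the trouble of writing this by hand? If their site changes even slightly, the effort is wasted.
There are plenty of simple, practical tools available online. Why not use one?
For example, this one: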

$s = <<<TXT
<h4 class="cat-hd fst-cat-hd ">
    <i class="cat-icon fst-cat-icon  active-trigger"></i>
    <a class="cat-name fst-cat-name"
       href="http://bosidengny.tmall.com/category-907362758.htm?search=y&catName=%D0%C2%C6%B7%D7%A8%C7%F8"
    >新品专区</a>
</h4>
</li>
<li class="cat fst-cat">
    <h4 class="cat-hd fst-cat-hd has-children">
        <i class="cat-icon fst-cat-icon  active-trigger"></i>
        <a class="cat-name fst-cat-name"
           href="http://bosidengny.tmall.com/category-907362759.htm?search=y&catName=%B1%A3%C5%AF%C9%CF%D7%B0"
        >保暖上装</a>
    </h4>
    <div class="snd-pop">
        <div class="snd-pop-inner">
            <ul class="fst-cat-bd">
                <li class="cat snd-cat">
                    <h4 class="cat-hd snd-cat-hd">
                        <i class="cat-icon snd-cat-icon"></i>
                        <a class="cat-name snd-cat-name"
                           href="http://bosidengny.tmall.com/category-907362760.htm?search=y&parentCatId=907362759&parentCatName=%B1%A3%C5%AF%C9%CF%D7%B0&catName=%BC%D9%C1%BD%BC%FE%A3%A8%B3%C4%C9%C0%C1%EC%A3%A9"
                        >
                            假两件(衬衫领)
                        </a>
                    </h4>
                </li>
                <li class="cat snd-cat">
                    <h4 class="cat-hd snd-cat-hd">
                        <i class="cat-icon snd-cat-icon"></i>
                        <a class="cat-name snd-cat-name"
                           href="http://bosidengny.tmall.com/category-907362761.htm?search=y&parentCatId=907362759&parentCatName=%B1%A3%C5%AF%C9%CF%D7%B0&catName=V%C1%EC%C9%CF%D7%B0"
                        >
                            V领上装
                        </a>
                    </h4>
                </li>
TXT;

include 'simple_html_dom.php';

$p = new simple_html_dom;
$p->load($s);

foreach ($p->find('a') as $v) {
    echo $v->class, PHP_EOL;             // the class, which distinguishes the levels
    echo $v->href, PHP_EOL;              // the URL
    echo trim($v->innertext()), PHP_EOL; // the link text
}

Output:
cat-name fst-cat-name
http://bosidengny.tmall.com/category-907362758.htm?search=y&catName=%D0%C2%C6%B7%D7%A8%C7%F8
新品专区
cat-name fst-cat-name
http://bosidengny.tmall.com/category-907362759.htm?search=y&catName=%B1%A3%C5%AF%C9%CF%D7%B0
保暖上装
cat-name snd-cat-name
http://bosidengny.tmall.com/category-907362760.htm?search=y&parentCatId=907362759&parentCatName=%B1%A3%C5%AF%C9%CF%D7%B0&catName=%BC%D9%C1%BD%BC%FE%A3%A8%B3%C4%C9%C0%C1%EC%A3%A9
假两件(衬衫领)
cat-name snd-cat-name
http://bosidengny.tmall.com/category-907362761.htm?search=y&parentCatId=907362759&parentCatName=%B1%A3%C5%AF%C9%CF%D7%B0&catName=V%C1%EC%C9%CF%D7%B0
V领上装
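
The parser output still leaves the numeric IDs buried in each URL. The sketch below shows one way to take the last step and pull out the category ID, the parent ID and the name. It swaps simple_html_dom for PHP's bundled DOM extension (`DOMDocument` and `DOMXPath`) so that it runs without an extra download. `parseCategoryLinks` is a hypothetical helper name, and the code assumes the markup is UTF-8, so a page fetched in another encoding would need converting first.

```php
<?php
// Trimmed copy of the markup from the question: one first-level and one second-level link.
$s = <<<'HTML'
<h4 class="cat-hd fst-cat-hd has-children">
    <a class="cat-name fst-cat-name"
       href="http://bosidengny.tmall.com/category-907362759.htm?search=y&catName=%B1%A3%C5%AF%C9%CF%D7%B0"
    >保暖上装</a>
</h4>
<ul class="fst-cat-bd">
    <li class="cat snd-cat">
        <h4 class="cat-hd snd-cat-hd">
            <a class="cat-name snd-cat-name"
               href="http://bosidengny.tmall.com/category-907362761.htm?search=y&parentCatId=907362759&parentCatName=%B1%A3%C5%AF%C9%CF%D7%B0&catName=V%C1%EC%C9%CF%D7%B0"
            >
                V领上装
            </a>
        </h4>
    </li>
</ul>
HTML;

// Hypothetical helper (not from the thread): returns one row per category link.
function parseCategoryLinks(string $html): array
{
    $doc = new DOMDocument();
    // The fragment is not a complete, valid document, so collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    // Assumption: $html is UTF-8; the meta tag tells the parser so.
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $rows = [];
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//a[contains(@class, "cat-name")]') as $a) {
        $href = $a->getAttribute('href');
        if (!preg_match('#/category-(\d+)\.htm#', $href, $m)) {
            continue; // not a category link
        }
        // Split the query string to read parentCatId (absent on first-level links).
        parse_str((string) parse_url($href, PHP_URL_QUERY), $query);
        $rows[] = [
            'level'    => strpos($a->getAttribute('class'), 'snd-cat-name') !== false ? 2 : 1,
            'id'       => $m[1],
            'parentId' => $query['parentCatId'] ?? null,
            'name'     => trim($a->textContent),
        ];
    }
    return $rows;
}

$categories = parseCategoryLinks($s);
print_r($categories);
```

Because the IDs are read from the parsed `href` rather than from the raw markup, whitespace and attribute order inside the tag no longer matter. Only the URL format has to stay stable.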
