PHP scraping question: how do I handle multiple spaces and line breaks in the page source?
The page source is:
<code><a class="figure figure-180236 " data-qidanadd-albumid="543438400" data-qidanadd-episode="0" data-qidanadd-channelid="1" data-qidanadd-tvid="543438400" data-qidanadd-vip="0" data-widget-qidanadd="qidanadd" data-widget-block="block" data-block-type="qs1404043" data-searchpingback-elem="link" data-searchpingback-param="ptype=1-1" href="http://www.iqiyi.com/v_19rr9g9wks.html?fc=87451bff3f7d2f4a#vfrm=2-3-0-1" data-playsrc-linktype="play" data-playsrc-elem="pic" data-pb="rtgt=iqiyi&p2=9000" target="_blank"></a></code>
My PHP scraping code is:
<code>preg_match("#<a class=\"figure figure-180236 \" data-qidanadd-albumid=\"543438400\" data-qidanadd-episode=\"0\" data-qidanadd-channelid=\"1\" data-qidanadd-tvid=\"543438400\" data-qidanadd-vip=\"0\" data-widget-qidanadd=\"qidanadd\" data-widget-block=\"block\" data-block-type=\"qs1404043\" data-searchpingback-elem=\"link\" data-searchpingback-param=\"ptype=1-1\" href=\"(.*?)\" data-playsrc-linktype=\"play\" data-playsrc-elem=\"pic\" data-pb=\"rtgt=iqiyi&p2=9000\" target=\"_blank\">#", $content, $array);</code>
But this doesn't capture anything. Any ideas?
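Take a proper look at PHP's regular expression pattern modifiers: http://php.net/manual/en/refe... There is a modifier for multi-line input, m, though m only changes what ^ and $ match; to make . match line breaks as well you need s.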
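A minimal, self-contained sketch comparing no modifier, m and s on a tag that is split across two lines. The input string and the example.com URL are made up for illustration.

```php
<?php
// Hypothetical input: one tag split across two lines, as scraped HTML often is.
$html = "<a class=\"figure\"\n   href=\"http://example.com/x.html\">";

// Same pattern each time; only the modifier after the closing # changes.
$pattern = '#<a class="figure".*?href="(.*?)"#';

// No modifier: '.' does not match "\n", so the match cannot cross the line break.
$plain = preg_match($pattern, $html);

// m (PCRE_MULTILINE): only changes what ^ and $ match; '.' still stops at "\n".
$multi = preg_match($pattern . 'm', $html);

// s (PCRE_DOTALL): '.' also matches "\n", so the match succeeds.
$dotall = preg_match($pattern . 's', $html, $found);

var_dump($plain, $multi, $dotall); // int(0), int(0), int(1), one per line
echo $found[1], "\n";              // http://example.com/x.html
```

Only the s version finds the link, because the gap between the class and href attributes contains a newline.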
For HTML parsing I'd suggest DiDom, which gives you jQuery-style DOM selection.
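DiDom is a third-party Composer package. As a dependency-free stand-in, here is a minimal sketch of the same idea using PHP's built-in DOMDocument and DOMXPath: select the element, then read its attribute, so whitespace and line breaks between attributes no longer matter. This is not DiDom's API. The shortened sample markup and the XPath condition (any a element whose class list contains the token figure) are assumptions for illustration.

```php
<?php
// Sample markup based on the question, with line breaks and extra spaces
// between attributes (most attributes omitted for brevity).
$content = <<<HTML
<div>
  <a class="figure figure-180236 "
     data-qidanadd-albumid="543438400"
     href="http://www.iqiyi.com/v_19rr9g9wks.html?fc=87451bff3f7d2f4a#vfrm=2-3-0-1"
     target="_blank"></a>
</div>
HTML;

$doc = new DOMDocument();
// Real pages are rarely valid HTML; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($content);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Select <a> elements whose class attribute contains the whole token "figure".
$nodes = $xpath->query(
    '//a[contains(concat(" ", normalize-space(@class), " "), " figure ")]'
);

$hrefs = [];
foreach ($nodes as $a) {
    $hrefs[] = $a->getAttribute('href');
}
print_r($hrefs);
```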
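In the end you just want the href of the link, right? Your regex is far too long: the more literal text you put in it, the harder it is to match. Also, as mentioned above, there are dedicated pattern modifiers for multi-line matching; I usually use s.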
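A minimal sketch of the "write less regex" advice: anchor only on the part that identifies the link (here assumed to be a class starting with figure, appearing before href) and capture the href. The sample string is a shortened, hypothetical version of the question's tag with line breaks inserted. A negated class such as [^>] already matches newlines, so this particular pattern works with or without s.

```php
<?php
// Hypothetical input: the question's tag, shortened, with newlines and runs of spaces.
$content = "<a class=\"figure figure-180236 \"\n      data-qidanadd-albumid=\"543438400\"\n"
         . "      href=\"http://www.iqiyi.com/v_19rr9g9wks.html?fc=87451bff3f7d2f4a#vfrm=2-3-0-1\"\n"
         . "      target=\"_blank\"></a>";

// <a\s      : the tag name followed by any whitespace (space or newline)
// [^>]*     : any attributes, staying inside the tag; [^>] also matches "\n"
// href="..." : capture everything up to the closing quote
$pattern = '#<a\s[^>]*class="figure[^"]*"[^>]*href="([^"]*)"#';

$matches = [];
if (preg_match_all($pattern, $content, $matches)) {
    // $matches[1] holds one href per matching tag.
    print_r($matches[1]);
}
```

Using preg_match_all rather than preg_match collects every matching link on the page instead of stopping at the first one.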