How should I write the regular expression for matching within a region?

WBOY
2016-07-06 13:54:12

<code>There is a $link="url"; opening it returns the following markup:
<body>
    ......
     <div class="news_list">
            <ul>
             <li>
              <span>2016-06-06</span>
              <a href="/news!show.action?id=f435345c44e04ec3a5e6ccedca29e061">羊山新区2016年14条道路绿化工程招标公告</a>
              </li>
            <li>
              <span>2016-06-06</span>
              <a href="/news!show.action?id=ad4b065149d94704b3d295287f863b5a">平桥区明港镇井庄路口-垃圾处理场-何岗村南路口(K0+000-K4+300)公路改建工程施工招标公告</a>
              </li>
           <li>
              <span>2016-05-31</span>
              <a href="/news!show.action?id=c3b688ae2ec54fb0880a0f60f7a4f5f0">信阳市中心医院羊山分院人防工程监理招标公告</a>
              </li>
            <li>
              <span>2016-05-31</span>
              <a href="/news!show.action?id=2a7060f3519b40b3aa766dd53f2b00ad">信阳市儿童医院病房楼建设项目施工及监理项目招标公告</a>
              </li>
            </ul>
        </div>
        <!-- pagination -->
        <div class="page_num">
        ......
</body>
</code>
<code>Goal: extract the href values that appear between <div class="news_list"> and <div class="page_num">, for example "/news!show.action?id=2a7060f3519b40b3aa766dd53f2b00ad".

My code so far:
// Fetch the page content
$htmlContent=file_get_contents("$link");
// Extract the links
$num=preg_match_all("/<div.*?class=\"news_list\">.*?(href=\".*?\").*<div.*class=\"page_num\">/is",$htmlContent,$array);
// Print the array
var_dump($array[1]);

Where I am stuck: at the "// Extract the links" step, the subexpression captures only one result, and it is not the href content I want.</code>

Reply content:

Lookaround assertions (lookbehind and lookahead) should cover what you need here:

<code class="php">preg_match_all('/(?<=<span>\d{4}-\d{2}-\d{2}<\/span>)(?:.*?href=\")(.*?)(?:\".*?)(?=<\/li>)/is', $htmlContent, $matches);</code>
<code>array(2) {
  [0]=>
  array(4) {
    [0]=>
    string(152) "
              <a href="/news!show.action?id=f435345c44e04ec3a5e6ccedca29e061">羊山新区2016年14条道路绿化工程招标公告</a>
              "
    [1]=>
    string(218) "
              <a href="/news!show.action?id=ad4b065149d94704b3d295287f863b5a">平桥区明港镇井庄路口-垃圾处理场-何岗村南路口(K0+000-K4+300)公路改建工程施工招标公告</a>
              "
    [2]=>
    string(161) "
              <a href="/news!show.action?id=c3b688ae2ec54fb0880a0f60f7a4f5f0">信阳市中心医院羊山分院人防工程监理招标公告</a>
              "
    [3]=>
    string(173) "
              <a href="/news!show.action?id=2a7060f3519b40b3aa766dd53f2b00ad">信阳市儿童医院病房楼建设项目施工及监理项目招标公告</a>
              "
  }
  [1]=>
  array(4) {
    [0]=>
    string(53) "/news!show.action?id=f435345c44e04ec3a5e6ccedca29e061"
    [1]=>
    string(53) "/news!show.action?id=ad4b065149d94704b3d295287f863b5a"
    [2]=>
    string(53) "/news!show.action?id=c3b688ae2ec54fb0880a0f60f7a4f5f0"
    [3]=>
    string(53) "/news!show.action?id=2a7060f3519b40b3aa766dd53f2b00ad"
  }
}</code>
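
The lookaround pattern above keys on the `<span>` date and the closing `</li>`, not on the two marker divs. If the region boundary itself matters, one possible single-pattern variant uses PCRE's `\G` anchor, which matches where the previous match ended. The sketch below is only an illustration: the shortened sample markup, the two extra links outside the region, and the names `$pattern` and `$links` are assumptions, not part of the original page.

```php
<?php
// Hypothetical, shortened stand-in for the fetched page ($htmlContent in the question).
$htmlContent = <<<'HTML'
<body>
<a href="/outside-before">nav</a>
<div class="news_list">
  <ul>
    <li><span>2016-06-06</span> <a href="/news!show.action?id=f435345c44e04ec3a5e6ccedca29e061">A</a></li>
    <li><span>2016-05-31</span> <a href="/news!show.action?id=2a7060f3519b40b3aa766dd53f2b00ad">B</a></li>
  </ul>
</div>
<div class="page_num"><a href="/outside-after">2</a></div>
</body>
HTML;

// Each match either starts at the opening marker div or resumes where the
// previous match ended (\G). (?!\A) stops \G from firing at the very start.
$pattern = '~(?:<div[^>]*class="news_list"[^>]*>|\G(?!\A))'
         // Advance lazily, one character at a time, but never step onto the closing marker.
         . '(?:(?!<div[^>]*class="page_num").)*?'
         // Capture only the attribute value, without the href="..." wrapper.
         . 'href="([^"]*)"~is';

preg_match_all($pattern, $htmlContent, $matches);
$links = $matches[1];
print_r($links);
```

With this sample, `$links` holds the two `/news!show.action?...` values and neither of the links outside the region.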

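Reason: your pattern spans the whole block from <div class="news_list"> to <div class="page_num">, so the pattern as a whole matches only once, and a subexpression yields one value per match. That single value is the first href in the block, and it includes the literal href="..." wrapper because the wrapper sits inside the parentheses.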
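
A quick way to see this is to run the question's pattern, unchanged, against a small input. The two-link markup below is a hypothetical stand-in for the real page, chosen only to keep the example short.

```php
<?php
// Hypothetical, shortened stand-in for the fetched page.
$htmlContent = '<div class="news_list"><a href="/a">A</a><a href="/b">B</a></div>'
             . '<div class="page_num">';

// The pattern from the question, unchanged.
$pattern = "/<div.*?class=\"news_list\">.*?(href=\".*?\").*<div.*class=\"page_num\">/is";
$num = preg_match_all($pattern, $htmlContent, $array);

var_dump($num);      // int(1): the whole pattern matched once
var_dump($array[1]); // one element: the string href="/a"
```

The second link is swallowed by the greedy `.*` that follows the group, so it never gets a match of its own.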

If you change it to:

<code>$num = preg_match_all("/.*?(href=\".*?\").*?/is", $htmlContent, $array);</code>

the pattern matches four times on the sample above, so the subexpression captures four results.
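
Note that a pattern built only around `href` no longer checks the surrounding divs, so it would also pick up links elsewhere on the page. One way to keep the region restriction is to work in two steps: first cut out the block between the two marker divs, then collect the href values inside it. This is a minimal sketch under assumptions: the sample markup is shortened, the links outside the region are invented for the demonstration, and `$region` and `$links` are illustrative names.

```php
<?php
// Hypothetical, shortened stand-in for the fetched page ($htmlContent in the question).
$htmlContent = <<<'HTML'
<body>
<a href="/outside-before">nav</a>
<div class="news_list">
  <ul>
    <li><span>2016-06-06</span> <a href="/news!show.action?id=f435345c44e04ec3a5e6ccedca29e061">A</a></li>
    <li><span>2016-05-31</span> <a href="/news!show.action?id=2a7060f3519b40b3aa766dd53f2b00ad">B</a></li>
  </ul>
</div>
<div class="page_num"><a href="/outside-after">2</a></div>
</body>
HTML;

$links = [];

// Step 1: one match that captures everything between the two marker divs.
$regionPattern = '/<div[^>]*class="news_list"[^>]*>(.*?)<div[^>]*class="page_num"/is';
if (preg_match($regionPattern, $htmlContent, $region)) {
    // Step 2: every href value inside that region, without the href="..." wrapper.
    preg_match_all('/href="([^"]*)"/i', $region[1], $m);
    $links = $m[1];
}

print_r($links);
```

Splitting the work this way keeps each pattern simple: the first one matches once by design, and the second one is free to match as many times as there are links.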
