Regular Expression - How to get the content of an HTML element in PHP

At the moment I'm using a regular expression copied from the Internet, but it doesn't produce the result I want.

My current code is:

    $text = file_get_contents('404.html');
    preg_match('/<time[^>]*itemprop=\"datePublished\".*?>.*?<\/time>/ism', $text, $match);
    print($match[0]);

But the output is:

<time datetime="2017-02-20T18:41:00+08:00" itemprop="datePublished">February 20, 2017</time>

What I want is to output just February 20, 2017, i.e. the text content of the element. I don't understand regular expressions, and reading the encyclopedia entry left me completely confused. How can I achieve this? In other words, how should the regular expression be written so that it outputs only the content?

为情所困 · 2864 days ago · 568

All replies (3)

  • 仅有的幸福 2017-05-16 13:08:41

    What you matched is an ordinary HTML tag. PHP's tag-removal function, strip_tags(), strips the surrounding tags and leaves just the text. Try the code below; see the PHP manual for details on strip_tags().

    $text = file_get_contents('404.html');
    preg_match('/<time[^>]*itemprop=\"datePublished\".*?>.*?<\/time>/ism', $text, $match);
    print(strip_tags($match[0]));
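
    As a minimal sketch of the strip_tags() approach with a guard for the case where nothing matches: the sample string stands in for the contents of 404.html, and the helper name publishedDateText is hypothetical. The pattern is slightly tightened (`[^>]*>` instead of `.*?>`, and no `m` modifier, since the pattern has no anchors); that is an assumption about what was intended, not a required change.

    ```php
    <?php
    // Hypothetical helper: returns the text inside the datePublished <time>
    // element, or null when the element is not found.
    function publishedDateText(string $html): ?string
    {
        $pattern = '/<time[^>]*itemprop="datePublished"[^>]*>.*?<\/time>/is';
        if (preg_match($pattern, $html, $match) !== 1) {
            return null; // no match, or a regex error
        }
        // strip_tags() drops the <time ...> and </time> tags, leaving the text.
        return trim(strip_tags($match[0]));
    }

    // Sample input standing in for file_get_contents('404.html') (assumption).
    $text = '<p>Posted on <time datetime="2017-02-20T18:41:00+08:00" itemprop="datePublished">February 20, 2017</time></p>';

    echo publishedDateText($text), PHP_EOL; // prints: February 20, 2017
    ```

    Checking the return value of preg_match() avoids an undefined-index notice on $match[0] when the page has no such element.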

  • 阿神 2017-05-16 13:08:41

    1. https://github.com/bupt1987/h...
    2. https://github.com/paquettg/p...

    I'd recommend these two PHP libraries for parsing HTML. They work much like jQuery and let you select and read HTML elements.
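
    Because the two links above are truncated, the sketch below does not use those libraries. It illustrates the same parse-then-select idea with PHP's bundled DOM extension (DOMDocument and DOMXPath) instead. The helper name publishedDateFromDom and the sample markup are assumptions for illustration.

    ```php
    <?php
    // Hypothetical helper: parses the HTML and reads the text of the first
    // <time itemprop="datePublished"> element, or returns null if there is none.
    function publishedDateFromDom(string $html): ?string
    {
        $doc = new DOMDocument();
        // libxml reports HTML5 tags such as <time> as errors; collect them
        // internally instead of emitting warnings, then discard them.
        $previous = libxml_use_internal_errors(true);
        $doc->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        $xpath = new DOMXPath($doc);
        $nodes = $xpath->query('//time[@itemprop="datePublished"]');
        if ($nodes === false || $nodes->length === 0) {
            return null;
        }
        return trim($nodes->item(0)->textContent);
    }

    // Sample input standing in for the real page (assumption).
    $html = '<html><body><time datetime="2017-02-20T18:41:00+08:00" itemprop="datePublished">February 20, 2017</time></body></html>';

    echo publishedDateFromDom($html), PHP_EOL; // prints: February 20, 2017
    ```

    A parser-based lookup like this does not depend on attribute order or spacing inside the tag, which is where hand-written patterns tend to break.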

  • PHPz 2017-05-16 13:08:41

    strip_tags() is a built-in PHP function that removes HTML tags from a string, so it works here. Since you're already using a regular expression, though, you can also do it in the regex itself: put a capture group around the content, and the text will be in $match[1]. See the pattern below:

    preg_match('/<time[^>]*itemprop=\"datePublished\".*?>(.*?)<\/time>/ism',$text,$match);
    print_r($match); 
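
    To make the capture group concrete, here is a small sketch that returns only the captured text instead of dumping the whole array. The helper name publishedDateCaptured and the sample string are assumptions, and the pattern uses `[^>]*>` in place of `.*?>` as a slightly stricter variant.

    ```php
    <?php
    // Hypothetical helper: returns the text captured by the (.*?) group,
    // or null when the pattern does not match.
    function publishedDateCaptured(string $html): ?string
    {
        $pattern = '/<time[^>]*itemprop="datePublished"[^>]*>(.*?)<\/time>/is';
        // $match[0] holds the whole match; $match[1] holds the first group.
        return preg_match($pattern, $html, $match) === 1 ? trim($match[1]) : null;
    }

    // Sample input taken from the output shown in the question.
    $sample = '<time datetime="2017-02-20T18:41:00+08:00" itemprop="datePublished">February 20, 2017</time>';

    echo publishedDateCaptured($sample), PHP_EOL; // prints: February 20, 2017
    ```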
