A Summary of Using simple_html_dom (PHP Tips)

WBOY
2016-05-17 08:57:25


Simple examples
// Fetch and parse the page. file_get_html() already returns a loaded
// simple_html_dom object, so no extra new simple_html_dom() / load() step is needed.
$dom = file_get_html('http://www.google.com/');

// Find all images
foreach ($dom->find('img') as $element) {   // array of img elements
    echo $element->src . '<br>';            // src attribute of each img
}

// Find all links
foreach ($dom->find('a') as $element) {     // array of a elements
    echo $element->href . '<br>';           // href attribute of each a
}


$dom = file_get_html('http://slashdot.org/');   // fetch and parse the page

// Find all article blocks
$articles = array();
foreach ($dom->find('div.article') as $article) {
    $item = array();
    $item['title']   = $article->find('div.title', 0)->plaintext;   // plaintext returns the text without tags
    $item['intro']   = $article->find('div.intro', 0)->plaintext;
    $item['details'] = $article->find('div.details', 0)->plaintext;
    $articles[] = $item;
}

print_r($articles);
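The Slashdot loop reads ->plaintext straight from find(..., 0), which returns null when nothing matches, so a missing block would trigger an error. Below is a minimal sketch of one way to guard against that. It assumes simple_html_dom.php sits next to the script; first_text() is a hypothetical helper that is not part of the library, and the HTML string is made-up sample data standing in for a fetched page.

```php
<?php
// Assumption: simple_html_dom.php is in the same directory as this script.
require_once 'simple_html_dom.php';

// Hypothetical helper (not a library function): plain text of the first
// match under $node, or $default when the selector matches nothing.
function first_text($node, $selector, $default = '')
{
    $match = $node->find($selector, 0);   // element object or null
    return $match === null ? $default : trim($match->plaintext);
}

// Made-up sample markup; note that it has no div.details block.
$html = str_get_html(
    '<div class="article"><div class="title">T1</div><div class="intro">I1</div></div>'
);

$articles = array();
foreach ($html->find('div.article') as $article) {
    $articles[] = array(
        'title'   => first_text($article, 'div.title'),
        'intro'   => first_text($article, 'div.intro'),
        'details' => first_text($article, 'div.details', '(none)'),
    );
}

print_r($articles);
```

With a real page, the str_get_html() call would be swapped for file_get_html() and the rest would stay the same.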


// Create DOM from string
// str_get_html() returns a loaded simple_html_dom object
$dom = str_get_html('<div id="hello">Hello</div><div id="world">World</div>');

$dom->find('div', 1)->class = 'bar';                 // assign a class attribute to the second div

$dom->find('div[id=hello]', 0)->innertext = 'foo';   // innertext is the HTML inside the element

echo $dom;

// Output:
// <div id="hello">foo</div><div id="world" class="bar">World</div>

 

DOM methods & properties
Name                                               Description
void __construct ( [string $filename] )            Constructor. If the filename parameter is set, the content is loaded automatically, whether it is text or a file/URL.
string plaintext                                   Plain text of the document.
void clear ()                                      Cleans up memory.
void load ( string $content )                      Loads content from a string.
string save ( [string $filename] )                 Dumps the internal DOM tree back into a string. If $filename is set, the resulting string is also saved to that file.
void load_file ( string $filename )                Loads content from a file or a URL.
void set_callback ( string $function_name )        Sets a callback function.
mixed find ( string $selector [, int $index] )     Finds elements by CSS selector. Returns the Nth element object if the index is set, otherwise returns an array of objects.
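To show how the methods in the table fit together, here is a short sketch of the object-style workflow: construct, load a string, modify, dump, then clean up. It assumes simple_html_dom.php sits next to the script, and the HTML string is made-up sample data.

```php
<?php
// Assumption: simple_html_dom.php is in the same directory as this script.
require_once 'simple_html_dom.php';

$dom = new simple_html_dom();

// load() takes an HTML string (use load_file() for a file path or URL)
$dom->load('<div id="hello">Hello</div><div id="world">World</div>');

// With an index, find() returns a single element object (or null)
$first = $dom->find('div', 0);
$first->innertext = 'foo';

// With no argument, save() returns the current DOM tree as a string
$out = $dom->save();
echo $out, PHP_EOL;

// Release the parsed tree once it is no longer needed
$dom->clear();
unset($dom);
```

Calling clear() matters most in long-running scripts that parse many pages in a loop, where the parsed trees would otherwise pile up in memory.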


4. The find method in detail


find ( string $selector [, int $index] )

// Find all anchors; returns an array of element objects
$ret = $html->find('a');

// Find the (N)th anchor (zero-based); returns an element object, or null if not found. Here: the first anchor
$ret = $html->find('a', 0);

// Find the last anchor; returns an element object, or null if not found (a negative index counts from the end)
$ret = $html->find('a', -1);

// Find all <div> elements that have an id attribute
$ret = $html->find('div[id]');

// Find all <div> elements whose id attribute is foo
$ret = $html->find('div[id=foo]');

// Find all elements with id=foo
$ret = $html->find('#foo');

// Find all elements with class=foo
$ret = $html->find('.foo');

// Find all elements that have an id attribute
$ret = $html->find('*[id]');

// Find all anchors and images; returns one array containing both
$ret = $html->find('a, img');

// Find all anchors and images that have a title attribute
$ret = $html->find('a[title], img[title]');


// Find all <li> elements inside <ul>
$es = $html->find('ul li');

// Find nested <div> tags (a div inside a div inside a div)
$es = $html->find('div div div');

// Find all <td> elements inside a <table> whose class is hello
$es = $html->find('table.hello td');

// Find all td tags with attribute align=center inside table tags
$es = $html->find('table td[align=center]');
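The difference between the array form and the indexed form of find() is easy to trip over, so here is a small sketch that runs a few of the selectors above against a made-up snippet. It assumes simple_html_dom.php sits next to the script; the markup and variable names are illustrative only.

```php
<?php
// Assumption: simple_html_dom.php is in the same directory as this script.
require_once 'simple_html_dom.php';

// Made-up sample markup with two anchors, one of which has a title attribute
$html = str_get_html(
    '<ul id="menu"><li><a href="/a" title="A">A</a></li><li><a href="/b">B</a></li></ul>'
);

$all     = $html->find('a');          // no index: array of element objects
$first   = $html->find('a', 0);       // index 0: first matching element
$last    = $html->find('a', -1);      // negative index: counted from the end
$titled  = $html->find('a[title]');   // only anchors that have a title attribute
$nested  = $html->find('ul li a');    // descendant selector
$missing = $html->find('a', 5);       // out-of-range index: null

echo count($all), ' ', $first->href, ' ', $last->href, ' ', count($titled), PHP_EOL;
var_dump($missing === null);
```

Because the indexed form can return null, it is worth checking the result before reading a property such as ->href from it.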

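5. Element attributes
$e = $html->find("div", 0);   // $e exposes the attributes listed in the table below

Attribute Name    Usage
$e->tag           Tag name of the element
$e->outertext     Outer HTML of the element, including its own tag
$e->innertext     Inner HTML of the element
$e->plaintext     Plain text of the element, with tags removed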

       

// Example
$html = str_get_html("<div>foo <b>bar</b></div>");
$e = $html->find("div", 0);

echo $e->tag;        // Returns: "div"
echo $e->outertext;  // Returns: "<div>foo <b>bar</b></div>"
echo $e->innertext;  // Returns: "foo <b>bar</b>"
echo $e->plaintext;  // Returns: "foo bar"

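6. DOM traversal methods
Method                                  Description
mixed $e->children ( [int $index] )     Child elements: the Nth child if the index is set, otherwise an array of children
element $e->parent ()                   Parent element
element $e->first_child ()              First child element
element $e->last_child ()               Last child element
element $e->next_sibling ()             Next sibling element
element $e->prev_sibling ()             Previous sibling element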


// Example
echo $html->find("#div1", 0)->children(1)->children(1)->children(2)->id;
// or
echo $html->getElementById("div1")->childNodes(1)->childNodes(1)->childNodes(2)->getAttribute('id');
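The chained call above depends on a page structure that is not shown, so here is a self-contained sketch of the same traversal methods on a made-up three-paragraph snippet. It assumes simple_html_dom.php sits next to the script; the element ids are illustrative only.

```php
<?php
// Assumption: simple_html_dom.php is in the same directory as this script.
require_once 'simple_html_dom.php';

// Made-up sample markup: one container with three child paragraphs
$html = str_get_html(
    '<div id="div1"><p id="p1">one</p><p id="p2">two</p><p id="p3">three</p></div>'
);

$div    = $html->find('#div1', 0);
$second = $div->children(1);               // zero-based: the second child

$ids = array(
    $div->first_child()->id,               // first child of the container
    $div->last_child()->id,                // last child of the container
    $second->prev_sibling()->id,           // sibling before the second child
    $second->next_sibling()->id,           // sibling after the second child
    $second->parent()->id,                 // back up to the container
);

echo implode(' ', $ids), PHP_EOL;
echo count($div->children()), PHP_EOL;     // no index: array of all children
```

Each step in a long chain such as children(1)->children(1)->children(2) can return null on a page that differs from the expected layout, so checking intermediate results is safer on real-world HTML.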

