There are many techniques for reading and writing XML in PHP. This article provides three ways to read XML: using a DOM library, using a SAX parser, and using regular expressions. Writing XML using DOM and PHP text templates is also covered.
Reading and writing Extensible Markup Language (XML) with PHP can seem daunting. XML and its related technologies can be intimidating, but reading and writing XML in PHP doesn't have to be. First, you need to learn a little about XML: what it is and what you can do with it. Then you need to learn how to read and write XML in PHP, and there are many ways to do this.
This article provides a brief introduction to XML and then explains how to read and write XML with PHP.
What is XML?
XML is a data storage format. It does not define what data is saved, nor does it define the format of the data. XML simply defines tags and the attributes of those tags. Well-formed XML markup looks like this:
<name>Jack Herrington</name>
This tag contains some text: Jack Herrington.
An XML tag that does not contain text looks like this:
<powerUp />
There is more than one way to write something in XML. For example, this tag forms the same output as the previous tag:
<powerUp></powerUp>
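One way to confirm that the two spellings are equivalent is to parse both and compare the results. The following is a minimal sketch, assuming PHP's DOM extension is available (it is introduced later in this article). It relies on canonical XML output, which writes every empty element as a start/end pair.

```php
<?php
// Parse both spellings of the empty element.
$short = new DOMDocument();
$short->loadXML('<powerUp />');

$long = new DOMDocument();
$long->loadXML('<powerUp></powerUp>');

// Neither element has any child nodes.
$shortIsEmpty = !$short->documentElement->hasChildNodes();
$longIsEmpty  = !$long->documentElement->hasChildNodes();

// Canonical XML (C14N) serializes both documents to the same string.
$sameMarkup = $short->C14N() === $long->C14N();

var_dump($shortIsEmpty, $longIsEmpty, $sameMarkup);
```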
You can also add attributes to XML tags. For example, this tag contains first and last attributes:
<name first="Jack" last="Herrington" />
You can also use XML to encode special characters. For example, the & symbol can be encoded like this:
&amp;
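In PHP, you rarely need to write the &amp; entity by hand. The sketch below uses the built-in htmlspecialchars function with the ENT_XML1 flag to escape text before placing it in a tag, then parses the result again to show that the original text comes back. The publisher name is an illustrative value, not taken from the listings in this article.

```php
<?php
// "Simon & Schuster" is an illustrative value, not part of the book list.
$raw = 'Simon & Schuster';

// ENT_XML1 selects XML 1.0 entity names; ENT_QUOTES also escapes ' and ".
$escaped = htmlspecialchars($raw, ENT_XML1 | ENT_QUOTES, 'UTF-8');
$xml = "<publisher>$escaped</publisher>";
echo $xml, "\n"; // <publisher>Simon &amp; Schuster</publisher>

// Parsing the markup again gives back the original text.
$doc = new DOMDocument();
$doc->loadXML($xml);
$decoded = $doc->documentElement->textContent;
echo $decoded, "\n"; // Simon & Schuster
```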
An XML file containing tags and attributes, if formatted like the example, is well-formed, meaning the tags are symmetrical and the characters are encoded correctly. Listing 1 is an example of well-formed XML.
Listing 1. XML book list example
<books>
<book>
<author>Jack Herrington</author>
<title>PHP Hacks</title>
<publisher>OReilly</publisher>
</book>
<book>
<author>Jack Herrington</author>
<title>Podcasting Hacks</title>
<publisher>OReilly</publisher>
</book>
</books>
The XML in Listing 1 contains a list of books. The parent <books> tag contains a set of <book> tags, each of which contains <author>, <title>, and <publisher> tags.
An XML document is valid when its markup structure and content are verified against an external schema file. Schema files can be specified in different formats. For the purposes of this article, all that is needed is well-formed XML.
If you think XML looks a lot like Hypertext Markup Language (HTML), you're right. XML and HTML are both markup-based languages and they have many similarities. However, it is important to note that while an XML document may be well-formed HTML, not all HTML documents are well-formed XML. The newline tag (br) is a good example of the difference between XML and HTML. This newline tag is well-formed HTML, but not well-formed XML:
<p>This is a paragraph<br>
With a line break</p>
This newline tag is well-formed XML and HTML:
<p>This is a paragraph<br />
With a line break</p>
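You can check the difference between these two fragments with an XML parser. The following is a minimal sketch, assuming the DOM extension is installed. The isWellFormed function is a hypothetical helper written for this example, not a PHP built-in.

```php
<?php
// Hypothetical helper: returns true when $xml parses as well-formed XML.
function isWellFormed(string $xml): bool
{
    // Collect parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml);

    // Discard the collected errors and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $ok;
}

// The unclosed <br> makes the closing </p> a mismatch.
var_dump(isWellFormed("<p>This is a paragraph<br>\nWith a line break</p>"));

// The self-closing <br /> keeps the tags symmetrical.
var_dump(isWellFormed("<p>This is a paragraph<br />\nWith a line break</p>"));
```

The first call should report false and the second true, because the XML parser does not apply HTML's rule that br never needs a closing tag.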
If you want to write HTML as well-formed XML, follow the W3C committee's Extensible Hypertext Markup Language (XHTML) standard (see Resources). All modern browsers can render XHTML. Furthermore, you can use XML tools to read XHTML and find the data in the document, which is much easier than parsing HTML.
Use the DOM library to read XML
The easiest way to read well-formed XML files is to use the Document Object Model (DOM) library compiled into some PHP installations. The DOM library reads the entire XML document into memory and represents it as a node tree, as shown in Figure 1.
Figure 1. XML DOM tree of book XML
The books node at the top of the tree has two book child tags. In each book, there are several nodes: author, publisher and title. The author, publisher, and title nodes each have text child nodes that contain text.
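The tree structure can be inspected directly. The sketch below is an illustration under stated assumptions: it loads a one-book version of the book list from a string rather than from a file, then walks the children of the book node to show that each element holds its text in a child text node. The $lines variable exists only for this example.

```php
<?php
$xml = <<<XML
<books>
  <book>
    <author>Jack Herrington</author>
    <title>PHP Hacks</title>
    <publisher>OReilly</publisher>
  </book>
</books>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

$lines = [];
$book = $doc->getElementsByTagName('book')->item(0);
foreach ($book->childNodes as $child) {
    // Skip the whitespace-only text nodes that sit between the elements.
    if ($child->nodeType !== XML_ELEMENT_NODE) {
        continue;
    }
    // Each element keeps its text in a child node named "#text".
    $text = $child->firstChild;
    $lines[] = $child->nodeName . ' -> ' . $text->nodeName
        . ' "' . $text->nodeValue . '"';
}
echo implode("\n", $lines), "\n";
```

Note that the indentation in the source document also becomes text nodes in the tree, which is why the loop filters on the node type.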
The code to read the book XML file and display the content using the DOM is shown in Listing 2.
Listing 2. Reading book XML using DOM
<?php
// Load the whole document into memory as a node tree.
$doc = new DOMDocument();
$doc->load( 'books.xml' );

// Print one line for each <book> element.
$books = $doc->getElementsByTagName( "book" );
foreach( $books as $book )
{
    // nodeValue of the first matching child is the text inside the tag.
    $authors = $book->getElementsByTagName( "author" );
    $author = $authors->item(0)->nodeValue;

    $publishers = $book->getElementsByTagName( "publisher" );
    $publisher = $publishers->item(0)->nodeValue;

    $titles = $book->getElementsByTagName( "title" );
    $title = $titles->item(0)->nodeValue;

    echo "$title - $author - $publisher\n";
}
?>
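The script starts by creating a new DOMDocument object and loading the book XML into this object using the load method. After that, the script uses the getElementsByTagName method to get a list of all the elements with the given name.
Within the loop over the book nodes, the script uses the getElementsByTagName method to get the nodeValue of the author, publisher, and title tags. The nodeValue is the text within the node. The script then displays those values.
You can run the PHP script on the command line like this: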
% php e1.php
PHP Hacks - Jack Herrington - OReilly
Podcasting Hacks - Jack Herrington - OReilly
%
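As you can see, one line is printed for each book block. That's a good start. But what if you don't have access to the XML DOM library?
Use the SAX parser to read XML
Another way to read XML is to use the Simple API for XML (SAX) parser. Most installations of PHP include the SAX parser. The SAX parser runs on a callback model: every time a tag is opened or closed, or any time the parser sees some text, it calls user-defined functions with the node or text information.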
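To make the callback model concrete, here is a minimal sketch that records the order in which the parser fires its callbacks for a one-line document. The input string and the $events array are assumptions made for this illustration. The handlers are written as closures, which is one option among several; named functions work as well.

```php
<?php
$events = [];

$parser = xml_parser_create();

// Record every opening and closing tag the parser reports.
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$events) {
        $events[] = "start $name";
    },
    function ($parser, $name) use (&$events) {
        $events[] = "end $name";
    }
);

// Record every run of text the parser reports.
xml_set_character_data_handler(
    $parser,
    function ($parser, $text) use (&$events) {
        $events[] = "text $text";
    }
);

// The final argument marks this as the last chunk of input.
xml_parse($parser, '<book><title>PHP Hacks</title></book>', true);

echo implode("\n", $events), "\n";
```

The tag names arrive in uppercase (BOOK, TITLE) because the parser folds names to uppercase by default. Code that compares tag names has to take this into account.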
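The advantage of the SAX parser is that it is truly lightweight. The parser does not keep content in memory for long, so it can be used on extremely large files. The disadvantage is that writing SAX parser callbacks is a lot of work. Listing 3 shows the code that uses SAX to read the book XML file and display the contents.
Listing 3. Reading book XML using the SAX parser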
<?php
$g_books = array();
$g_elem = null;

// Called for each opening tag. The parser upper-cases tag names by default.
function startElement( $parser, $name, $attrs )
{
    global $g_books, $g_elem;
    if ( $name == 'BOOK' ) $g_books []= array();
    $g_elem = $name;
}

// Called for each closing tag.
function endElement( $parser, $name )
{
    global $g_elem;
    $g_elem = null;
}

// Called for each run of text. Stores it on the most recent book.
function textData( $parser, $text )
{
    global $g_books, $g_elem;
    if ( $g_elem == 'AUTHOR' ||
         $g_elem == 'PUBLISHER' ||
         $g_elem == 'TITLE' )
    {
        $g_books[ count( $g_books ) - 1 ][ $g_elem ] = $text;
    }
}

$parser = xml_parser_create();
xml_set_element_handler( $parser, "startElement", "endElement" );
xml_set_character_data_handler( $parser, "textData" );

// Feed the file to the parser in 4 KB chunks.
$f = fopen( 'books.xml', 'r' );
while( $data = fread( $f, 4096 ) )
{
    xml_parse( $parser, $data );
}
fclose( $f );
xml_parser_free( $parser );

foreach( $g_books as $book )
{
    echo $book['TITLE']." - ".$book['AUTHOR']." - ";
    echo $book['PUBLISHER']."\n";
}
?>
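The script starts by setting up the g_books array, which holds all the books and book information in memory. The g_elem variable stores the name of the tag that the script is currently processing.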