python网络编程学习笔记(八)：XML生成与解析（DOM、ElementTree）-Python Tutorial-php.cn

Home

Backend Development

Python Tutorial

python网络编程学习笔记(八)：XML生成与解析（DOM、ElementTree）

WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWB

Jun 16, 2016 am 08:43 AM

pythonxmlparse

xml.dom篇

DOM是Document Object Model的简称，XML 文档的高级树型表示。该模型并非只针对 Python，而是一种普通XML 模型。Python 的 DOM 包是基于 SAX 构建的，并且包括在 Python 2.0 的标准 XML 支持里。

一、xml.dom的简单介绍

１、主要方法：

minidom.parse(filename)：加载读取XML文件
doc.documentElement：获取XML文档对象
node.getAttribute(AttributeName)：获取XML节点属性值
node.getElementsByTagName(TagName)：获取XML节点对象集合
node.childNodes ：返回子节点列表。
node.childNodes[index].nodeValue：获取XML节点值
node.firstChild：访问第一个节点，等价于pagexml.childNodes[0]
返回Node节点的xml表示的文本：
doc = minidom.parse(filename)
doc.toxml('UTF-8')

访问元素属性：

Node.attributes["id"]
a.name #就是上面的 "id"
a.value #属性的值

２、举例说明

例１：文件名：book.xml

复制代码代码如下:

   Book message

        bookone
        python check
        001
        200

        booktwo
        python learn
        002
        300

（１）创建ＤＯＭ对象

复制代码代码如下:

import xml.dom.minidom
dom1=xml.dom.minidom.parse('book.xml')

（２）获取根字节

root=dom1.documentElement #这里得到的是根节点
print root.nodeName,',',root.nodeValue,',',root.nodeType

返回结果为：

info , None , 1

其中:

info是指根节点的名称root.nodeName
None是指根节点的值root.nodeValue

１是指根节点的类型root.nodeType,更多节点类型如下表：

NodeType	Named Constant
1	ELEMENT_NODE
2	ATTRIBUTE_NODE
3	TEXT_NODE
4	CDATA_SECTION_NODE
5	ENTITY_REFERENCE_NODE
6	ENTITY_NODE
7	PROCESSING_INSTRUCTION_NODE
8	COMMENT_NODE
9	DOCUMENT_NODE
10	DOCUMENT_TYPE_NODE
11	DOCUMENT_FRAGMENT_NODE
12	NOTATION_NODE

（3）子元素、子节点的访问

Ａ、返回root子节点列表

复制代码代码如下:

import xml.dom.minidom
dom1=xml.dom.minidom.parse('book.xml')
root=dom1.documentElement
#print root.nodeName,',',root.nodeValue,',',root.nodeType
print root.childNodes

运行结果为：

[, , , , , , ]

Ｂ、获取XML节点值，如返回根节点下第二个子节点intro的值和名字，添加下面一句

复制代码代码如下:

print root.childNodes[1].nodeName,root.childNodes[1].nodeValue

运行结果为：

intro None

Ｃ、访问第一个节点

复制代码代码如下:

print root.firstChild.nodeName

运行结果为：

#text

Ｄ、获取已经知道的元素名字的值，如要获取intro后的book message可以使用下面的方法：

复制代码代码如下:

import xml.dom.minidom
dom1=xml.dom.minidom.parse('book.xml')
root=dom1.documentElement
#print root.nodeName,',',root.nodeValue,',',root.nodeType
node= root.getElementsByTagName('intro')[0]
for node in node.childNodes:
if node.nodeType in (node.TEXT_NODE,node.CDATA_SECTION_NODE):
print node.data

这种方法的不足之处是需要对类型进行判断，使用起来不是很方便。运行结果是：

Book message

二、xml解析

对上面的xml进行解析

方法1 代码如下：

复制代码代码如下:

#@小五义 http://www.cnblogs.com/xiaowuyi
#xml 解析

运行结果为：

====================
id:001
head: bookone
name: python check
number: 001
page: 200
====================
id:002
head: booktwo
name: python learn
number: 002
page: 300

方法二：

代码：

复制代码代码如下:

#@小五义 http://www.cnblogs.com/xiaowuyi
#xml 解析

import xml.dom.minidom
dom1=xml.dom.minidom.parse('book.xml')
root=dom1.documentElement
book={}
booknode=root.getElementsByTagName('list')
for booklist in booknode:
    print '='*20
    print 'id:'+booklist.getAttribute('id')
    print 'head:'+booklist.getElementsByTagName('head')[0].childNodes[0].nodeValue.strip()
    print 'name:'+booklist.getElementsByTagName('name')[0].childNodes[0].nodeValue.strip()
    print 'number:'+booklist.getElementsByTagName('number')[0].childNodes[0].nodeValue.strip()
    print 'page:'+booklist.getElementsByTagName('page')[0].childNodes[0].nodeValue.strip()

运行结果与方法一一样。比较上面的两个方法，方法一根据xml的树结构进行了多次循环，可读性上不及方法二，方法直接对每一个节点进行操作，更加清晰。为了更加方法程序的调用，可以使用一个list加一个字典进行存储，具体见方法3：

复制代码代码如下:

#@小五义 http://www.cnblogs.com/xiaowuyi
#xml 解析
import xml.dom.minidom
dom1=xml.dom.minidom.parse('book.xml')
root=dom1.documentElement
book=[]
booknode=root.getElementsByTagName('list')
for booklist in booknode:
    bookdict={}
    bookdict['id']=booklist.getAttribute('id')
    bookdict['head']=booklist.getElementsByTagName('head')[0].childNodes[0].nodeValue.strip()
    bookdict['name']=booklist.getElementsByTagName('name')[0].childNodes[0].nodeValue.strip()
    bookdict['number']=booklist.getElementsByTagName('number')[0].childNodes[0].nodeValue.strip()
    bookdict['page']=booklist.getElementsByTagName('page')[0].childNodes[0].nodeValue.strip()
    book.append(bookdict)
print book

运行结果为：

[{'head': u'bookone', 'page': u'200', 'number': u'001', 'id': u'001', 'name': u'python check'}, {'head': u'booktwo', 'page': u'300', 'number': u'002', 'id': u'002', 'name': u'python learn'}]

该列表里包含了两个字典。

三、建立ＸＭＬ文件
这里用方法三得到的结果，建立一个xml文件。

复制代码代码如下:

# -*- coding: cp936 -*-
#@小五义 http://www.cnblogs.com/xiaowuyi
#xml 创建

import xml.dom
def create_element(doc,tag,attr):
    #创建一个元素节点
    elementNode=doc.createElement(tag)
    #创建一个文本节点
    textNode=doc.createTextNode(attr)
    #将文本节点作为元素节点的子节点
    elementNode.appendChild(textNode)
    return elementNode

dom1=xml.dom.getDOMImplementation()#创建文档对象，文档对象用于创建各种节点。
doc=dom1.createDocument(None,"info",None)
top_element = doc.documentElement# 得到根节点
books=[{'head': u'bookone', 'page': u'200', 'number': u'001', 'id': u'001', 'name': u'python check'}, {'head': u'booktwo', 'page': u'300', 'number': u'002', 'id': u'002', 'name': u'python learn'}]
for book in books:
    sNode=doc.createElement('list')
    sNode.setAttribute('id',str(book['id']))
    headNode=create_element(doc,'head',book['head'])
    nameNode=create_element(doc,'name',book['name'])
    numberNode=create_element(doc,'number',book['number'])
    pageNode=create_element(doc,'page',book['page'])
    sNode.appendChild(headNode)
    sNode.appendChild(nameNode)
    sNode.appendChild(pageNode)
    top_element.appendChild(sNode)# 将遍历的节点添加到根节点下
xmlfile=open('bookdate.xml','w')
doc.writexml(xmlfile,addindent=' '*4, newl='\n', encoding='utf-8')
xmlfile.close()

运行后生成bookdate.xml文件，该文件与book.xml一样。

xml.etree.ElementTree篇

依然使用例1的例子，对xml进行解析分析。

1、加载XML

方法一：直接加载文件

复制代码代码如下:

import xml.etree.ElementTree
root=xml.etree.ElementTree.parse('book.xml')

方法二：加载指定字符串

复制代码代码如下:

import xml.etree.ElementTree
root = xml.etree.ElementTree.fromstring(xmltext)这里xmltext是指定的字符串。

２、获取节点

方法一利用getiterator方法得到指定节点

book_node=root.getiterator("list")

方法二利用getchildren方法得到子节点,如例１中，要得到list下面子节点head的值：

复制代码代码如下:

#@小五义 http://www.cnblogs.com/xiaowuyiimport xml.etree.ElementTree
root=xml.etree.ElementTree.parse('book.xml')
book_node=root.getiterator("list")
for node in book_node:
book_node_child=node.getchildren()[0]
print book_node_child.tag+':'+book_node_child.text

运行结果为：

head:bookone
head:booktwo

方法三使用find和findall方法

find方法找到指定的第一个节点：

复制代码代码如下:

# -*- coding: cp936 -*-
#@小五义
import xml.etree.ElementTree
root=xml.etree.ElementTree.parse('book.xml')
book_find=root.find('list')
for note in book_find:
print note.tag+':'+note.text

运行结果：

head:bookone
name:python check
number:001
page:200

findall方法将找到指定的所有节点：

复制代码代码如下:

# -*- coding: cp936 -*-
#@小五义
import xml.etree.ElementTree
root=xml.etree.ElementTree.parse('book.xml')
book=root.findall('list')
for book_list in book:
for note in book_list:
print note.tag+':'+note.text

运行结果：

head:bookone
name:python check
number:001
page:200
head:booktwo
name:python learn
number:002
page:300

３、对book.xml进行解析的实例

复制代码代码如下:

# -*- coding: cp936 -*-
#@小五义
import xml.etree.ElementTree
root=xml.etree.ElementTree.parse('book.xml')
book=root.findall('list')
for book_list in book:
    print '='*20
    if book_list.attrib.has_key('id'):
        print "id:"+book_list.attrib['id']
    for note in book_list:
        print note.tag+':'+note.text
print '='*20

运行结果为：

====================
id:001
head:bookone
name:python check
number:001
page:200
====================
id:002
head:booktwo
name:python learn
number:002
page:300
====================

注意：
当要获取属性值时，如list id='001',用attrib方法。
当要获取节点值时，如

bookone中的bookone用text方法。
当要获取节点名时，用tag方法。

Statement

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

What data types can be stored in a Python array?Apr 27, 2025 am 12:11 AM

Pythonlistscanstoreanydatatype,arraymodulearraysstoreonetype,andNumPyarraysarefornumericalcomputations.1)Listsareversatilebutlessmemory-efficient.2)Arraymodulearraysarememory-efficientforhomogeneousdata.3)NumPyarraysareoptimizedforperformanceinscient

What happens if you try to store a value of the wrong data type in a Python array?Apr 27, 2025 am 12:10 AM

WhenyouattempttostoreavalueofthewrongdatatypeinaPythonarray,you'llencounteraTypeError.Thisisduetothearraymodule'sstricttypeenforcement,whichrequiresallelementstobeofthesametypeasspecifiedbythetypecode.Forperformancereasons,arraysaremoreefficientthanl

Which is part of the Python standard library: lists or arrays?Apr 27, 2025 am 12:03 AM

Pythonlistsarepartofthestandardlibrary,whilearraysarenot.Listsarebuilt-in,versatile,andusedforstoringcollections,whereasarraysareprovidedbythearraymoduleandlesscommonlyusedduetolimitedfunctionality.

What should you check if the script executes with the wrong Python version?Apr 27, 2025 am 12:01 AM

ThescriptisrunningwiththewrongPythonversionduetoincorrectdefaultinterpretersettings.Tofixthis:1)CheckthedefaultPythonversionusingpython--versionorpython3--version.2)Usevirtualenvironmentsbycreatingonewithpython3.9-mvenvmyenv,activatingit,andverifying

What are some common operations that can be performed on Python arrays?Apr 26, 2025 am 12:22 AM

Pythonarrayssupportvariousoperations:1)Slicingextractssubsets,2)Appending/Extendingaddselements,3)Insertingplaceselementsatspecificpositions,4)Removingdeleteselements,5)Sorting/Reversingchangesorder,and6)Listcomprehensionscreatenewlistsbasedonexistin

In what types of applications are NumPy arrays commonly used?Apr 26, 2025 am 12:13 AM

NumPyarraysareessentialforapplicationsrequiringefficientnumericalcomputationsanddatamanipulation.Theyarecrucialindatascience,machinelearning,physics,engineering,andfinanceduetotheirabilitytohandlelarge-scaledataefficiently.Forexample,infinancialanaly

When would you choose to use an array over a list in Python?Apr 26, 2025 am 12:12 AM

Useanarray.arrayoveralistinPythonwhendealingwithhomogeneousdata,performance-criticalcode,orinterfacingwithCcode.1)HomogeneousData:Arrayssavememorywithtypedelements.2)Performance-CriticalCode:Arraysofferbetterperformancefornumericaloperations.3)Interf

Are all list operations supported by arrays, and vice versa? Why or why not?Apr 26, 2025 am 12:05 AM

No,notalllistoperationsaresupportedbyarrays,andviceversa.1)Arraysdonotsupportdynamicoperationslikeappendorinsertwithoutresizing,whichimpactsperformance.2)Listsdonotguaranteeconstanttimecomplexityfordirectaccesslikearraysdo.

See all articles

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Assassin's Creed Shadows: Seashell Riddle Solution

1 months agoByDDD

What's New in Windows 11 KB5054979 & How to Fix Update Issues

3 weeks agoByDDD

Where to find the Crane Control Keycard in Atomfall

1 months agoByDDD

How to fix KB5055523 fails to install in Windows 11?

2 weeks agoByDDD

InZoi: How To Apply To School And University

3 weeks agoByDDD

Hot Tools

Notepad++7.3.1

Easy-to-use and free code editor

MantisBT

Mantis is an easy-to-deploy web-based defect tracking tool designed to aid in product defect tracking. It requires PHP, MySQL and a web server. Check out our demo and hosting services.

DVWA

Damn Vulnerable Web App (DVWA) is a PHP/MySQL web application that is very vulnerable. Its main goals are to be an aid for security professionals to test their skills and tools in a legal environment, to help web developers better understand the process of securing web applications, and to help teachers/students teach/learn in a classroom environment Web application security. The goal of DVWA is to practice some of the most common web vulnerabilities through a simple and straightforward interface, with varying degrees of difficulty. Please note that this software

mPDF

mPDF is a PHP library that can generate PDF files from UTF-8 encoded HTML. The original author, Ian Back, wrote mPDF to output PDF files "on the fly" from his website and handle different languages. It is slower than original scripts like HTML2FPDF and produces larger files when using Unicode fonts, but supports CSS styles etc. and has a lot of enhancements. Supports almost all languages, including RTL (Arabic and Hebrew) and CJK (Chinese, Japanese and Korean). Supports nested block-level elements (such as P, DIV),

ZendStudio 13.5.1 Mac

Powerful PHP integrated development environment

Hot Topics

Where is the login entrance for gmail email?

7751

1643

1397

1293

1234