Detailed introduction to DOM parsing in XML parsing

王林

Aug 26, 2019, 05:30 PM

1. Concept

XML files are mostly used to describe information, so once an XML document has been obtained, extracting the relevant information from its elements is what XML parsing means. There are two common ways to parse XML: DOM parsing and SAX parsing. The figure shows how each one operates.

[Figure: how DOM parsing and SAX parsing operate]

2. DOM parsing

A DOM-based XML parser converts the document into a collection of objects (an object model) and stores the information in a tree data structure. Through the DOM interfaces, an application can access any part of the data in the XML document at any time, which is why this style of access is also called random access.

This approach also has drawbacks. Because the DOM parser converts the entire XML file into a tree held in memory, large or complex documents need correspondingly more memory, and traversing a tree with a complex structure can be time-consuming. Still, the tree structure DOM uses matches the way XML organizes information, and random access is convenient, so the DOM interface remains widely useful.
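To make the trade-off concrete, here is a minimal sketch that counts elements two ways on the same input. The class name `DomVsSaxSketch` and the inline sample string are assumptions made for illustration; the SAX half appears only for contrast and is not a full treatment of SAX.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class DomVsSaxSketch {
    // Hypothetical inline sample so the sketch needs no file on disk.
    static final String XML =
        "<address><linkman><name>Van_DarkHolme</name></linkman>"
      + "<linkman><name>Bili</name></linkman></address>";

    // DOM: the whole document is turned into an in-memory tree first,
    // and the tree can then be queried in any order.
    static int countWithDom(String xml, String tag) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName(tag).getLength();
    }

    // SAX: the parser reports events while reading; nothing is retained
    // unless the handler chooses to keep it.
    static int countWithSax(String xml, String tag) throws Exception {
        int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if (qName.equals(tag)) {
                    count[0]++;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println("DOM count: " + countWithDom(XML, "linkman"));
        System.out.println("SAX count: " + countWithSax(XML, "linkman"));
    }
}
```

Both methods report the same count; the difference is that the DOM version keeps the full tree available afterwards, while the SAX version keeps only the counter.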

The following example shows how an XML document maps to a tree.

<?xml version="1.0" encoding="GBK"?>
<address>
	<linkman>
		<name>Van_DarkHolme</name>
		<email>van_darkholme@163.com</email>
	</linkman>
	<linkman>
		<name>Bili</name>
		<email>Bili@163.com</email>
	</linkman>
</address>

Converted into a tree, this XML has the following structure:

[Figure: tree with address as the root, two linkman child nodes, and a name and an email node under each linkman]
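The same tree can be printed from code. The sketch below is an illustration only: the class name `DomTreePrinter` is hypothetical, the sample is inlined without the XML declaration, and whitespace-only text nodes are skipped so that the output stays readable.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomTreePrinter {
    // The article's sample data, inlined for the sketch.
    static final String XML =
        "<address>"
      + "<linkman><name>Van_DarkHolme</name><email>van_darkholme@163.com</email></linkman>"
      + "<linkman><name>Bili</name><email>Bili@163.com</email></linkman>"
      + "</address>";

    // Parses the XML and returns an indented, one-node-per-line rendering.
    static String render(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        StringBuilder out = new StringBuilder();
        append(doc.getDocumentElement(), 0, out);
        return out.toString();
    }

    private static void append(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (text.isEmpty()) {
                return; // skip whitespace-only text nodes
            }
            indent(depth, out);
            out.append("#text \"").append(text).append("\"\n");
            return;
        }
        indent(depth, out);
        out.append(node.getNodeName()).append('\n');
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            append(children.item(i), depth + 1, out);
        }
    }

    private static void indent(int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.print(render(XML));
    }
}
```

Each text value appears one level below its element, which shows that the text content is stored as a node of its own.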

(Note: although the figure above does not show it, the text content of each name and email element is also a node of its own.)

DOM parsing has four core interfaces:

Document: This interface represents the entire XML document and is the root of the DOM tree, that is, the entry point to the tree. Through this interface, the content of every element in the XML can be accessed. Its common methods are as follows.

Common methods of Document

[Figure: table of common Document methods]
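As a minimal sketch, the following uses a few Document methods from the standard org.w3c.dom API (createElement, createTextNode, getDocumentElement, getElementsByTagName) to build and then query a small tree. The class name `DocumentMethodsSketch` is hypothetical, and the selection of methods is an assumption about what is most useful to show, not a reproduction of the original table.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DocumentMethodsSketch {
    // Builds <address><linkman><name>Van_DarkHolme</name></linkman></address> in memory.
    static Document build() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();

        Element address = doc.createElement("address");
        doc.appendChild(address); // address becomes the root element

        Element linkman = doc.createElement("linkman");
        address.appendChild(linkman);

        Element name = doc.createElement("name");
        // Text is a separate node that must be created and attached explicitly.
        name.appendChild(doc.createTextNode("Van_DarkHolme"));
        linkman.appendChild(name);

        return doc;
    }

    public static void main(String[] args) throws ParserConfigurationException {
        Document doc = build();
        // getDocumentElement() returns the root element of the tree.
        System.out.println(doc.getDocumentElement().getNodeName());
        // getElementsByTagName() searches the whole document.
        System.out.println(doc.getElementsByTagName("name").getLength());
    }
}
```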

Node: This interface plays a central role in the DOM tree. The core DOM interfaces (Document, Element, Attr) all inherit from Node, and each Node object represents one node in the DOM tree.

Common methods of Node interface

[Figure: table of common Node methods]
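A short sketch of several standard Node methods (getNodeName, getNodeType, getNodeValue, getFirstChild, getParentNode, hasChildNodes) follows. The class name `NodeMethodsSketch` and the helpers `parse` and `describe` are hypothetical names introduced for the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeMethodsSketch {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Summarizes a node's name, numeric type, and value.
    static String describe(Node node) {
        return node.getNodeName()
                + " type=" + node.getNodeType()
                + " value=" + node.getNodeValue();
    }

    public static void main(String[] args) throws Exception {
        Node name = parse("<name>Bili</name>").getDocumentElement();
        Node text = name.getFirstChild();

        // An element has a tag name but no value of its own.
        System.out.println(describe(name)); // name type=1 value=null
        // The text node carries the value.
        System.out.println(describe(text)); // #text type=3 value=Bili

        System.out.println(name.hasChildNodes());               // true
        System.out.println(text.getParentNode().getNodeName()); // name
    }
}
```

The numeric types correspond to the constants Node.ELEMENT_NODE (1) and Node.TEXT_NODE (3).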

NodeList: This interface represents a collection of nodes. It is generally used for a set of nodes that have an ordered relationship.

Common methods of NodeList

[Figure: table of common NodeList methods]
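NodeList exposes just two methods, getLength() and item(int), and it does not implement Iterable, so an index-based loop is the usual way to walk it. The sketch below assumes a hypothetical class `NodeListSketch` and an inlined sample.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NodeListSketch {
    // Hypothetical inline sample based on the article's data.
    static final String XML =
        "<address><linkman><name>Van_DarkHolme</name></linkman>"
      + "<linkman><name>Bili</name></linkman></address>";

    // Collects the text of every <name> element, in document order.
    static List<String> names(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList list = doc.getElementsByTagName("name");

        List<String> result = new ArrayList<>();
        for (int i = 0; i < list.getLength(); i++) {
            // item(i) returns the i-th node; getTextContent() reads its text.
            result.add(list.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(names(XML)); // [Van_DarkHolme, Bili]
    }
}
```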

NamedNodeMap: This interface represents a one-to-one mapping between a group of nodes and their unique names. It is mainly used to represent the attributes of a node.
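The article's sample XML has no attributes, so the sketch below assumes a hypothetical linkman element with `id` and `group` attributes to show how NamedNodeMap is read. The class name `NamedNodeMapSketch` and the helper `attribute` are also illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NamedNodeMapSketch {
    // Hypothetical sample: the id and group attributes are not in the article's XML.
    static final String XML =
        "<linkman id=\"1\" group=\"friends\"><name>Bili</name></linkman>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Looks up one attribute by name; returns null when it is absent.
    static String attribute(Element element, String attrName) {
        NamedNodeMap attrs = element.getAttributes();
        Node attr = attrs.getNamedItem(attrName);
        return attr == null ? null : attr.getNodeValue();
    }

    public static void main(String[] args) throws Exception {
        Element linkman = parse(XML).getDocumentElement();
        System.out.println(attribute(linkman, "id")); // 1

        // The map can also be walked by index; the order is not guaranteed.
        NamedNodeMap attrs = linkman.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            System.out.println(attr.getNodeName() + "=" + attr.getNodeValue());
        }
    }
}
```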

Beyond these four core interfaces, a program that performs DOM parsing needs to follow these steps:

1. Create a DocumentBuilderFactory, which is used to obtain a DocumentBuilder object:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

2. Create the DocumentBuilder:

DocumentBuilder builder = factory.newDocumentBuilder();

3. Create the Document object to obtain the entry point of the tree:

Document doc = builder.parse("relative path or absolute path of the xml file");

4. Create a NodeList:

NodeList n1 = doc.getElementsByTagName("name of the node to read");

5. Extract the XML information:

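import java.io.IOException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class DOMDemo01 {

	public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
		// 1. Create a DocumentBuilderFactory, used to obtain a DocumentBuilder object:
		DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
		// 2. Create the DocumentBuilder:
		DocumentBuilder builder = factory.newDocumentBuilder();
		// 3. Create the Document object to obtain the entry point of the tree:
		Document doc = builder.parse("src//dom_demo_02.xml");
		// 4. Create a NodeList of all linkman elements:
		NodeList node = doc.getElementsByTagName("linkman");
		// 5. Extract the XML information from each linkman:
		for (int i = 0; i < node.getLength(); i++) {
			Element e = (Element) node.item(i);
			System.out.println("Name: " +
					e.getElementsByTagName("name").item(0).getFirstChild().getNodeValue());
			System.out.println("Email: " +
					e.getElementsByTagName("email").item(0).getFirstChild().getNodeValue());
		}
	}
}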


The code above is analyzed here starting from step 4:

Calling doc.getElementsByTagName("linkman") returns a NodeList. The XML file above contains two linkman nodes, so this NodeList holds two Node objects (both linkman nodes), and the loop then reads the information from each of them.

Element e = (Element)node.item(i) obtains the current linkman node, that is, e refers to that linkman element.

e.getElementsByTagName("name").item(0).getFirstChild().getNodeValue();

getElementsByTagName("name") returns all name nodes under the linkman (there is only one here);

item(0) takes the first name node (the only one);

getFirstChild() gets the text node under the name node, which is the node holding the content Van_DarkHolme (as noted above, text content is also a separate node; createTextNode() in the Document method list creates such a text node);

getNodeValue() returns the value of the text node: Van_DarkHolme.
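The chained call throws a NullPointerException when the element is missing or empty. As a sketch only, the same steps can be wrapped in a small helper that checks each one. The class name `ChildTextSketch` and the helper `childText` are hypothetical and not part of the DOM API.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ChildTextSketch {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Hypothetical helper: text of the first descendant element named tag.
    // Returns null when no such element exists, "" when it has no children.
    static String childText(Element parent, String tag) {
        // getElementsByTagName matches all descendants, not only direct children.
        NodeList matches = parent.getElementsByTagName(tag);
        if (matches.getLength() == 0) {
            return null;
        }
        Node first = matches.item(0);
        // The first child is the text node when the element holds plain text.
        Node text = first.getFirstChild();
        return text == null ? "" : text.getNodeValue();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(
            "<address><linkman><name>Van_DarkHolme</name><email/></linkman></address>");
        NodeList linkmen = doc.getElementsByTagName("linkman");
        for (int i = 0; i < linkmen.getLength(); i++) {
            Element e = (Element) linkmen.item(i);
            System.out.println("Name: " + childText(e, "name"));
            System.out.println("Email: " + childText(e, "email"));
            System.out.println("Phone: " + childText(e, "phone"));
        }
    }
}
```

When only the text is needed, calling getTextContent() on the element is a shorter standard alternative to getFirstChild().getNodeValue().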

Statement

This article is reproduced from CSDN. If there is any infringement, please contact admin@php.cn to have it removed.
