1. Concept
XML files are mostly used to describe information, so XML parsing means extracting the needed information from an XML document according to its elements. There are two ways to parse XML: DOM parsing and SAX parsing. The figure shows how the two approaches operate.
2. DOM parsing
A DOM-based XML parser converts the document into a collection of object models organized as a tree, which is the data structure that stores the information. Through the DOM interface, the application can access any part of the data in the XML document at any time, so this way of accessing data is also called random access.
This approach also has drawbacks. Because the DOM parser converts the entire XML file into a tree and keeps it in memory, large files or complex data place higher demands on memory, and traversing a tree with a complex structure is also time-consuming. However, the tree structure used by DOM matches the way XML stores information, and random access is convenient, so the DOM interface remains widely useful.
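As a minimal sketch of what random access means in practice, the Java below parses an address book from an in-memory string and then reads values out of document order. The class name RandomAccessSketch and the helpers parse and textOf are hypothetical names chosen for illustration, and the XML is a compact copy of the article's sample rather than a file on disk.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RandomAccessSketch {
    // The article's address book, written without whitespace between elements.
    static final String XML =
        "<address>"
      + "<linkman><name>Van_DarkHolme</name><email>van_darkholme@163.com</email></linkman>"
      + "<linkman><name>Bili</name><email>Bili@163.com</email></linkman>"
      + "</address>";

    // Parses the whole string into an in-memory tree; checked exceptions are wrapped for brevity.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Returns the text of the index-th element with the given tag name, in document order.
    static String textOf(Document doc, String tag, int index) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.item(index).getTextContent();
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        // The tree is fully in memory, so reads can happen in any order.
        System.out.println(textOf(doc, "email", 1)); // second contact's email
        System.out.println(textOf(doc, "name", 0));  // then back to the first contact's name
    }
}
```

The cost noted above is visible here too: parse builds the complete tree before the first read, regardless of how little of it is used afterwards.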
The following example illustrates how an XML document is converted into a tree.
<?xml version="1.0" encoding="GBK"?>
<address>
    <linkman>
        <name>Van_DarkHolme</name>
        <email>van_darkholme@163.com</email>
    </linkman>
    <linkman>
        <name>Bili</name>
        <email>Bili@163.com</email>
    </linkman>
</address>
Converted into a tree, the XML has the following structure:
address
    linkman
        name
        email
    linkman
        name
        email
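One way to inspect the tree is to walk it recursively and print one line per node. The sketch below does that for a single linkman entry; TreeDumpSketch, parse, and dump are hypothetical names, and the XML is written without whitespace between elements so that no whitespace-only text nodes appear in the output.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TreeDumpSketch {
    static final String XML =
        "<address><linkman><name>Van_DarkHolme</name>"
      + "<email>van_darkholme@163.com</email></linkman></address>";

    // Parses the string into a DOM tree; checked exceptions are wrapped for brevity.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Returns an indented listing of the subtree rooted at the given node.
    static String dump(Node root) {
        StringBuilder out = new StringBuilder();
        dump(root, 0, out);
        return out.toString();
    }

    // Appends one line for this node, then recurses into its children in document order.
    private static void dump(Node node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.append("#text \"").append(node.getNodeValue()).append("\"");
        } else {
            out.append(node.getNodeName());
        }
        out.append('\n');
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            dump(children.item(i), depth + 1, out);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        System.out.print(dump(doc.getDocumentElement()));
    }
}
```

In the listing this produces, each piece of text appears as its own #text line one level below name or email, because the parser stores element text as a child node.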
(Note: although the figure above does not show them, the text contents of name and email are each a separate node as well.)
DOM parsing has the following four core interfaces:
Document: This interface represents the entire XML document and is the root of the whole DOM tree, that is, the entry point to the tree. Through it, the contents of all elements in the XML can be accessed. Its common methods are as follows.
Common methods of Document
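As a sketch of a few org.w3c.dom.Document methods (a selection for illustration, not necessarily the article's full list), the code below builds a small tree with createElement and createTextNode and then reads it back with getDocumentElement and getElementsByTagName. The class name DocumentMethodsSketch and the helper build are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DocumentMethodsSketch {
    // Builds <address><linkman><name>Van_DarkHolme</name></linkman></address> in memory.
    static Document build() {
        Document doc;
        try {
            // newDocument() creates an empty tree instead of parsing a file.
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM parser available", e);
        }
        Element address = doc.createElement("address");
        Element linkman = doc.createElement("linkman");
        Element name = doc.createElement("name");
        // Element text is stored in a separate text node.
        name.appendChild(doc.createTextNode("Van_DarkHolme"));
        linkman.appendChild(name);
        address.appendChild(linkman);
        doc.appendChild(address);
        return doc;
    }

    public static void main(String[] args) {
        Document doc = build();
        // getDocumentElement() returns the root element of the tree.
        System.out.println(doc.getDocumentElement().getTagName());
        // getElementsByTagName() searches the whole document.
        System.out.println(doc.getElementsByTagName("name").getLength());
    }
}
```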
Node: This interface plays an important role in the entire DOM tree. The core DOM interfaces (Document, Element, Attr) all inherit from Node. In the DOM tree, each Node represents one tree node.
Common methods of Node interface
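The following sketch exercises a handful of Node methods (getNodeName, getNodeType, getNodeValue, getFirstChild, getParentNode, getChildNodes) on a single name element and its text child. NodeMethodsSketch, parseRoot, and describe are hypothetical names, and the selection of methods is illustrative rather than exhaustive.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeMethodsSketch {
    // Parses the string and returns its root element; checked exceptions are wrapped for brevity.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Summarizes a node's name, numeric type code, and value on one line.
    static String describe(Node node) {
        return node.getNodeName() + " type=" + node.getNodeType() + " value=" + node.getNodeValue();
    }

    public static void main(String[] args) {
        Element name = parseRoot("<name>Van_DarkHolme</name>");
        Node text = name.getFirstChild();

        // An element has a tag name but no node value of its own.
        System.out.println(describe(name)); // name type=1 value=null
        // The text lives in a child node named "#text".
        System.out.println(describe(text)); // #text type=3 value=Van_DarkHolme

        // Navigation works in both directions.
        System.out.println(text.getParentNode().isSameNode(name));
        System.out.println(name.getChildNodes().getLength());
    }
}
```

The type codes 1 and 3 are the values of the constants Node.ELEMENT_NODE and Node.TEXT_NODE.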
NodeList: This interface represents a collection of nodes. It is generally used for a set of nodes in an ordered relationship.
NodeList common methods
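NodeList exposes only getLength() and item(int), and it does not implement Iterable, so an indexed loop is the usual way to read it. The sketch below collects the text of every matching element into a java.util.List; NodeListSketch, parse, and texts are hypothetical names, and the XML is a compact copy of the article's sample.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NodeListSketch {
    static final String XML =
        "<address>"
      + "<linkman><name>Van_DarkHolme</name><email>van_darkholme@163.com</email></linkman>"
      + "<linkman><name>Bili</name><email>Bili@163.com</email></linkman>"
      + "</address>";

    // Parses the string into a DOM tree; checked exceptions are wrapped for brevity.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Copies the text content of each node, in list order, into an ordinary List.
    static List<String> texts(NodeList nodes) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent());
        }
        return out;
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        NodeList names = doc.getElementsByTagName("name");
        System.out.println(names.getLength());
        System.out.println(texts(names));
        // item() returns null rather than throwing when the index is past the end.
        System.out.println(names.item(names.getLength()) == null);
    }
}
```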
NamedNodeMap: This interface represents a one-to-one mapping between a group of nodes and their unique names, and is mainly used to represent node attributes.
In addition to the four core interfaces above, a program that performs DOM parsing needs to follow these steps:
1. Create a DocumentBuilderFactory, which is used to obtain the DocumentBuilder object: DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
2. Create the DocumentBuilder: DocumentBuilder builder = factory.newDocumentBuilder();
3. Create the Document object and obtain the entry point of the tree: Document doc = builder.parse("relative path or absolute path of the xml file");
4. Create the NodeList: NodeList n1 = doc.getElementsByTagName("node to read");
5. Read the XML information.
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class DOMDemo01 {
    public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
        // 1. Create the DocumentBuilderFactory, used to obtain the DocumentBuilder object
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // 2. Create the DocumentBuilder
        DocumentBuilder builder = factory.newDocumentBuilder();
        // 3. Create the Document object and obtain the entry point of the tree
        Document doc = builder.parse("src//dom_demo_02.xml");
        // 4. Create the NodeList
        NodeList node = doc.getElementsByTagName("linkman");
        // 5. Read the XML information
        for (int i = 0; i < node.getLength(); i++) {
            Element e = (Element) node.item(i);
            System.out.println("Name: " + e.getElementsByTagName("name").item(0).getFirstChild().getNodeValue());
            System.out.println("Email: " + e.getElementsByTagName("email").item(0).getFirstChild().getNodeValue());
        }
    }
}
The analysis of the code above starts from step 4. doc.getElementsByTagName("linkman") returns a NodeList. The XML file above contains two linkman nodes, so the NodeList holds two Nodes (both linkman nodes), and the loop then reads the information from the XML file.
Element e = (Element) node.item(i) obtains a linkman node, that is, e points to that linkman. The expression e.getElementsByTagName("name").item(0).getFirstChild().getNodeValue() then breaks down as follows:
getElementsByTagName("name") obtains all name nodes under that linkman (there is actually only one);
item(0) takes the first name node (the only one);
getFirstChild() gets the text node under the name node, which for the first linkman is the node holding the content Van_DarkHolme (as mentioned above, the text content is also a separate node; createTextNode() in the Document method list creates such a text node);
getNodeValue() gets the value of the text node: Van_DarkHolme.
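NamedNodeMap, the fourth core interface, does not appear in the walkthrough because the sample document has no attributes. As a minimal sketch, assuming hypothetical id and group attributes on linkman (they are not part of the article's XML), the code below reads attributes through getAttributes() and getNamedItem(). NamedNodeMapSketch, parseRoot, and attr are hypothetical names as well.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NamedNodeMapSketch {
    // Hypothetical variant of the sample: linkman carries two attributes.
    static final String XML =
        "<address><linkman id=\"1\" group=\"friends\">"
      + "<name>Van_DarkHolme</name></linkman></address>";

    // Parses the string and returns its root element; checked exceptions are wrapped for brevity.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Looks up one attribute by name; returns null when the element has no such attribute.
    static String attr(Element element, String name) {
        NamedNodeMap attributes = element.getAttributes();
        Node item = attributes.getNamedItem(name);
        return item == null ? null : item.getNodeValue();
    }

    public static void main(String[] args) {
        Element address = parseRoot(XML);
        Element linkman = (Element) address.getElementsByTagName("linkman").item(0);

        System.out.println(attr(linkman, "id"));
        System.out.println(attr(linkman, "missing"));

        // Attributes can also be visited by index, but their order is not guaranteed.
        NamedNodeMap attributes = linkman.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            Node a = attributes.item(i);
            System.out.println(a.getNodeName() + "=" + a.getNodeValue());
        }
    }
}
```

Lookup by name is the main difference from NodeList: each attribute name is unique within an element, so getNamedItem returns at most one node.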