
Summary of several ways to parse XML in Java

The first one: DOM.

DOM stands for Document Object Model. A DOM-based XML parser converts an XML document into a tree of objects (usually called a DOM tree), and the application works with the XML data by operating on that object model. Through the DOM interface, the application can reach any part of the document's data at any time, which is why this approach is also described as a random access mechanism.
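
To make the idea of random access concrete, here is a minimal sketch using only the JDK's built-in DOM parser. The class name, the helper methods, and the sample document are assumptions made for illustration; they are not part of the article's own files.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DomRandomAccessSketch
{
  // Hypothetical sample data, not taken from the article's files.
  static final String XML =
      "<people>"
      + "<person id=\"1\"><name>Ann</name></person>"
      + "<person id=\"2\"><name>Bob</name></person>"
      + "<person id=\"3\"><name>Cid</name></person>"
      + "</people>";

  // Parse an XML string into an in-memory DOM tree.
  static Document parse(String xml)
  {
    try
    {
      DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      return db.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }
    catch (Exception e)
    {
      throw new IllegalStateException("could not parse the sample XML", e);
    }
  }

  // Return the text of the <name> child of the <person> at the given index.
  static String nameAt(Document doc, int index)
  {
    NodeList persons = doc.getElementsByTagName("person");
    Element person = (Element) persons.item(index);
    return person.getElementsByTagName("name").item(0).getTextContent();
  }

  public static void main(String[] args)
  {
    Document doc = parse(XML);

    // The whole tree is in memory, so nodes can be visited in any order,
    // and revisited, without parsing the document again.
    System.out.println(nameAt(doc, 2));
    System.out.println(nameAt(doc, 0));
    System.out.println(nameAt(doc, 2));
  }
}
```

The document is parsed once; after that, the last person can be read before the first and then read again, which a purely sequential reader could not do without starting over.
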

The DOM interface exposes the information in an XML document through a hierarchical object model: a tree of nodes that mirrors the document's structure. Whatever kind of information the XML describes, whether tabular data, a list of items, or a prose document, the model that DOM produces is always a node tree. In other words, DOM forces you to access the information in an XML document through a tree model. Because XML is itself hierarchical, this representation fits well.
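
As an illustration of the point that even table-like data ends up as a node tree, the following sketch renders a small document as an indented outline of its nodes. The class name, method names, and sample data are assumptions made for this example.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomTreeOutlineSketch
{
  // Hypothetical table-like sample data, not one of the article's files.
  static final String XML =
      "<table><row><cell>1</cell><cell>2</cell></row><!--end--></table>";

  // Append one line per node, indented by its depth in the tree.
  static void outline(Node node, int depth, StringBuilder out)
  {
    for (int i = 0; i < depth; i++)
    {
      out.append("  ");
    }

    short type = node.getNodeType();
    if (type == Node.ELEMENT_NODE)
    {
      out.append("element ").append(node.getNodeName());
    }
    else if (type == Node.TEXT_NODE)
    {
      out.append("text ").append(node.getNodeValue());
    }
    else if (type == Node.COMMENT_NODE)
    {
      out.append("comment ").append(node.getNodeValue());
    }
    else
    {
      out.append("other ").append(node.getNodeName());
    }
    out.append('\n');

    // Recurse into the children; text and comment nodes have none.
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++)
    {
      outline(children.item(i), depth + 1, out);
    }
  }

  // Parse the XML string and return the outline of its root element.
  static String outlineOf(String xml)
  {
    try
    {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      StringBuilder out = new StringBuilder();
      outline(doc.getDocumentElement(), 0, out);
      return out.toString();
    }
    catch (Exception e)
    {
      throw new IllegalStateException("could not parse the sample XML", e);
    }
  }

  public static void main(String[] args)
  {
    System.out.print(outlineOf(XML));
  }
}
```

Rows and cells show up as nested element nodes, the cell values as text nodes beneath them, and the comment as a sibling of the row: the "table" is simply a tree three levels deep.
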

Random access through the DOM tree gives application code great flexibility: it can read and modify any part of the XML document. However, because a DOM parser converts the whole document into a tree held in memory, memory requirements are high when the document is large or structurally complex. Traversing a complex tree is also time-consuming. DOM parsers therefore place relatively high demands on the machine and are not especially efficient. Even so, because the tree model matches the structure of XML documents and random access is so convenient, DOM parsers remain widely useful.

import java.io.File; 
  
import javax.xml.parsers.DocumentBuilder; 
import javax.xml.parsers.DocumentBuilderFactory; 
  
import org.w3c.dom.Document; 
import org.w3c.dom.Element; 
import org.w3c.dom.NodeList; 
  
public class DomTest1 
{ 
  public static void main(String[] args) throws Exception 
  { 
    // Step 1: obtain a DOM parser factory (the factory's job is to create concrete parsers)
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();

//   System.out.println("class name: " + dbf.getClass().getName());

    // Step 2: obtain a concrete DOM parser
    DocumentBuilder db = dbf.newDocumentBuilder();

//   System.out.println("class name: " + db.getClass().getName());

    // Step 3: parse an XML document and obtain the Document object (the root of the tree)
    Document document = db.parse(new File("candidate.xml"));
      
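    // Collect every PERSON element in the document
    NodeList list = document.getElementsByTagName("PERSON");

    for(int i = 0; i < list.getLength(); i++)
    {
      Element element = (Element)list.item(i);

      // Assumes every PERSON has non-empty NAME, ADDRESS, TEL, FAX and EMAIL
      // children; a missing or empty element would cause a NullPointerException here.
      String content = element.getElementsByTagName("NAME").item(0).getFirstChild().getNodeValue();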
        
      System.out.println("name:" + content); 
        
      content = element.getElementsByTagName("ADDRESS").item(0).getFirstChild().getNodeValue(); 
        
      System.out.println("address:" + content); 
        
      content = element.getElementsByTagName("TEL").item(0).getFirstChild().getNodeValue(); 
        
      System.out.println("tel:" + content); 
        
      content = element.getElementsByTagName("FAX").item(0).getFirstChild().getNodeValue(); 
        
      System.out.println("fax:" + content); 
        
      content = element.getElementsByTagName("EMAIL").item(0).getFirstChild().getNodeValue(); 
        
      System.out.println("email:" + content); 
        
      System.out.println("--------------------------------------"); 
    } 
  } 
}
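
DomTest1 above chains item(0).getFirstChild().getNodeValue(), which fails if an element is missing or empty. One possible way to guard against that is sketched below; the helper name childText, the fallback parameter, and the in-memory sample record are assumptions made for this example, not part of the original listing.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomSafeTextSketch
{
  // Hypothetical record in the shape DomTest1 expects: FAX is empty, EMAIL is missing.
  static final String XML =
      "<PEOPLE><PERSON><NAME>Ann</NAME><FAX/></PERSON></PEOPLE>";

  // Return the text of the first descendant with the given tag name,
  // or the fallback when no such element exists.
  static String childText(Element parent, String tagName, String fallback)
  {
    NodeList matches = parent.getElementsByTagName(tagName);
    if (matches.getLength() == 0)
    {
      return fallback;
    }
    // getTextContent() returns "" for an empty element instead of failing.
    return matches.item(0).getTextContent();
  }

  // Parse the XML string and return its first PERSON element.
  static Element firstPerson(String xml)
  {
    try
    {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      return (Element) doc.getElementsByTagName("PERSON").item(0);
    }
    catch (Exception e)
    {
      throw new IllegalStateException("could not parse the sample XML", e);
    }
  }

  public static void main(String[] args)
  {
    Element person = firstPerson(XML);
    System.out.println("name:" + childText(person, "NAME", "n/a"));
    System.out.println("fax:" + childText(person, "FAX", "n/a"));
    System.out.println("email:" + childText(person, "EMAIL", "n/a"));
  }
}
```

With this helper a missing element yields the fallback and an empty element yields an empty string, so one incomplete record does not abort the whole loop.
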

A second DOM example, which recursively prints any XML document to the console:

import java.io.File;
  
import javax.xml.parsers.DocumentBuilder; 
import javax.xml.parsers.DocumentBuilderFactory; 
  
import org.w3c.dom.Attr; 
import org.w3c.dom.Comment; 
import org.w3c.dom.Document; 
import org.w3c.dom.Element; 
import org.w3c.dom.NamedNodeMap; 
import org.w3c.dom.Node; 
import org.w3c.dom.NodeList; 
  
/**
 * Recursively parses any given XML document and prints its content to the command line.
 * @author zhanglong
 *
 */
public class DomTest3 
{ 
  public static void main(String[] args) throws Exception 
  { 
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance(); 
    DocumentBuilder db = dbf.newDocumentBuilder(); 
      
    Document doc = db.parse(new File("student.xml")); 
    // Get the root element node
    Element root = doc.getDocumentElement(); 
      
    parseElement(root); 
  } 
    
  private static void parseElement(Element element) 
  { 
    String tagName = element.getNodeName(); 
      
    NodeList children = element.getChildNodes(); 
      
    System.out.print("<" + tagName); 
      
    // NamedNodeMap holding all attributes of this element; check it before use
    NamedNodeMap map = element.getAttributes();

    // If the element has attributes
    if(null != map)
    {
      for(int i = 0; i < map.getLength(); i++)
      {
        // Get each attribute of the element
        Attr attr = (Attr)map.item(i);
          
        String attrName = attr.getName(); 
        String attrValue = attr.getValue(); 
          
        System.out.print(" " + attrName + "=\"" + attrValue + "\""); 
      } 
    } 
      
    System.out.print(">"); 
      
    for(int i = 0; i < children.getLength(); i++) 
    { 
      Node node = children.item(i); 
      // Get the node type
      short nodeType = node.getNodeType();

      if(nodeType == Node.ELEMENT_NODE)
      {
        // An element: keep recursing
        parseElement((Element)node);
      }
      else if(nodeType == Node.TEXT_NODE)
      {
        // Base case of the recursion: a text node has no children
        System.out.print(node.getNodeValue());
      } 
      else if(nodeType == Node.COMMENT_NODE) 
      { 
        System.out.print("<!--"); 
          
        Comment comment = (Comment)node; 
          
        // The text of the comment
        String data = comment.getData(); 
          
        System.out.print(data); 
          
        System.out.print("-->"); 
      } 
    } 
      
    System.out.print("</" + tagName + ">"); 
  } 
}

The second one: SAX.

SAX stands for Simple API for XML. Unlike DOM, SAX provides sequential access, which makes it a fast way to read XML data (it is a parsing API and does not write XML). As a SAX parser works through an XML document it fires a series of events, and each event invokes the corresponding handler method. The application accesses the document's data through these handler methods, which is why the SAX interface is also called an event-driven interface.
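
The event-driven model can be seen by recording the callbacks in the order the parser makes them. The sketch below uses only the JDK's SAX parser; the class name, the events helper, and the sample input are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventOrderSketch
{
  // Parse the XML string and return the document and element events in firing order.
  static List<String> events(String xml)
  {
    final List<String> log = new ArrayList<String>();

    DefaultHandler handler = new DefaultHandler()
    {
      @Override
      public void startDocument()
      {
        log.add("startDocument");
      }

      @Override
      public void startElement(String uri, String localName, String qName,
          Attributes attributes)
      {
        log.add("start " + qName);
      }

      @Override
      public void endElement(String uri, String localName, String qName)
      {
        log.add("end " + qName);
      }

      @Override
      public void endDocument()
      {
        log.add("endDocument");
      }
    };

    try
    {
      SAXParserFactory.newInstance().newSAXParser()
          .parse(new InputSource(new StringReader(xml)), handler);
    }
    catch (Exception e)
    {
      throw new IllegalStateException("could not parse the sample XML", e);
    }
    return log;
  }

  public static void main(String[] args)
  {
    // Hypothetical sample input.
    for (String event : events("<a><b>x</b><c/></a>"))
    {
      System.out.println(event);
    }
  }
}
```

The events arrive strictly in document order: each element's start precedes its children and its end follows them. The handler never sees the document as a whole, so any state it needs must be kept by the application itself.
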

import java.io.File; 
  
import javax.xml.parsers.SAXParser; 
import javax.xml.parsers.SAXParserFactory; 
  
import org.xml.sax.Attributes; 
import org.xml.sax.SAXException; 
import org.xml.sax.helpers.DefaultHandler; 
  
public class SaxTest1 
{ 
  public static void main(String[] args) throws Exception 
  { 
    // Step 1: obtain a SAX parser factory instance
    SAXParserFactory factory = SAXParserFactory.newInstance();

    // Step 2: obtain a SAX parser instance
    SAXParser parser = factory.newSAXParser();

    // Step 3: start parsing; the parser calls back into MyHandler as it reads
    parser.parse(new File("student.xml"), new MyHandler());
      
  } 
} 
  
class MyHandler extends DefaultHandler 
{ 
  @Override
  public void startDocument() throws SAXException 
  { 
    System.out.println("parse began"); 
  } 
    
  @Override
  public void endDocument() throws SAXException 
  { 
    System.out.println("parse finished"); 
  } 
    
  @Override
  public void startElement(String uri, String localName, String qName, 
      Attributes attributes) throws SAXException 
  { 
    System.out.println("start element"); 
  } 
    
  @Override
  public void endElement(String uri, String localName, String qName) 
      throws SAXException 
  { 
    System.out.println("finish element"); 
  } 
}

A second SAX example, which uses a stack to track the element currently being parsed:

import java.io.File;
import java.util.Stack; 
  
import javax.xml.parsers.SAXParser; 
import javax.xml.parsers.SAXParserFactory; 
  
import org.xml.sax.Attributes; 
import org.xml.sax.SAXException; 
import org.xml.sax.helpers.DefaultHandler; 
  
public class SaxTest2 
{ 
  public static void main(String[] args) throws Exception 
  { 
    SAXParserFactory factory = SAXParserFactory.newInstance(); 
      
    SAXParser parser = factory.newSAXParser(); 
      
    parser.parse(new File("student.xml"), new MyHandler2()); 
  } 
} 
  
class MyHandler2 extends DefaultHandler 
{ 
  private Stack<String> stack = new Stack<String>(); 
    
  private String name; 
    
  private String gender; 
    
  private String age; 
    
  @Override
  public void startElement(String uri, String localName, String qName, 
      Attributes attributes) throws SAXException 
  { 
    stack.push(qName); 
      
    for(int i = 0; i < attributes.getLength(); i++) 
    { 
      String attrName = attributes.getQName(i); 
      String attrValue = attributes.getValue(i); 
        
      System.out.println(attrName + "=" + attrValue); 
    } 
  } 
    
  @Override
  public void characters(char[] ch, int start, int length) 
      throws SAXException 
  { 
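    // The element currently being parsed is on top of the stack.
    // The tag names below are the Chinese element names used in student.xml:
    // 姓名 = name, 性别 = gender, 年龄 = age.
    String tag = stack.peek();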
      
    if("姓名".equals(tag)) 
    { 
      name = new String(ch, start,length); 
    } 
    else if("性别".equals(tag)) 
    { 
      gender = new String(ch, start, length); 
    } 
    else if("年龄".equals(tag)) 
    { 
      age = new String(ch, start, length); 
    } 
  } 
    
  @Override
  public void endElement(String uri, String localName, String qName) 
      throws SAXException 
  { 
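    stack.pop(); // this element is fully parsed, so pop it off the stack

    // 学生 = student: print the collected fields when a student element ends
    if("学生".equals(qName))
    {
      System.out.println("name: " + name);
      System.out.println("gender: " + gender);
      System.out.println("age: " + age);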
        
      System.out.println(); 
    } 
      
  } 
}
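
One caveat with the handler above: a SAX parser is allowed to deliver the text of a single element in several characters() calls, so assigning a field inside characters() can keep only the last chunk. A common alternative is to accumulate text in a buffer and read it in endElement(). The sketch below shows that variant; the class names, the English tag name, and the sample data are assumptions made for this example, and the buffer handling is only suitable for elements that contain plain text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxTextBufferSketch
{
  static class NameHandler extends DefaultHandler
  {
    final List<String> names = new ArrayList<String>();

    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName,
        Attributes attributes)
    {
      // Start with an empty buffer for every element.
      text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length)
    {
      // May be called more than once for the same run of text, so append.
      text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName)
    {
      // The buffer now holds the complete text of the element that just ended.
      if ("name".equals(qName))
      {
        names.add(text.toString().trim());
      }
      text.setLength(0);
    }
  }

  // Parse the XML string and return the text of every <name> element.
  static List<String> names(String xml)
  {
    NameHandler handler = new NameHandler();
    try
    {
      SAXParserFactory.newInstance().newSAXParser()
          .parse(new InputSource(new StringReader(xml)), handler);
    }
    catch (Exception e)
    {
      throw new IllegalStateException("could not parse the sample XML", e);
    }
    return handler.names;
  }

  public static void main(String[] args)
  {
    // Hypothetical sample data; the entity reference is a typical place
    // where a parser may split the text into several chunks.
    String xml = "<students>"
        + "<student><name>Ann &amp; Bob</name></student>"
        + "<student><name>Cid</name></student>"
        + "</students>";
    System.out.println(names(xml));
  }
}
```

Because the text is only read once the element ends, the result is the same however the parser chooses to chunk it.
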

The third one: JDOM.

JDOM is an open source project. It is tree-based and uses pure Java to parse, generate, serialize, and otherwise manipulate XML documents (http://jdom.org).

•JDOM is designed specifically for Java programming. It takes advantage of Java language features (method overloading, collections, and so on) to combine the capabilities of SAX and DOM effectively.

•JDOM is an API for reading, writing, and manipulating XML from Java. The API aims to be direct, simple, and efficient.

Creating XML with JDOM

import java.io.FileWriter; 
  
import org.jdom.Attribute; 
import org.jdom.Comment; 
import org.jdom.Document; 
import org.jdom.Element; 
import org.jdom.output.Format; 
import org.jdom.output.XMLOutputter; 
  
public class JDomTest1 
{ 
  public static void main(String[] args) throws Exception 
  { 
    Document document = new Document(); 
  
    Element root = new Element("root"); 
  
    document.addContent(root); 
  
    Comment comment = new Comment("This is my comments"); 
  
    root.addContent(comment); 
  
    Element e = new Element("hello"); 
  
    e.setAttribute("sohu", "www.sohu.com"); 
  
    root.addContent(e); 
  
    Element e2 = new Element("world"); 
  
    Attribute attr = new Attribute("test", "hehe"); 
  
    e2.setAttribute(attr); 
  
    e.addContent(e2); 
  
    e2.addContent(new Element("aaa").setAttribute("a", "b") 
        .setAttribute("x", "y").setAttribute("gg", "hh").setText("text content")); 
  
      
    Format format = Format.getPrettyFormat(); 
      
    format.setIndent("  "); 
//   format.setEncoding("gbk"); 
      
    XMLOutputter out = new XMLOutputter(format); 
  
    out.output(document, new FileWriter("jdom.xml")); 
      
  } 
}

Parsing XML with JDOM

import java.io.File; 
import java.io.FileOutputStream; 
import java.util.List; 
  
import org.jdom.Attribute; 
import org.jdom.Document; 
import org.jdom.Element; 
import org.jdom.input.SAXBuilder; 
import org.jdom.output.Format; 
import org.jdom.output.XMLOutputter; 
  
public class JDomTest2 
{ 
  public static void main(String[] args) throws Exception 
  { 
    SAXBuilder builder = new SAXBuilder(); 
      
    Document doc = builder.build(new File("jdom.xml")); 
      
    Element element = doc.getRootElement(); 
      
    System.out.println(element.getName()); 
      
    Element hello = element.getChild("hello"); 
      
    System.out.println(hello.getText()); 
      
    List list = hello.getAttributes(); 
      
    for(int i = 0 ;i < list.size(); i++) 
    { 
      Attribute attr = (Attribute)list.get(i); 
        
      String attrName = attr.getName(); 
      String attrValue = attr.getValue(); 
        
      System.out.println(attrName + "=" + attrValue); 
    } 
      
    hello.removeChild("world"); 
      
    XMLOutputter out = new XMLOutputter(Format.getPrettyFormat().setIndent("  ")); 
      
      
    out.output(doc, new FileOutputStream("jdom2.xml"));    
      
  } 
}

The fourth one: dom4j.

Creating XML with dom4j

import java.io.FileOutputStream; 
import java.io.FileWriter; 
  
import org.dom4j.Document; 
import org.dom4j.DocumentHelper; 
import org.dom4j.Element; 
import org.dom4j.io.OutputFormat; 
import org.dom4j.io.XMLWriter; 
  
public class Test1 
{ 
  public static void main(String[] args) throws Exception 
  { 
    // Create the document and set its root element: first approach
    // Document document = DocumentHelper.createDocument();
    //
    // Element root = DocumentHelper.createElement("student");
    //
    // document.setRootElement(root);

    // Create the document and set its root element: second approach
    Element root = DocumentHelper.createElement("student"); 
    Document document = DocumentHelper.createDocument(root); 
  
    root.addAttribute("name", "zhangsan"); 
  
    Element helloElement = root.addElement("hello"); 
    Element worldElement = root.addElement("world"); 
  
    helloElement.setText("hello"); 
    worldElement.setText("world"); 
  
    helloElement.addAttribute("age", "20"); 
  
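    // Write the document to the console (System.out) with the default format
    XMLWriter xmlWriter = new XMLWriter();
    xmlWriter.write(document);

    // Pretty-print with a two-space indent and a newline after each element
    OutputFormat format = new OutputFormat("  ", true);

    // Write to a file through an OutputStream
    XMLWriter xmlWriter2 = new XMLWriter(new FileOutputStream("student2.xml"), format);
    xmlWriter2.write(document);
    xmlWriter2.close(); // release the file handle

    // Write to a file through a Writer; close() flushes the buffered output
    XMLWriter xmlWriter3 = new XMLWriter(new FileWriter("student3.xml"), format);

    xmlWriter3.write(document);
    xmlWriter3.close();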
  
  } 
}

Parsing XML with dom4j

import java.io.File;
import java.util.Iterator; 
import java.util.List; 
  
import javax.xml.parsers.DocumentBuilder; 
import javax.xml.parsers.DocumentBuilderFactory; 
  
import org.dom4j.Document; 
import org.dom4j.Element; 
import org.dom4j.io.DOMReader; 
import org.dom4j.io.SAXReader; 
  
public class Test2 
{ 
  public static void main(String[] args) throws Exception 
  { 
    SAXReader saxReader = new SAXReader(); 
      
    Document doc = saxReader.read(new File("student2.xml")); 
      
    Element root = doc.getRootElement(); 
      
    System.out.println("root element: " + root.getName()); 
      
    List childList = root.elements(); 
      
    System.out.println(childList.size()); 
      
    List childList2 = root.elements("hello"); 
      
    System.out.println(childList2.size()); 
      
    Element first = root.element("hello"); 
      
    System.out.println(first.attributeValue("age")); 
      
    for(Iterator iter = root.elementIterator(); iter.hasNext();) 
    { 
      Element e = (Element)iter.next(); 
        
      System.out.println(e.attributeValue("age")); 
    } 
      
    System.out.println("---------------------------"); 
      
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance(); 
    DocumentBuilder db = dbf.newDocumentBuilder(); 
    org.w3c.dom.Document document = db.parse(new File("student2.xml")); 
      
    DOMReader domReader = new DOMReader(); 
      
    // Convert the JAXP (W3C DOM) Document into a dom4j Document
    Document d = domReader.read(document); 
      
    Element rootElement = d.getRootElement(); 
      
    System.out.println(rootElement.getName()); 
  
  } 
}
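
The last example uses JDOM (note the org.jdom imports), not dom4j. It builds a contact list whose element names are in Chinese and writes it out with GBK encoding:

import java.io.FileWriter;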
  
import org.jdom.Attribute; 
import org.jdom.Document; 
import org.jdom.Element; 
import org.jdom.output.Format; 
import org.jdom.output.XMLOutputter; 
  
public class Test3 
{ 
  public static void main(String[] args) throws Exception 
  { 
    Document document = new Document(); 
  
    Element root = new Element("联系人列表").setAttribute(new Attribute("公司", 
        "A集团")); 
  
    document.addContent(root); 
      
    Element contactPerson = new Element("联系人"); 
      
    root.addContent(contactPerson); 
  
    contactPerson 
        .addContent(new Element("姓名").setText("张三")) 
        .addContent(new Element("公司").setText("A公司")) 
        .addContent(new Element("电话").setText("021-55556666")) 
        .addContent( 
            new Element("地址") 
                .addContent(new Element("街道").setText("5街")) 
                .addContent(new Element("城市").setText("上海")) 
                .addContent(new Element("省份").setText("上海市"))); 
  
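    XMLOutputter output = new XMLOutputter(Format.getPrettyFormat()
        .setIndent("  ").setEncoding("gbk"));

    // Write through an OutputStream so the bytes really are GBK-encoded, matching
    // the encoding declared in the XML header. A FileWriter would encode with the
    // platform default charset regardless of setEncoding().
    java.io.FileOutputStream fos = new java.io.FileOutputStream("contact.xml");
    output.output(document, fos);
    fos.close();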
  
  } 
}
