Solution to Using SAX to Parse XML in Java

Java has two long-standing built-in ways to parse XML documents: DOM parsing and SAX parsing.

DOM parsing is powerful: nodes can be added, deleted, modified, and queried. The XML document is treated as a document object and is read into memory in full, so DOM is best suited to small documents.

SAX parsing reads the content sequentially, element by element, from beginning to end. Modifying the document this way is inconvenient, but SAX is well suited to large, read-only documents.
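
To make the contrast concrete, here is a minimal DOM sketch. It is not part of the original article's code: the class name DomSketch, its helper methods, and the inline XML string are assumptions chosen for illustration. It loads the whole document into memory, jumps directly to a node, and then changes that node in place, which SAX cannot do.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

// Hypothetical helper class, used only for this illustration.
class DomSketch {

    // Parses the XML text into an in-memory tree and returns the whole document.
    static Document load(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Random access: go straight to the first <title> element and read its text.
    static String firstTitle(Document doc) {
        Node title = doc.getElementsByTagName("title").item(0);
        return title.getTextContent();
    }

    public static void main(String[] args) {
        Document doc = load("<books><book id=\"001\"><title>Harry Potter</title></book></books>");
        System.out.println(firstTitle(doc));
        // The tree is mutable, so the same node can be changed after it has been read.
        doc.getElementsByTagName("title").item(0).setTextContent("Learning XML");
        System.out.println(firstTitle(doc));
    }
}
```

The price of this convenience is that the entire tree stays in memory, which is why DOM is a poor fit for very large files.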

This article focuses on SAX parsing; the rest will be covered later.

SAX parses documents in an event-driven way. Put simply, it is like watching a movie in a cinema: you watch from beginning to end and cannot rewind (DOM, by contrast, can move back and forth).

While watching a movie, every plot turn, every tear, every brush of shoulders prompts your brain to take in or process that information.

Similarly, during SAX parsing, reaching the start or end of the document, or the start or end of an element, triggers a callback method, and you can handle the corresponding event inside that callback.

These four methods are startDocument(), endDocument(), startElement(), and endElement().

Reaching a node is not enough on its own, though. We also need the characters() method to process the content contained in an element.

Collecting these callback methods into one class gives us the handler (the "trigger") we need.

Typically the document is opened from the main method but processed in the handler. This is what event-driven parsing means.
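
The order in which the four callbacks fire can be observed with a small sketch. The class name EventOrderSketch, the events method, and the inline XML are assumptions made for illustration; only the JDK's standard SAX classes are used. The handler simply records the name of each callback as the parser invokes it.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class for illustration: records each callback in the order it fires.
class EventOrderSketch {

    static List<String> events(String xml) {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startDocument() {
                log.add("startDocument");
            }

            @Override
            public void endDocument() {
                log.add("endDocument");
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                log.add("startElement " + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("endElement " + qName);
            }
        };
        try {
            // The parser drives the handler; the caller never asks for "the next element".
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return log;
    }

    public static void main(String[] args) {
        events("<a><b>text</b></a>").forEach(System.out::println);
    }
}
```

For the input `<a><b>text</b></a>` the log reads startDocument, startElement a, startElement b, endElement b, endElement a, endDocument: elements close in the reverse order of opening, and the document events bracket everything else.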

[Figure: flow of SAX parsing events]

As shown above, the handler first begins reading the document and then parses the elements one by one. The content of each element is passed to the characters() method.

The element is then closed. Once all elements have been read, parsing of the document ends.

Now we create the handler class. To do so, we first extend DefaultHandler.
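
DefaultHandler supplies empty implementations of the callback methods, so a subclass only needs to override the ones it cares about. The sketch below illustrates this with a handler that overrides a single method; the class name ElementCounter and its helper countElements are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: overrides one callback and inherits no-op defaults for the rest.
class ElementCounter extends DefaultHandler {

    private int count;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        count++; // one call per opening tag
    }

    int getCount() {
        return count;
    }

    // Convenience helper: parses the given XML text and returns the number of elements.
    static int countElements(String xml) {
        ElementCounter counter = new ElementCounter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), counter);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return counter.getCount();
    }

    public static void main(String[] args) {
        // prints 3 (books, book, title)
        System.out.println(countElements("<books><book><title>Learning XML</title></book></books>"));
    }
}
```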

Create SaxHandler and override the corresponding methods:

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxHandler extends DefaultHandler {

    /* This method takes three parameters:
       arg0 is the character array that holds the element content;
       arg1 is the start offset in the array and arg2 is the number of characters to read */
    @Override
    public void characters(char[] arg0, int arg1, int arg2) throws SAXException {
        String content = new String(arg0, arg1, arg2);
        System.out.println(content);
        super.characters(arg0, arg1, arg2);
    }

    @Override
    public void endDocument() throws SAXException {
        System.out.println("\n…………Finished parsing document…………");
        super.endDocument();
    }

    /* arg0 is the namespace URI;
       arg1 is the local name (without prefix), empty if namespace processing is off;
       arg2 is the qualified name (with prefix, if any) */
    @Override
    public void endElement(String arg0, String arg1, String arg2)
            throws SAXException {
        System.out.println("Finished parsing element  " + arg2);
        super.endElement(arg0, arg1, arg2);
    }

    @Override
    public void startDocument() throws SAXException {
        System.out.println("…………Start parsing document…………\n");
        super.startDocument();
    }

    /* arg0 is the namespace URI;
       arg1 is the local name (without prefix), empty if namespace processing is off;
       arg2 is the qualified name (with prefix, if any);
       arg3 is the set of attributes on the element */
    @Override
    public void startElement(String arg0, String arg1, String arg2,
            Attributes arg3) throws SAXException {
        System.out.println("Start parsing element " + arg2);
        if (arg3 != null) {
            for (int i = 0; i < arg3.getLength(); i++) {
                // getQName() returns the attribute's qualified name
                System.out.print(arg3.getQName(i) + "=\"" + arg3.getValue(i) + "\"");
            }
        }
        System.out.print(arg2 + ":");
        super.startElement(arg0, arg1, arg2, arg3);
    }
}

XML document:

<?xml version="1.0" encoding="UTF-8"?>
<books>
   <book id="001">
      <title>Harry Potter</title>
      <author>J K. Rowling</author>
   </book>
   <book id="002">
      <title>Learning XML</title>
      <author>Erik T. Ray</author>
   </book>
</books>

TestDemo test class:

import java.io.File;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

public class TestDemo {

    public static void main(String[] args) throws Exception {
        // 1. Instantiate the SAXParserFactory
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // 2. Create the parser
        SAXParser parser = factory.newSAXParser();
        // 3. Locate the document, create the handler, and parse the document
        File f = new File("books.xml");
        SaxHandler dh = new SaxHandler();
        parser.parse(f, dh);
    }
}

Output result:

…………Start parsing document…………

Start parsing element books
books:

Start parsing element book
id="001"book:

Start parsing element title
title:Harry Potter
Finished parsing element  title


Start parsing element author
author:J K. Rowling
Finished parsing element  author


Finished parsing element  book


Start parsing element book
id="002"book:

Start parsing element title
title:Learning XML
Finished parsing element  title


Start parsing element author
author:Erik T. Ray
Finished parsing element  author


Finished parsing element  book


Finished parsing element  books

…………Finished parsing document…………

Although the output above shows the execution flow correctly, it is very messy.
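
Much of the clutter comes from characters(): it also fires for the whitespace and line breaks between tags, and a parser is allowed to deliver one run of text in several chunks. A common remedy is to buffer the text and emit it only when the element ends. The following is a minimal sketch of that idea, assuming elements contain plain text only (no mixed content); the class name TextBufferingHandler and the summarize helper are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects "name: text" lines and skips whitespace-only text.
class TextBufferingHandler extends DefaultHandler {

    private final StringBuilder text = new StringBuilder();
    private final List<String> lines = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        text.setLength(0); // discard whitespace seen before this opening tag
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        if (!value.isEmpty()) {
            lines.add(qName + ": " + value);
        }
        text.setLength(0); // discard whitespace between closing tags
    }

    // Convenience helper: parses the XML text and returns the collected lines.
    static List<String> summarize(String xml) {
        TextBufferingHandler handler = new TextBufferingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return handler.lines;
    }

    public static void main(String[] args) {
        String xml = "<books>\n  <book id=\"001\">\n    <title>Harry Potter</title>\n"
                + "    <author>J K. Rowling</author>\n  </book>\n</books>";
        summarize(xml).forEach(System.out::println);
    }
}
```

With this approach only the title and author elements produce a line, because the text seen around the book and books tags is whitespace and is dropped.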

To show the process more clearly, we can also rewrite SaxHandler so that it reconstructs the original XML document.

Rewritten SaxHandler class:

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxHandler extends DefaultHandler {

    // Echo the text exactly as received, including whitespace between tags
    @Override
    public void characters(char[] arg0, int arg1, int arg2) throws SAXException {
        System.out.print(new String(arg0, arg1, arg2));
        super.characters(arg0, arg1, arg2);
    }

    @Override
    public void endDocument() throws SAXException {
        System.out.println("\nFinished parsing");
        super.endDocument();
    }

    // Print the closing tag, e.g. </title>
    @Override
    public void endElement(String arg0, String arg1, String arg2)
            throws SAXException {
        System.out.print("</");
        System.out.print(arg2);
        System.out.print(">");
        super.endElement(arg0, arg1, arg2);
    }

    // The XML declaration is not reported through these callbacks, so it is printed by hand
    @Override
    public void startDocument() throws SAXException {
        System.out.println("Start parsing");
        String s = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
        System.out.println(s);
        super.startDocument();
    }

    // Print the opening tag together with its attributes, e.g. <book id="001">
    @Override
    public void startElement(String arg0, String arg1, String arg2,
            Attributes arg3) throws SAXException {

        System.out.print("<");
        System.out.print(arg2);

        if (arg3 != null) {
            for (int i = 0; i < arg3.getLength(); i++) {
                System.out.print(" " + arg3.getQName(i) + "=\"" + arg3.getValue(i) + "\"");
            }
        }
        System.out.print(">");
        super.startElement(arg0, arg1, arg2, arg3);
    }

}
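
The handlers above only print what they see. In practice a SAX handler usually turns the events into application objects. The sketch below shows one way this might look for the books document; the names BookSketch, Book, BookHandler, and parse are all hypothetical, and it assumes each book element carries an id attribute and plain-text title and author children, as in the sample XML.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: builds a list of Book objects from SAX events.
class BookSketch {

    static final class Book {
        final String id;
        String title;
        String author;

        Book(String id) {
            this.id = id;
        }
    }

    static final class BookHandler extends DefaultHandler {
        final List<Book> books = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private Book current; // the <book> being filled in, or null outside one

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            text.setLength(0);
            if ("book".equals(qName)) {
                current = new Book(attrs.getValue("id"));
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (current == null) {
                return; // closing tag outside any <book>, e.g. </books>
            }
            switch (qName) {
                case "title":
                    current.title = text.toString().trim();
                    break;
                case "author":
                    current.author = text.toString().trim();
                    break;
                case "book":
                    books.add(current);
                    current = null;
                    break;
                default:
                    break;
            }
        }
    }

    static List<Book> parse(String xml) {
        BookHandler handler = new BookHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return handler.books;
    }

    public static void main(String[] args) {
        String xml = "<books><book id=\"001\"><title>Harry Potter</title>"
                + "<author>J K. Rowling</author></book></books>";
        for (Book b : parse(xml)) {
            System.out.println(b.id + ": " + b.title + " by " + b.author);
        }
    }
}
```

Because only the book currently being read is held in a field, memory use stays flat while parsing; how many finished objects to keep is then up to the application.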
