Java natively supports two ways of parsing XML documents: DOM parsing and SAX parsing.
DOM parsing is powerful: it supports adding, deleting, modifying, and querying nodes. It treats the XML document as a document object and reads the whole document into memory, so it is best suited to small documents.
SAX parsing reads the content sequentially, element by element, from beginning to end. Modifying the document this way is inconvenient, but SAX is well suited to large, read-only documents.
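For contrast with the SAX code in the rest of this article, here is a minimal DOM sketch. It assumes a tiny inline document rather than a file, and the added title "Java and XML" is made-up sample data. The class and method names (DomSketch, editAndListTitles) are hypothetical. It illustrates that once the whole tree is in memory, nodes can be modified, added, and queried in any order.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class DomSketch {

    // Parses the XML text into an in-memory DOM tree, edits the tree, and
    // returns all titles found afterwards, joined with ", ".
    static String editAndListTitles(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            // Modify: change the text of the first <title>.
            doc.getElementsByTagName("title").item(0).setTextContent("Harry Potter (revised)");

            // Add: append a new <book><title>...</title></book> to the root element.
            Element book = doc.createElement("book");
            Element title = doc.createElement("title");
            title.setTextContent("Java and XML"); // made-up sample data
            book.appendChild(title);
            doc.getDocumentElement().appendChild(book);

            // Query: look the titles up again after the edits.
            NodeList titles = doc.getElementsByTagName("title");
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < titles.getLength(); i++) {
                if (i > 0) {
                    sb.append(", ");
                }
                sb.append(titles.item(i).getTextContent());
            }
            return sb.toString();
        } catch (Exception e) {
            // Wrapped only to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<books><book><title>Harry Potter</title></book></books>";
        System.out.println(editAndListTitles(xml));
    }
}
```

The price of this convenience is that the entire document has to fit in memory, which is why the article recommends DOM for small documents only.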
This article focuses on SAX parsing; DOM parsing will be covered later.
SAX uses an event-driven approach to parse documents. Put simply, it is like watching a movie in a cinema: you watch from beginning to end and cannot go back (whereas DOM lets you move back and forth).
While watching the movie, each time something happens on screen (a plot twist, a tear, a brush of shoulders), your brain and nerves are mobilized to receive or process that information.
Similarly, while SAX parses a document, reading the start and end of the document and the start and end of each element triggers callback methods, and you can handle the corresponding events in those callbacks.
These four methods are: startDocument(), endDocument(), startElement(), and endElement().
In addition, merely reaching a node is not enough. We also need the characters() method to process the content contained in the element.
Collecting these callback methods into a class gives us the handler (the "trigger") we need.
Generally, the document is read from the main method but processed in the handler. This is the so-called event-driven parsing method.
In the handler, the start of the document is reported first, and then the elements are parsed one by one. The content of each element is passed to the characters() method.
The end of each element is then reported, and once all elements have been read, the end of the document is reported and parsing is complete.
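The callback order described above can be observed directly. The following is a minimal sketch, not part of the original article: it parses a small inline string (an assumption made to keep the example self-contained) and records each callback in a list. The names EventOrderSketch and events are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class EventOrderSketch {

    // Parses the XML text and returns the callbacks in the order the parser fired them.
    static List<String> events(String xml) {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startDocument() {
                log.add("startDocument");
            }

            @Override
            public void endDocument() {
                log.add("endDocument");
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                log.add("startElement " + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("endElement " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                log.add("characters " + new String(ch, start, length));
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Wrapped only to keep the sketch short.
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        events("<book><title>Learning XML</title></book>").forEach(System.out::println);
    }
}
```

For this input the log starts with startDocument, then startElement book and startElement title, followed by the characters event(s) for the title text, then endElement title, endElement book, and finally endDocument. A parser is allowed to deliver one run of text in more than one characters() call, so the sketch makes no assumption about how many such entries appear.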
Now let's create the handler class. To do so, we first need to extend DefaultHandler.
Create SaxHandler and override the corresponding methods:
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxHandler extends DefaultHandler {

    /*
     * This method has three parameters.
     * ch is the character array passed back by the parser; it contains the element content.
     * start is the position of the first character and length is the number of characters to read.
     */
    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        String content = new String(ch, start, length);
        System.out.println(content);
        super.characters(ch, start, length);
    }

    @Override
    public void endDocument() throws SAXException {
        // Message: "end parsing document"
        System.out.println("\n…………结束解析文档…………");
        super.endDocument();
    }

    /*
     * uri is the namespace URI.
     * localName is the tag name without a namespace prefix; it is empty when
     * namespace processing is off, which is the SAXParserFactory default.
     * qName is the qualified tag name, including any prefix.
     */
    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        // Message: "end parsing element"
        System.out.println("结束解析元素 " + qName);
        super.endElement(uri, localName, qName);
    }

    @Override
    public void startDocument() throws SAXException {
        // Message: "start parsing document"
        System.out.println("…………开始解析文档…………\n");
        super.startDocument();
    }

    /*
     * uri, localName, and qName have the same meaning as in endElement().
     * attributes is the collection of the element's attributes.
     */
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes)
            throws SAXException {
        // Message: "start parsing element"
        System.out.println("开始解析元素 " + qName);
        if (attributes != null) {
            for (int i = 0; i < attributes.getLength(); i++) {
                // getQName() returns the attribute name
                System.out.print(attributes.getQName(i) + "=\"" + attributes.getValue(i) + "\"");
            }
        }
        System.out.print(qName + ":");
        super.startElement(uri, localName, qName, attributes);
    }
}
XML document:
<?xml version="1.0" encoding="UTF-8"?>
<books>
  <book id="001">
    <title>Harry Potter</title>
    <author>J K. Rowling</author>
  </book>
  <book id="002">
    <title>Learning XML</title>
    <author>Erik T. Ray</author>
  </book>
</books>
TestDemo test class:
import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

public class TestDemo {

    public static void main(String[] args) throws Exception {
        // 1. Instantiate a SAXParserFactory
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // 2. Create the parser
        SAXParser parser = factory.newSAXParser();
        // 3. Get the document to parse, create the handler, and finally parse the document
        File f = new File("books.xml");
        SaxHandler dh = new SaxHandler();
        parser.parse(f, dh);
    }
}
Output result (the messages read "start parsing document", "start parsing element", "end parsing element", and "end parsing document"):
…………开始解析文档…………

开始解析元素 books
books:
开始解析元素 book
id="001"book:
开始解析元素 title
title:Harry Potter
结束解析元素 title
开始解析元素 author
author:J K. Rowling
结束解析元素 author
结束解析元素 book
开始解析元素 book
id="002"book:
开始解析元素 title
title:Learning XML
结束解析元素 title
开始解析元素 author
author:Erik T. Ray
结束解析元素 author
结束解析元素 book
结束解析元素 books

…………结束解析文档…………
Although the output above shows the execution flow correctly, it is very messy.
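Part of the mess comes from characters(): the parser also calls it for the whitespace between elements (line breaks and indentation, if the file is indented), and the handler above prints every call on its own line. As a minimal sketch of one way to tidy the trace, the handler below ignores whitespace-only text and indents each line by nesting depth. The names TidyTraceSketch and trace are hypothetical, attributes are left out for brevity, and the input is an inline string rather than books.xml.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class TidyTraceSketch {

    // Returns an indented trace of the parse events, skipping whitespace-only text.
    static String trace(String xml) {
        StringBuilder out = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private int depth = 0; // current nesting level, used for indentation

            private void line(String text) {
                for (int i = 0; i < depth; i++) {
                    out.append("  ");
                }
                out.append(text).append('\n');
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                line("start " + qName);
                depth++;
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                depth--;
                line("end " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Whitespace between elements trims to an empty string and is skipped.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    line("text " + text);
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Wrapped only to keep the sketch short.
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String xml = "<books>\n  <book id=\"001\">\n    <title>Harry Potter</title>\n  </book>\n</books>";
        System.out.print(trace(xml));
    }
}
```

Trimming is a deliberate simplification: it also discards leading and trailing spaces that belong to real content, which is acceptable for a trace but not for every application.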
To show this process more clearly, we can also rewrite SaxHandler so that it reproduces the original XML document.
Rewritten SaxHandler class:
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxHandler extends DefaultHandler {

    // Print the text content exactly as the parser delivers it.
    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        System.out.print(new String(ch, start, length));
        super.characters(ch, start, length);
    }

    @Override
    public void endDocument() throws SAXException {
        System.out.println("\nEnd parsing");
        super.endDocument();
    }

    // Print the closing tag.
    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        System.out.print("</");
        System.out.print(qName);
        System.out.print(">");
        super.endElement(uri, localName, qName);
    }

    // The parser does not report the XML declaration as an event, so print it by hand.
    @Override
    public void startDocument() throws SAXException {
        System.out.println("Start parsing");
        String s = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
        System.out.println(s);
        super.startDocument();
    }

    // Print the opening tag together with its attributes.
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes)
            throws SAXException {
        System.out.print("<");
        System.out.print(qName);
        if (attributes != null) {
            for (int i = 0; i < attributes.getLength(); i++) {
                System.out.print(" " + attributes.getQName(i) + "=\"" + attributes.getValue(i) + "\"");
            }
        }
        System.out.print(">");
        super.startElement(uri, localName, qName, attributes);
    }
}
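Printing is useful for seeing the events, but in practice a handler usually collects data. The following is a minimal sketch of that idea, not part of the original article. It assumes the book/title/author structure of the sample document, and the names BookCollectorSketch, Book, and BookHandler are hypothetical. Text is accumulated in a StringBuilder because the parser may deliver one element's content in several characters() calls.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class BookCollectorSketch {

    // Hypothetical value class for one <book> entry.
    static class Book {
        String id;
        String title;
        String author;

        @Override
        public String toString() {
            return id + ": " + title + " by " + author;
        }
    }

    static class BookHandler extends DefaultHandler {
        final List<Book> books = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private Book current; // the <book> being read, or null outside any <book>

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            text.setLength(0); // start collecting text for this element
            if (qName.equals("book")) {
                current = new Book();
                current.id = atts.getValue("id");
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times per element, so append rather than assign.
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (current == null) {
                return; // not inside a <book>
            }
            switch (qName) {
                case "title":
                    current.title = text.toString().trim();
                    break;
                case "author":
                    current.author = text.toString().trim();
                    break;
                case "book":
                    books.add(current);
                    current = null;
                    break;
                default:
                    break;
            }
        }
    }

    // Parses the XML text and returns the books it contains.
    static List<Book> parse(String xml) {
        BookHandler handler = new BookHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Wrapped only to keep the sketch short.
            throw new IllegalStateException(e);
        }
        return handler.books;
    }

    public static void main(String[] args) {
        String xml = "<books><book id=\"001\"><title>Harry Potter</title>"
                + "<author>J K. Rowling</author></book></books>";
        parse(xml).forEach(System.out::println);
    }
}
```

Only the book currently being read is held while parsing, so memory use depends on what the handler chooses to keep rather than on the size of the document. For a very large file, each Book could be processed in endElement() instead of being added to a list.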