Home  >  Article  >  Backend Development  >  Detailed explanation of XML SAX parsing

Detailed explanation of XML SAX parsing

PHPz
PHPzOriginal
2017-04-04 10:57:221767browse



The final function of DOM and SAX is to allow us to use languages ​​such as java JavaScript to obtain nodes, text, attributes and other information in xml files.

This article is quoted from other blogs. The content is easy to understand. In order to save time, I have directly excerpted it. There are usually two ways to parse XML in JAVA, DOM and SAX. Although DOM is a W3C standard and provides a standard parsing method, its parsing efficiency has always been unsatisfactory, because when using DOM to parse XML, the parser reads the entire document and builds a memory-resident tree structure (node ​​tree). ), and then your code can use the DOM's standard interface to manipulate the tree structure. But in most cases we are only interested in part of the document, without parsing the entire document first, and it is also very time-consuming to index some of the data we need from the root node of the node tree.

SAX is an alternative to XML parsing. Compared to the Document Object Model DOM, SAX is a faster and lighter way to read and manipulate XML data


#. SAX allows you to process a document as it is read, so you don't have to wait for the entire document to be stored before taking action. It doesn't involve the overhead and conceptual leaps necessary for the DOM. SAX API is an event-based API suitable for processing data streams, that is, processing data sequentially as the data flows. The SAX API


# will notify you when certain events occur while it is parsing your document. Data you do not save will be discarded when you respond to it.


The following is an example of SAX parsing XML (a bit long, because all methods of SAX event processing are annotated in detail). There are four main interfaces for processing events in the SAX API. , they are ContentHandler, DTDHandler, EntityResolver and ErrorHandler respectively. The following example may be a bit lengthy. In fact, as long as you inherit the DefaultHandler class and overwrite some of the event processing methods, you can also achieve the effect of this example. But in order to have an overview, let's take a look at all the main event parsing methods in the SAX API. (In fact, DefaultHandler implements the above four event handler interfaces, and then provides the default implementation of each abstract method.)


1, ContentHandler interface: receiving documents Handler interface for notifications of logical content.

Java Code Collection Code

'import org.xml.sax.Attributes;

import org.xml.sax.ContentHandler;

import org. xml.sax.Locator;

import org.xml.sax.SAXException;


class MyContentHandler implements ContentHandler{

StringBuffer jsonStringBuffer;

int frontBlankCount = 0;

public MyContentHandler(){

jsonStringBuffer = new StringBuffer();

}

/*

* Receive notification of character data.

* In the DOM, ch[begin:end] is equivalent to the node value of the Text node (nodeValue)

*/

@Override

public void characters(char[] ch, int begin, int length) throws SAXException {

StringBuffer buffer = new StringBuffer();

for(int i = begin; i < begin+length; i++){

switch(ch[i]){

case '\\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \\\':buffer.append("\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\");break;

case '\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\r':buffer.append( "\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\r");break;

case '\\\\\\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\n':buffer.append("\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\n");break;

case '\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\\\\t':buffer.append("\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\t");break;

case '\\\\\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"':buffer.append("\\\\\\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"");break;

default : buffer .append(ch[i]);

}

}

System.out.println(this.toBlankString(this.frontBlankCount)+

">>> characters("+length+"): "+buffer.toString());

}



/*

* Receive notification of the end of the document.

*/

@Override

public void endDocument() throws SAXException {

System.out.println(this.toBlankString(--this. frontBlankCount)+

">>> end document");

}



#/*

* Receive notification of the end of a document.

* The meaning of the parameters is as follows:

* uri: the namespace of the element

* localName: the local name of the element (without prefix)

* qName : Qualified name of the element (with prefix)

*

*/

@Override

public void endElement(String uri,String localName,String qName )

throws SAXException {

System.out.println(this.toBlankString(--this.frontBlankCount)+

">>> end element : " +qName+"("+uri+")");

}


/*

* Ends the mapping of the prefix URI range.

*/

@Override

public void endPrefixMapping(String prefix) throws SAXException {

System.out.println(this.toBlankString(-- this.frontBlankCount)+

">>> end prefix_mapping : "+prefix);

}


/*

* Receive notification of ignorable whitespace in element content.

* Parameter meanings are as follows:

* ch: characters from XML document

* start: starting position in the array

* length: from the array The number of characters read in

*/

@Override

public void ignorableWhitespace(char[] ch, int begin, int length)

throws SAXException {

StringBuffer buffer = new StringBuffer();

for(int i = begin ; i < begin+length ; i++){

switch(ch [i]){

case '\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\':buffer.append("\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \\\\");break;

case '\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\r':buffer.append("\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\r");break;

case '\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\n':buffer. append("\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\n");break;

case '\\\\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\t':buffer.append("\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\t");break ;

case '\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\\\\\\"':buffer.append("\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\\\\\\"");break;

default : buffer.append(ch[i]);

}

}

System.out.println(this.toBlankString(this.frontBlankCount)+">>> ignorable whitespace("+length+"): "+ buffer.toString());

}


/*

* Receive notification of processing instructions.

* Parameter meanings are as follows:

* target: Processing instruction target

* data: Processing instruction data, if not provided, it is null.

*/

@Override

public void processingInstruction(String target,String data)

throws SAXException {

System.out .println(this.toBlankString(this.frontBlankCount)+">>> process instruction: (target = \\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\""

+target+" \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\",data = \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\""+data+"\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \")");

}


/*

* Receives an object used to find the origin of a SAX document event.

* The meaning of the parameters is as follows:

* locator: an object that can return the location of any SAX document event

*/

@Override

public void setDocumentLocator(Locator locator) {

System.out.println(this.toBlankString(this.frontBlankCount)+

">>> set document_locator : (lineNumber = " +locator.getLineNumber()

+",columnNumber = "+locator.getColumnNumber()

+",systemId = "+locator.getSystemId()

+" ,publicId = "+locator.getPublicId()+")");


##}


/*

* Receive notification of skipped entities.

* The meaning of the parameters is as follows:

* name: The name of the entity to be skipped. If it is a parameter entity, the name will start with '%',

*             If it is an external DTD subset, it will be the string "[dtd]"

*/

@Override

public void skippedEntity(String name ) throws SAXException {

System.out.println(this.toBlankString(this.frontBlankCount)+

">>> skipped_entity : "+name);

}


/*

* Receive notification of the start of a document.

*/

@Override

public void startDocument() throws SAXException {

System.out.println(this.toBlankString(this.frontBlankCount++) +

">>> start document ");

}


/*

* Receive notification of the start of an element.

* The meaning of the parameters is as follows:

* uri: the namespace of the element

* localName: the local name of the element (without prefix)

* qName : Qualified name of the element (with prefix)

* atts : Attribute collection of the element

*/

@Override

public void startElement(String uri , String localName, String qName,

Attributes atts) throws SAXException {

System.out.println(this.toBlankString(this.frontBlankCount++)+

">> ;> start element : "+qName+"("+uri+")");

}


/*

* Start prefix URI namespace scope mapping.

* The information for this event is not necessary for normal namespace processing:

* When the http://xml.org/sax/features/namespaces feature is true (default),

* The SAX XML reader will automatically replace prefixes in element and attribute names.

* Parameter meanings are as follows:

* prefix: prefix

* uri: namespace

*/

@Override

public void startPrefixMapping(String prefix,String uri)

throws SAXException {

System.out.println(this.toBlankString(this.frontBlankCount++)+

">>> start prefix_mapping : xmlns:"+prefix+" = "

+"\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\""+uri+"\\\\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\"");


}


private String toBlankString(int count ){

StringBuffer buffer = new StringBuffer();

for(int i = 0;i

buffer.append(" ");

return buffer.toString();

}


}'



2, DDTHandler interface: processor interface that receives notifications of DTD-related events

Java code collection code

'import org.xml.sax.DTDHandler;

import org.xml.sax.SAXException;


##public class MyDTDHandler implements DTDHandler {


/*

* Receive notification of annotation declaration events.

* Parameter meaning is as follows:

* name - annotation name.

* publicId - annotation. The public identifier, or null if not provided.

* systemId - The system identifier of the annotation, or null if not provided.

*/

@Override

public void notationDecl(String name, String publicId, String systemId)

throws SAXException {

System.out.println(">>> notation declare : (name = "+name

+",systemId = "+publicId

+",publicId = "+systemId+ ")");

}


/*

* Receive notification of unresolved entity declaration events.

* The meaning of the parameters is as follows:

* name - the name of the unresolved entity.

* publicId - The public identifier of the entity, or null if not provided.

* systemId - The system identifier of the entity.

* notationName - The name of the related annotation.

*/

@Override

public void unparsedEntityDecl(String name,

String publicId,

String systemId,

String notationName) throws SAXException {

System.out.println(">>> unparsed entity declare : (name = "+name

+",systemId = " +publicId

+",publicId = "+systemId

+",notationName = "+notationName+")");

}


}'



3, EntityResolver interface: It is the basic interface used to parse entities.

Java Code Collection Code

'import java.io.IOException;


##import org.xml.sax.EntityResolver;

import org.xml.sax.InputSource;

import org.xml.sax.SAXException;

##public class MyEntityResolver implements EntityResolver {

/*

* Allows the application to resolve external entities.

* The parser will call this method before opening any external entity (except the top-level document entity)

* The parameter meanings are as follows:

* publicId: the referenced external entity The public identifier, or null if not provided.

* systemId: The system identifier of the referenced external entity.

* Returns:

* An InputSource object describing the new input source, or null,

* to request the parser to open a regular URI connection to the system identifier.

*/

@Override

public InputSource resolveEntity(String publicId, String systemId)

throws SAXException, IOException {

return null;

}


}


##4, ErrorHandler interface: is the error handler Basic interface.

Java Code Collection Code

import org.xml.sax.ErrorHandler;

import org.xml.sax.SAXException;

import org.xml .sax.SAXParseException;


##public class MyErrorHandler implements ErrorHandler {


/*

* Receive notification of recoverable errors

*/

@Override

public void error(SAXParseException e) throws SAXException {

System.err.println ("Error ("+e.getLineNumber()+","

+e.getColumnNumber()+") : "+e.getMessage());

}


/*

* Receive notification of unrecoverable errors.

*/

@Override

public void fatalError(SAXParseException e) throws SAXException {

System.err.println("FatalError ("+e .getLineNumber()+","

+e.getColumnNumber()+") : "+e.getMessage());

}


/*

* Receive notification of unrecoverable errors.

*/

@Override

public void warning(SAXParseException e) throws SAXException {

System.err.println("Warning ("+e .getLineNumber()+","

+e.getColumnNumber()+") : "+e.getMessage());

}


}



The main method of the Test class prints event information when parsing books.xml.

Java Code Collection Code

import java.io.FileNotFoundException;

import java.io.FileReader;

import java.io.IOException;


import org.xml.sax.ContentHandler;

import org.xml.sax.DTDHandler;

import org.xml.sax .EntityResolver;

import org.xml.sax.ErrorHandler;

import org.xml.sax.InputSource;

import org.xml.sax.SAXException;

import org.xml.sax.XMLReader;

import org.xml.sax.helpers.XMLReaderFactory;


##public class Test {

public static void main(String[] args) throws SAXException,

FileNotFoundException, IOException {

//Create a handler for handling document content-related events

ContentHandler contentHandler = new MyContentHandler();

//Create a handler for handling error events

ErrorHandler errorHandler = new MyErrorHandler();

//Create a handler that handles DTD-related events

DTDHandler dtdHandler = new MyDTDHandler();

//Create an entity parser

EntityResolver entityResolver = new MyEntityResolver();

//Create an XML parser (read and parse XML through SAX)

XMLReader reader = XMLReaderFactory.createXMLReader();

/*

* Set the related features of the parser

* http://xml.org/sax/features/validation = true Indicates that the verification feature is enabled

* http://xml.org/sax/features/namespaces = true indicates that the namespace feature is enabled

*/

reader.setFeature(" http://xml.org/sax/features/validation",true);

reader.setFeature("http://xml.org/sax/features/namespaces",true);

//Set the XML parser's handler for handling document content-related events

reader.setContentHandler(contentHandler);

//Set the XML parser's handler for handling error events

reader.setErrorHandler(errorHandler);

//Set the handler of the XML parser to handle DTD related events

reader.setDTDHandler(dtdHandler);

//Set the entity parser of the XML parser

reader.setEntityResolver(entityResolver);

//Parse the books.xml document

reader.parse(new InputSource( new FileReader("books.xml")));

}

##}'



The contents of the books.xml file are as follows:


Xml code Collection code







##Thinking in JAVA



##Core JAVA2



C++ primer



##The console output is as follows:


##>>> set document_locator: (lineNumber = 1, columnNumber = 1, systemId = null, publicId = null)

>>> start document

Error (2,7) : Document is invalid: no grammar found.


Error (2,7) : Document root element "books", must match DOCTYPE root "null".

>>> start prefix_mapping : xmlns: = "http://test.org/books"

> >> start element : books(http://test.org/books)

>>> characters(2): \\\\\\\\\\\\\\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ n\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\\\\\\\t

>>> characters(2): \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\n\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\t

>>> start element : book(http://test.org/books)

>>> characters(3): \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\n\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\t\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\t

>>> start element : name(http://test.org/books)

>>> characters(16): Thinking in JAVA

>>> end element : name(http://test.org/books)

>>> characters(2): \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\n\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\t

>>> end element : book(http://test.org/books)

>>> characters(2): \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\n\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\t

>>> start element : book(http://test.org/books)

>>> characters(3): \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\n\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\t\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\t

>>> start element : name(http://test.org/books)

>>> characters(10): Core JAVA2

>>> end element : name(http://test.org/books)

>>> characters(2): \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\n\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\t

>>> end element : book(http://test.org/books)

>>> characters(2): \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\n\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\t

>>> start element : book(http://test.org/books)

>>> characters(3): \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\n\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\t\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\t

>>> start element : name(http://test.org/books)

>>> characters(10): C++ primer

>>> end element : name(http://test.org/books)

>>> characters(2): \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\n\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\t

>>> end element : book(http://test.org/books)

>>> characters(1): \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\n

>>> end element : books(http://test.org/books)

>>> end prefix_mapping :

>>> end document  

The above is the detailed content of Detailed explanation of XML SAX parsing. For more information, please follow other related articles on the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
Previous article:XML parsing in AndroidNext article:XML parsing in Android