Home  >  Article  >  Backend Development  >  Detailed explanation of Android's implementation of XML parsing technology (picture)

Detailed explanation of Android's implementation of XML parsing technology (picture)

黄舟
黄舟Original
2017-03-09 17:07:061500browse

This article introduces three ways to parse XML on the Android platform.

XML is widely used in various developments, and Android is no exception. As an important role in carrying data, how to read and write XML has become an important skill in Android development.

In Android, the common XML parsers are DOM parser, SAX parser and PULL parser. Below, I will introduce them to you in detail one by one.

First way: DOM parser:

DOM is A collection of nodes or information fragments based on a tree structure that allows developers to use the DOM API to traverse the XML tree and retrieve the required data. Analyzing this structure typically requires loading the entire document and constructing a tree structure before node information can be retrieved and updated. Android fully supports DOM parsing. Using objects in the DOM, XML documents can be read, searched, modified, added, and deleted.

The working principle of DOM: When using DOM to operate XML files, you must first parse the file and divide the file into independent elements, attributes and comments, etc., and then represent the XML file in memory in the form of a node tree. You can access the content of the document through the node tree and modify the document as needed - this is how the DOM works.

DOM implementation first defines a set of interfaces for parsing XML documents, the parser reads the entire document, and then constructs a memory-resident tree structure, so The code can then use the DOM interface to manipulate the entire tree structure.

Since the DOM is stored in a tree structure in memory, retrieval and update efficiency will be higher. But for particularly large documents, parsing and loading the entire document will be resource-intensive. Of course, if the content of the XML file is relatively small, it is feasible to use DOM.

Commonly used DoM interfaces and classes:

Document: This interface defines a series of analysis and creation of DOM documents Method, which is the root of the document tree and the basis for operating the DOM.

Element: This interface inherits the Node interface and provides methods for obtaining and modifying XML element names and attributes.


Node: This interface provides methods to process and obtain node and child node values.

NodeList: Provides methods to obtain the number of nodes and the current node. This allows individual nodes to be accessed iteratively.

DOMParser: This class is the DOM parser class in Apache's Xerces, which can directly parse XML files.

The following is the DOM parsing process:

Second way: SAX parser:

The SAX (Simple API for XML) parser is an event-based The event-driven streaming parsing method of the parser is to parse sequentially from the beginning of the file to the end of the document without pausing or rewinding. Its core is the event processing model, which mainly works around event sources and event processors. When the event source generates an event, call the corresponding processing method of the event processor, and an event can be processed. When the event source calls a specific method in the event handler, it must also pass the status information of the corresponding event to the event handler, so that the event handler can decide its own behavior based on the provided event information.

The advantage of the SAX parser is that it has fast parsing speed and takes up less memory. Perfect for use in Android mobile devices.

The working principle of SAX: The working principle of SAX is simply to scan the document sequentially. When the start and end of the document (document) and the element ( The event processing function is notified when the element starts and ends, the document ends, etc., and the event processing function takes corresponding actions, and then continues the same scan until the end of the document.

In the SAX interface, the event source is the XMLReader in the org.xml.sax package, which parses the XML document through the parser() method. and generate events. The event handler is the four interfaces ContentHander, DTDHander, ErrorHandler, and EntityResolver in the org.xml.sax package. XMLReader completes the connection with the four interfaces ContentHander, DTDHander, ErrorHandler, and EntityResolver through the corresponding event handler registration method setXXXX().

Commonly used SAX interfaces and classes:

Attrbutes: used Get the number, name and value of attributes.

ContentHandler: Defines events associated with the document itself (e.g., opening and closing tags). Most applications register for these events.

DTDHandler: Defines events associated with DTD. It does not define enough events to report the DTD completely. If parsing of the DTD is required, use the optional DeclHandler.

DeclHandler is an extension of SAX. Not all parsers support it.

EntityResolver: Defines events associated with loading entities. Only a few applications register for these events.

ErrorHandler: Define error events. Many applications register these events to report errors in their own way.

DefaultHandler: It provides the default implementation of these interfaces. In most cases, it is easier for an application to extend DefaultHandler and override the relevant methods than to implement an interface directly.

See the table below for details:

## It can be seen that we need XmlReader and DefaultHandler to parse xml.

The following is the SAX parsing process:

The third way: PULL parser:

Android does not Provides support for Java StAX API. However, Android comes with a pull parser that works similar to StAX. It allows the user's application code to get events from the parser, as opposed to the SAX parser automatically pushing events into handlers.

The PULL parser operates similarly to SAX, both are event-based. The difference is that numbers are returned during the PULL parsing process, and we need to obtain the generated events ourselves and then perform corresponding operations, unlike SAX where the processor triggers an event and executes our code.

The following is the process of PULL parsing XML:

Reading the xml declaration returns START_DOCUMENT;


Read the end of xml and return END_DOCUMENT;


##Read the start tag of xml and return START_TAG


Read the end tag of xml and return END_TAG


##Read the text of xml and return TEXT


PULL parser is small and lightweight, has fast parsing speed, is simple and easy to use, and is very suitable for use in Android mobile devices. The Android system is parsing internally PULL parser is also used for various XML. Android officially recommends developers to use Pull parsing technology. Pull parsing technology is an open source technology developed by a third party, and it can also be applied to JavaSE development.

How PULL works: XML pull provides a start element and an end element. When an element starts, we can call parser. nextText extracts all character data from the XML document. When the interpretation of a document ends, the EndDocument event is automatically generated.

Commonly used XML pull interfaces and classes:

XmlPullParser: XML pull parser is a definition parsing function provided in XMLPULL VlAP1 Interface.

XmlSerializer: It is an interface that defines the sequence of XML information sets.

XmlPullParserFactory: This class is used to create XML Pull parsers in the XMPULL V1 API.

XmlPullParserException: Throws a single XML pull parser related error.

The parsing process of PULL is as follows:

[Additional] The fourth way: Android.util.Xml class


## in Android In the API, Android is also provided. util. The Xml class can also parse XML files. The usage method is similar to SAX. You also need to write a Handler to handle XML parsing, but it is simpler to use than SAX, as shown below:

With android. util. XML implements XML parsing,

MyHandler myHandler=new MyHandler0;

android. util. Xm1. parse(ur1.openC0nnection().getlnputStream0, Xm1.Encoding.UTF-8, myHandler);

The following is a reference document

river.xml, placed in the assets directory .as follows:

<?xml version="1.0" encoding="utf-8"?> 
<rivers> <river name="灵渠" length="605">     <introduction>


## The Lingqu Canal is located in Xing'an County, Guangxi Zhuang Autonomous Region. It is one of the oldest canals in the world and has the reputation of "the Pearl of Ancient Water Conservancy Architecture in the World". In ancient times, Lingqu was known as Qin Zhuoqu, Lingqu, Douhe and Xing'an Canal. It was built and opened to navigation in 214 BC. It is still functioning 2217 years ago.

  
</
introduction
>
      
<
imageurl
>
      http://www.php.cn/
     
</
imageurl
>
   
</
river
>
 
   
   
<
river 
name
="胶莱运河"
 length
="200"
>
     
<
introduction
>


      胶莱运河南起黄海灵山海口,北抵渤海三山岛,流经现胶南、胶州、平度、高密、昌邑和莱州等,全长200公里,流域面积达5400平方公里,南北贯穿山东半岛,沟通黄渤两海。胶莱运河自平度姚家村东的分水岭南北分流。南流由麻湾口入胶州湾,为南胶莱河,长30公里。北流由海仓口入莱州湾,为北胶莱河,长100余公里。
     

</
introduction
>
      
<
imageurl
>
      http://www.php.cn/
     
</
imageurl
>
   
</
river
>
   
   
<
river 
name
="苏北灌溉总渠"
 length
="168"
>
 
     
<
introduction
>


      位于淮河下游江苏省北部,西起洪泽湖边的高良涧,流经洪泽,青浦、淮安,阜宁、射阳,滨海等六县(区),东至扁担港口入海的大型人工河道。全长168km。
    

 
</
introduction
>
      
<
imageurl
>
      http://www.php.cn/
     
</
imageurl
>
   
</
river
>
 
</
rivers
>


  


      采用DOM解析时具体处理步骤是:

首先利用DocumentBuilderFactory创建一个DocumentBuilderFactory实例
然后利用DocumentBuilderFactory创建DocumentBuilder

然后加载XML文档(Document,
然后获取文档的根结点(Element)
5
然后获取根结点中所有子节点的列表(NodeList),
6
然后使用再获取子节点列表中的需要读取的结点。

 

  当然我们观察节点,我需要用一个River对象来保存数据,抽象出River 


public class River implements Serializable {     
   privatestaticfinallong serialVersionUID = 1L;     
   private String name;    
   public String getName() {        
   return name;    }    
   public void setName(String name) {        
   this.name = name;    }    
   public int getLength() {        
   return length;    }    
   public void setLength(int length) {        
   this.length = length;    }    
   public String getIntroduction() {        
   return introduction;    }    
   public void setIntroduction(String introduction) {        
   this.introduction = introduction;    }    
   public String getImageurl() {        
   return imageurl;    }    
   public void setImageurl(String imageurl) {        
   this.imageurl = imageurl;    }    
   private int length;    
   private String introduction;    
   private String imageurl; }

下面我们就开始读取xml文档对象,并添加进List中:

代码如下: 我们这里是使用assets中的river.xml文件,那么就需要读取这个xml文件,返回输入流。 读取方法为:inputStream=this.context.getResources().getAssets().open(fileName); 参数是xml文件路径,当然默认的是assets目录为根目录。

 

然后可以用DocumentBuilder对象的parse方法解析输入流,并返回document对象,然后再遍历doument对象的节点属性。
  

//获取全部河流数据 


   /**   
     * 参数fileName:为xml文档路径

     */

    public List<River> getRiversFromXml(String fileName){

        List<River> rivers=new ArrayList<River>();

        DocumentBuilderFactory factory=null;

        DocumentBuilder builder=null;

        Document document=null;

        InputStream inputStream=null;

        //首先找到xml文件

        factory=DocumentBuilderFactory.newInstance();

        try {

            //找到xml,并加载文档

            builder=factory.newDocumentBuilder();

            inputStream=this.context.getResources().getAssets().open(fileName);

            document=builder.parse(inputStream);

            //找到根Element

             Element root=document.getDocumentElement();

             NodeList nodes=root.getElementsByTagName(RIVER);

            //遍历根节点所有子节点,rivers 下所有river

             River river=null;

             for(int i=0;i<nodes.getLength();i++){

                     river=new River(); 

                     //获取river元素节点

                     Element riverElement=(Element)(nodes.item(i));

                     //获取river中name属性值

                     river.setName(riverElement.getAttribute(NAME));

                     river.setLength(Integer.parseInt(riverElement.getAttribute(LENGTH)));

                     //获取river下introduction标签

                     Element introduction=(Element)riverElement.getElementsByTagName(INTRODUCTION).item(0);

                     river.setIntroduction(introduction.getFirstChild().getNodeValue());

                     Element imageUrl=(Element)riverElement.getElementsByTagName(IMAGEURL).item(0);

                     river.setImageurl(imageUrl.getFirstChild().getNodeValue()); 

                 rivers.add(river);

             }

        }catch (IOException e){

            e.printStackTrace();

        } catch (SAXException e) {

            e.printStackTrace();

        }

         catch (ParserConfigurationException e) {

            e.printStackTrace();

        }finally{

            try {

                inputStream.close();

            } catch (IOException e) {    

                e.printStackTrace();

            }

        }

        return rivers;

    }

在这里添加到List中, 然后我们使用ListView将他们显示出来。如图所示: 

 


  采用SAX解析时具体处理步骤是:

1 创建SAXParserFactory对象

2 根据SAXParserFactory.newSAXParser()方法返回一个SAXParser解析器

3 根据SAXParser解析器获取事件源对象XMLReader

4 实例化一个DefaultHandler对象

5 连接事件源对象XMLReader到事件处理类DefaultHandler中

6 调用XMLReader的parse方法从输入源中获取到的xml数据

7 通过DefaultHandler返回我们需要的数据集合。

 

代码如下:


public List<River> parse(String xmlPath){
	List<River> rivers=null;

        SAXParserFactory factory=SAXParserFactory.newInstance();

        try {

            SAXParser parser=factory.newSAXParser();

            //获取事件源

            XMLReader xmlReader=parser.getXMLReader();

            //设置处理器

            RiverHandler handler=new RiverHandler();

            xmlReader.setContentHandler(handler);

            //解析xml文档

            //xmlReader.parse(new InputSource(new URL(xmlPath).openStream()));

            xmlReader.parse(new InputSource(this.context.getAssets().open(xmlPath)));

            rivers=handler.getRivers();    

        } catch (ParserConfigurationException e) {

            // TODO Auto-generated catch block

            e.printStackTrace();

        } catch (SAXException e) {

            // TODO Auto-generated catch block

            e.printStackTrace();

        } catch (IOException e) {

            e.printStackTrace();

        }

        

        return rivers;

    }

 

重点在于DefaultHandler对象中对每一个元素节点,属性,文本内容,文档内容进行处理。

 

前面说过DefaultHandler是基于事件处理模型的,基本处理方式是:当SAX解析器导航到文档开始标签时回调startDocument方法,导航到文档结束标签时回调endDocument方法。当SAX解析器导航到元素开始标签时回调startElement方法,导航到其文本内容时回调characters方法,导航到标签结束时回调endElement方法。

 

根据以上的解释,我们可以得出以下处理xml文档逻辑:

1:当导航到文档开始标签时,在回调函数startDocument中,可以不做处理,当然你可以验证下UTF-8等等。

2:当导航到rivers开始标签时,在回调方法startElement中可以实例化一个集合用来存贮list,不过我们这里不用,因为在构造函数中已经实例化了。

3:导航到river开始标签时,就说明需要实例化River对象了,当然river标签中还有name ,length属性,因此实例化River后还必须取出属性值,attributes.getValue(NAME),同时赋予river对象中,同时添加为导航到的river标签添加一个boolean为真的标识,用来说明导航到了river元素。

4:当然有river标签内还有子标签(节点),但是SAX解析器是不知道导航到什么标签的,它只懂得开始,结束而已。那么如何让它认得我们的各个标签呢?当然需要判断了,于是可以使用回调方法startElement中的参数String localName,把我们的标签字符串与这个参数比较下,就可以了。我们还必须让SAX知道,现在导航到的是某个标签,因此添加一个true属性让SAX解析器知道。

5:它还会导航到文本内标签,(就是a1f02c36ba31691bcfe87b2722de723ba376092e9406724d5c271fcc648ed25a里面的内容),回调方法characters,我们一般在这个方法中取出就是a1f02c36ba31691bcfe87b2722de723ba376092e9406724d5c271fcc648ed25a里面的内容,并保存。 6:当然它是一定会导航到结束标签130b130fec2d2f95e194298b0c54b01d 或者32597ed33a62307b98b0a630505272be的,如果是130b130fec2d2f95e194298b0c54b01d标签,记得把river对象添加进list中。如果是river中的子标签7244917fde7c9800d42f9b80557c07c0,就把前面设置标记导航到这个标签的boolean标记设置为false. 按照以上实现思路,可以实现如下代码:

/**导航到开始标签触发**/
        publicvoid startElement (String uri, String localName, String qName, Attributes attributes){ 
         String tagName=localName.length()!=0?localName:qName;
         tagName=tagName.toLowerCase().trim();
         //如果读取的是river标签开始,则实例化River
         if(tagName.equals(RIVER)){
             isRiver=true;
             river=new River();
                /**导航到river开始节点后**/
                river.setName(attributes.getValue(NAME));
                river.setLength(Integer.parseInt(attributes.getValue(LENGTH)));
         }
         //然后读取其他节点
          if(isRiver){ 
              if(tagName.equals(INTRODUCTION)){
                 xintroduction=true;
             }else if(tagName.equals(IMAGEURL)){
                 ximageurl=true;
             }  
         }  
        }
        
        /**导航到结束标签触发**/
        public void endElement (String uri, String localName, String qName){
         String tagName=localName.length()!=0?localName:qName;
         tagName=tagName.toLowerCase().trim();
         
        //如果读取的是river标签结束,则把River添加进集合中
         if(tagName.equals(RIVER)){
             isRiver=true;
             rivers.add(river);
         }
         //然后读取其他节点
          if(isRiver){ 
              if(tagName.equals(INTRODUCTION)){
                 xintroduction=false;
             }else if(tagName.equals(IMAGEURL)){
                 ximageurl=false;
             } 
          }   
        } 
        
        //这里是读取到节点内容时候回调
        public void characters (char[] ch, int start, int length){
            //设置属性值
                if(xintroduction){
                     //解决null问题
                     river.setIntroduction(river.getIntroduction()==null?"":river.getIntroduction()+new String(ch,start,length));
                 }else if(ximageurl){
                     //解决null问题
                     river.setImageurl(river.getImageurl()==null?"":river.getImageurl()+new String(ch,start,length));
                 }    
        }


 

运行效果跟上例DOM 运行效果相同。

 

采用PULL解析基本处理方式:

当PULL解析器导航到文档开始标签时就开始实例化list集合用来存贮数据对象。导航到元素开始标签时回判断元素标签类型,如果是river标签,则需要实例化River对象了,如果是其他类型,则取得该标签内容并赋予River对象。当然它也会导航到文本标签,不过在这里,我们可以不用。

 

根据以上的解释,我们可以得出以下处理xml文档逻辑:

1:当导航到XmlPullParser.START_DOCUMENT,可以不做处理,当然你可以实例化集合对象等等。

2:当导航到XmlPullParser.START_TAG,则判断是否是river标签,如果是,则实例化river对象,并调用getAttributeValue方法获取标签中属性值。

3:当导航到其他标签,比如Introduction时候,则判断river对象是否为空,如不为空,则取出Introduction中的内容,nextText方法来获取文本节点内容

4:当然啦,它一定会导航到XmlPullParser.END_TAG的,有开始就要有结束嘛。在这里我们就需要判读是否是river结束标签,如果是,则把river对象存进list集合中了,并设置river对象为null.

由以上的处理逻辑,我们可以得出以下代码:


  public List<River> parse(String xmlPath){
	List<River> rivers=new ArrayList<River>();
        River river=null;
        InputStream inputStream=null;    
        //获得XmlPullParser解析器
        XmlPullParser xmlParser = Xml.newPullParser();   
        try {
            //得到文件流,并设置编码方式
            inputStream=this.context.getResources().getAssets().open(xmlPath);
            xmlParser.setInput(inputStream, "utf-8");
            //获得解析到的事件类别,这里有开始文档,结束文档,开始标签,结束标签,文本等等事件。
            int evtType=xmlParser.getEventType();
         //一直循环,直到文档结束    
         while(evtType!=XmlPullParser.END_DOCUMENT){ 
            switch(evtType){ 
            case XmlPullParser.START_TAG:
                String tag = xmlParser.getName(); 
                //如果是river标签开始,则说明需要实例化对象了
                if (tag.equalsIgnoreCase(RIVER)) { 
                   river = new River(); 
                  //取出river标签中的一些属性值
                  river.setName(xmlParser.getAttributeValue(null, NAME));
                  river.setLength(Integer.parseInt(xmlParser.getAttributeValue(null, LENGTH)));
                }else if(river!=null){
                    //如果遇到introduction标签,则读取它内容
                    if(tag.equalsIgnoreCase(INTRODUCTION)){
                    river.setIntroduction(xmlParser.nextText());
                    }else if(tag.equalsIgnoreCase(IMAGEURL)){
                        river.setImageurl(xmlParser.nextText());
                    }
                }
                break;
                
           case XmlPullParser.END_TAG:
             //如果遇到river标签结束,则把river对象添加进集合中
               if (xmlParser.getName().equalsIgnoreCase(RIVER) && river != null) { 
                   rivers.add(river); 
                   river = null; 
               }
                break; 
                default:break;
            }
            //如果xml没有结束,则导航到下一个river节点
            evtType=xmlParser.next();
         }
        } catch (XmlPullParserException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }catch (IOException e1) {
            // TODO Auto-generated catch block
            e1.printStackTrace();
        } 
        return rivers; 
 }


运行效果和上面的一样。

 

几种解析技术的比较与总结:  

对于Android的移动设备而言,因为设备的资源比较宝贵,内存是有限的,所以我们需要选择适合的技术来解析XML,这样有利于提高访问的速度。

1 DOM在处理XML文件时,将XML文件解析成树状结构并放入内存中进行处理。当XML文件较小时,我们可以选DOM,因为它简单、直观。

2 SAX则是以事件作为解析XML文件的模式,它将XML文件转化成一系列的事件,由不同的事件处理器来决定如何处理。XML文件较大时,选择SAX技术是比较合理的。虽然代码量有些大,但是它不需要将所有的XML文件加载到内存中。这样对于有限的Android内存更有效,而且Android提供了一种传统的SAX使用方法以及一个便捷的SAX包装器。 使用Android.util.Xml类,从示例中可以看出,会比使用 SAX来得简单。

3 XML pull解析并未像SAX解析那样监听元素的结束,而是在开始处完成了大部分处理。这有利于提早读取XML文件,可以极大的减少解析时间,这种优化对于连接速度较漫的移动设备而言尤为重要。对于XML文档较大但只需要文档的一部分时,XML Pull解析器则是更为有效的方法。

 



The above is the detailed content of Detailed explanation of Android's implementation of XML parsing technology (picture). For more information, please follow other related articles on the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn