
Web Programming-Detailed Explanation of XML Grammar Analysis

黄舟 (original)
2017-03-24 16:47:35

Before analyzing XML grammar, it is necessary to understand the basic rules of XML syntax:

Lexical features:

 1) XML is case-sensitive. The element name in an opening tag and in its closing tag must match in case, as in <root>…</root>, and XML reserved words must be written in the required case, as in <?xml …?> and <!DOCTYPE …>.

 2) The XML reserved characters are <, > and &. They are not allowed in element names, element text, attribute names or attribute values. < opens a tag, > closes a tag, and & introduces an escape. The common escapes are: &lt; produces <, &gt; produces >, &amp; produces &, &apos; produces ', and &quot; produces ".

 3) An element name starts with an underscore or a letter and can contain letters, digits, periods, hyphens, underscores, colons and extended characters of other languages. Element names cannot contain whitespace (spaces, tabs, line feeds, carriage returns). An element name can carry a namespace prefix. Element text can be any sequence of characters other than the XML reserved characters, for example: my money is $2000

 4) Attribute names follow the same rules as element names. An attribute value is enclosed in single or double quotes and can be any string that contains no XML reserved characters. An attribute name with the xmlns prefix indicates that the attribute defines a namespace.
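The escaping rule in 2) and the quoting rule in 4) can be tried out with Python's standard library. This is a minimal sketch and not part of the original article: it uses xml.sax.saxutils, whose escape() handles &, < and > by default, and the sample strings are made up for illustration.

```python
from xml.sax.saxutils import escape, unescape, quoteattr

# Sample text containing all three reserved characters (illustrative only).
raw = "a < b && c > d"

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
escaped = escape(raw)

# Quote characters are escaped only when an extra mapping is supplied.
escaped_all = escape("say \"hi\" & 'bye'", {"'": "&apos;", '"': "&quot;"})

# unescape() reverses the mapping; the quote entities need the inverse table.
restored = unescape(escaped_all, {"&apos;": "'", "&quot;": '"'})

# quoteattr() escapes a value and wraps it in quotes, ready to be used
# as an attribute value.
attr = quoteattr("R&D")

print(escaped)
print(escaped_all)
print(restored)
print(attr)
```

Escaping & first and unescaping it last is what keeps a round trip from double-converting text such as &amp;lt;.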

Syntactic features: 1) An XML document consists of an XML declaration, any number of optional document type declarations, optional processing instructions, optional comments, and a data body under a single root element. CDATA sections can also be embedded in the content, for example:

<?xml …?> /* XML declaration */
  <!DOCTYPE …> /* document type declaration */
  <!-- … --> /* XML comment */
  <?xml-stylesheet …?> /* XML processing instruction */
  <root> /* root data element */
  <child>
  …<![CDATA[…]]>
  </child>
  </root>

 2) The XML declaration is opened by <?xml and closed by ?>. It contains optional fields such as version and encoding, as in <?xml …?>.
 3) A document type declaration is opened by <!DOCTYPE and closed by >, as in <!DOCTYPE …>.
 4) An XML processing instruction is opened by <? and closed by ?>, as in <?xml-stylesheet …?>.
 5) An XML comment is opened by <!-- and closed by -->, as in <!-- … -->.
 6) An XML element is opened by <, and closed either by /> (an empty element) or by a matching closing tag of the form </name>. The opening and closing tags of an element must match each other, as in <child/> or <child>…</child>. Elements can be nested, and the nesting must stay properly matched level by level, as in <root><child>…</child></root>.
 7) A CDATA section is opened by <![CDATA[ and closed by ]]>. It exempts the text inside it from the XML parsing rules, as in <![CDATA[…]]>.
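The delimiter pairs listed in 2) to 7) are enough to tell the constructs apart. The sketch below illustrates that idea only; the classify function and the kind labels are hypothetical names introduced here, and it looks only at the start and end of one construct rather than parsing its content.

```python
import re

# (regex for the opening delimiter, closing delimiter, construct kind).
# Order matters: specific openings must be tried before "<?" and "<".
RULES = [
    (r"<\?xml(?=\s|\?>)", "?>", "declaration"),
    (r"<!--", "-->", "comment"),
    (r"<!\[CDATA\[", "]]>", "cdata"),
    (r"<!DOCTYPE\b", ">", "doctype"),
    (r"<\?", "?>", "instruction"),
    (r"</", ">", "closing tag"),
    (r"<", ">", "opening tag"),
]


def classify(markup: str) -> str:
    """Return the kind of one markup construct, judged only by its delimiters."""
    for opening, closing, kind in RULES:
        if re.match(opening, markup) and markup.endswith(closing):
            # An opening tag that ends in "/>" is an empty element.
            if kind == "opening tag" and markup.endswith("/>"):
                return "empty element"
            return kind
    return "text"


for sample in ["<?xml version='1.0'?>", "<?xml-stylesheet href='s.css'?>",
               "<!DOCTYPE root>", "<!-- note -->", "<![CDATA[a < b]]>",
               "<child/>", "<root>", "</root>", "plain text"]:
    print(f"{sample!r:40} {classify(sample)}")
```

The lookahead in the first rule keeps <?xml-stylesheet …?> from being mistaken for the XML declaration, since both begin with <?xml.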
Based on the XML grammatical features above, regular expressions can be constructed for lexical analysis, and a pushdown automaton can be constructed for syntactic analysis.
XML lexical regular expressions:
 #define digit [0,1,…,9] /* digit characters */
 #define letter [a,b,…,z,A,B,…,Z] /* alphabetic characters */
 #define signs [~, !, @, #, %, ^, &, *, (, ), ?, :, ;, ", ', ,, ., /, -, _, +, =, |, \] /* symbol characters */
 #define ascii2 [0x80,…,0xFF] /* extended ASCII characters */
 #define space [0x20, \t, \r, \n] /* space, tab, carriage return, line feed */
 #define reserve [<, >, &] /* XML reserved characters */
 1) Regular expression for an element name:

  element_name -> (_ | letter | ascii2) (ε| _ | - | : | . | digit | letter | ascii2)*

 2) Regular expression for element text:

  element_text -> (ε| not reserve)*

 3) Regular expression for an attribute name:

  proper_name -> (_ | letter | ascii2) (ε| _ | - | : | . | digit | letter | ascii2)*

 4) Regular expression for an attribute value:

  proper_value -> (ε| not reserve)*
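These lexical rules translate fairly directly into Python's re syntax. The sketch below is one possible reading, not the author's code: it follows the prose rule for names (letters, digits, periods, hyphens, underscores, colons and extended characters) and therefore leaves the general symbol class out of names, which is an assumption. The helper names is_name and is_text are hypothetical.

```python
import re

# Character classes from the lexical definitions, as regex class fragments.
LETTER = "A-Za-z"
DIGIT = "0-9"
ASCII2 = r"\x80-\xff"  # the extended range 0x80..0xFF

# element_name / proper_name: first an underscore, letter or extended
# character, then any mix of the allowed name characters.
NAME_RE = re.compile(f"[_{LETTER}{ASCII2}][-_:.{DIGIT}{LETTER}{ASCII2}]*")

# element_text / proper_value: any run of characters except <, > and &.
TEXT_RE = re.compile("[^<>&]*")


def is_name(s: str) -> bool:
    """True if s is a valid element or attribute name under these rules."""
    return NAME_RE.fullmatch(s) is not None


def is_text(s: str) -> bool:
    """True if s contains no XML reserved character."""
    return TEXT_RE.fullmatch(s) is not None


print(is_name("xs:element"), is_name("1abc"), is_name("a b"))
print(is_text("my money is $2000"), is_text("a < b"))
```

fullmatch is used instead of match so that a valid prefix followed by an illegal character, such as the space in "a b", is rejected.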

XML syntax structure:

  xml_document -> xml_header (ε| xml_declare | xml_instruct | xml_comments)* xml_element
  xml_header -> [<?xml](space)*(proper_token)*(space)*[?>]
  xml_declare -> [<!]reserve_word(space)*(token)*(space)*[>]
  xml_instruct -> [<?]reserve_word(space)*(proper_token)*(space)*[?>]
  xml_comments -> [<!--](ε| digit | letter | signs | ascii2 | space)*[-->]
  xml_element -> [<]element_name(space)*(ε| proper_token)*(space)*[/>] |
      [<]element_name(space)*(ε| proper_token)*(space)*[>]
      [ε| <![CDATA[ ]element_text[ε| ]]>]
      (ε| xml_element)*(space)*[</]element_name[>]
  proper_token -> proper_name(space)*[=](space)*[ε| <![CDATA[ ][' | "]proper_value[' | "][ε| ]]>]
  reserve_word -> [DOCTYPE | ELEMENT | NOTATION | …]
  token -> (ε| not reserve)*
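As an illustration of how the proper_token production and the first part of xml_element can be applied, the sketch below parses a single opening tag into a name and an attribute table. It is a simplified reading under stated assumptions: parse_start_tag is a hypothetical helper, the optional CDATA wrapper around attribute values is not handled, and whitespace between attributes is optional, as in the grammar.

```python
import re

# Name pattern following the element_name / proper_name rule.
NAME = r"[_A-Za-z\x80-\xff][-_:.0-9A-Za-z\x80-\xff]*"

# A quoted proper_value: no reserved characters and no embedded closing quote.
QUOTED = r"""(?:'([^<>&']*)'|"([^<>&"]*)")"""

# proper_token -> proper_name (space)* [=] (space)* quote proper_value quote
ATTR_RE = re.compile(r"\s*(" + NAME + r")\s*=\s*" + QUOTED)


def parse_start_tag(tag: str):
    """Parse '<name attr="v" ...>' or '<name ... />'.

    Returns (element_name, attributes, is_empty_element) or raises ValueError.
    """
    m = re.match("<(" + NAME + ")", tag)
    if not m:
        raise ValueError("missing element name")
    name, pos, attrs = m.group(1), m.end(), {}
    while True:
        a = ATTR_RE.match(tag, pos)
        if not a:
            break
        # Group 2 holds a single-quoted value, group 3 a double-quoted one.
        attrs[a.group(1)] = a.group(2) if a.group(2) is not None else a.group(3)
        pos = a.end()
    rest = tag[pos:].strip()
    if rest == "/>":
        return name, attrs, True
    if rest == ">":
        return name, attrs, False
    raise ValueError("malformed tag near: " + rest)


print(parse_start_tag("<book id=\"42\" lang='en'>"))
print(parse_start_tag("<br/>"))
```

Anything left over after the attribute loop must be exactly > or />, so a reserved character inside a value or a missing quote surfaces as a ValueError.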

Analyzing XML grammar requires constructing a pushdown automaton, which is defined as follows:

 1) STACK_DFA mata_xml_doc = <Q, Σ, σ, q, Γ, S>

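  Q: {…} /* see the state set below */
  Σ: /* points to the string of XML tokens to be parsed */
  σ: Q×Σ->Q /* state transition function, see the state transition list */
  q: {NIL_SKIP} /* initial state */
  Γ: {NIL_FAILED,NIL_SUCCEED} /* set of final states */
  S: {<Q/*state*/, N/*DOM node*/>,<…>} /* pushdown stack */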

 2) The set of stack-top symbols reflects the type of the node currently being analyzed:

T:{NIL/*empty*/, TG/*tag*/, NS/*element*/, IS/*instruction*/, DS/*declaration*/, CD/*CDATA section*/, CM/*comment*/}

 3) The state set reflects the characteristics of a particular stage of the analysis; each group of states corresponds to one stack-top symbol:
 

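  NIL:  NIL_FAILED /* failure */
  NIL_SKIP /* skip */
  NIL_SUCCEED /* success */
  CM:  CM_BEGIN /* comment begins */
  CM_END /* comment ends */
  TG:  TG_OPEN /* tag opened */
  TG_INT_CLOSE /* tag interrupted */
  TG_PRE_CLOSE /* tag about to close */
  TG_CLOSE /* tag closed */
  NS:  NS_NAME_BEGIN /* element name begins */
  NS_NAME_END /* element name ends */
  NS_KEY_BEGIN /* attribute name begins */
  NS_KEY_END /* attribute name ends */
  NS_ASIGN /* attribute assignment */
  NS_VAL_BEGIN /* attribute value begins */
  NS_VAL_END /* attribute value ends */
  NS_TEXT_BEGIN /* element text begins */
  NS_TEXT_END /* element text ends */
  IS:  IS_OPEN /* instruction opened */
  IS_NAME_BEGIN /* instruction name begins */
  IS_NAME_END /* instruction name ends */
  IS_KEY_BEGIN /* instruction key begins */
  IS_KEY_END /* instruction key ends */
  IS_ASIGN /* assignment operator */
  IS_VAL_BEGIN /* instruction value begins */
  IS_VAL_END /* instruction value ends */
  IS_CLOSE /* instruction closed */
  DS:  DS_OPEN /* declaration opened */
  DS_SKIP /* skip over the declaration section */
  DS_CLOSE /* declaration closed */
  CD:  CD_BEGIN /* CDATA section begins */
  CD_END /* CDATA section ends */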
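The article does not list the state transitions themselves, so the following is only a sketch of the pushdown idea, with transitions that are assumed for illustration. It borrows a few state names from the set above, keeps a stack of (state, element name) pairs, and checks that opening and closing tags nest correctly. The function check_nesting is a hypothetical name, and comments, CDATA sections and attribute values containing > are deliberately ignored.

```python
import re
from enum import Enum, auto


class State(Enum):
    NIL_SKIP = auto()     # initial state
    NIL_FAILED = auto()   # final state: failure
    NIL_SUCCEED = auto()  # final state: success
    TG_OPEN = auto()      # an opening tag was read
    TG_CLOSE = auto()     # a closing tag or empty element was read


# Groups: "/" of a closing tag, the element name, "/" of an empty element.
TAG_RE = re.compile(r"<(/?)([_A-Za-z][-_:.0-9A-Za-z]*)[^<>]*?(/?)>")


def check_nesting(doc: str) -> State:
    """Run a tiny pushdown check over the tags found in doc."""
    state = State.NIL_SKIP
    stack = []  # pushdown stack of (state, element name) pairs
    for m in TAG_RE.finditer(doc):
        closing, name, empty = m.group(1), m.group(2), m.group(3)
        if closing:
            # A closing tag must match the name on top of the stack exactly.
            if not stack or stack[-1][1] != name:
                return State.NIL_FAILED
            stack.pop()
            state = State.TG_CLOSE
        elif empty:
            state = State.TG_CLOSE
        else:
            stack.append((State.TG_OPEN, name))
            state = State.TG_OPEN
    # Success needs at least one tag and an empty stack at the end.
    if state is State.TG_CLOSE and not stack:
        return State.NIL_SUCCEED
    return State.NIL_FAILED


print(check_nesting("<root><child>x</child></root>"))
print(check_nesting("<root><child></root></child>"))
```

Because names are compared as exact strings, a document such as <Root></root> fails, which matches the case-sensitivity rule from the lexical features.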
