Before performing XML grammatical analysis, it is first necessary to understand the basic rules of XML syntax:
Lexical features: 1) XML is case-sensitive, such as element names in opening and closing tags The upper and lower case should be consistent
2) XML reserved mark characters are: &, reserved words The symbol is not allowed to appear in element names, element text, attribute names, and attribute values. is used to close the tag, & is used to change the meaning. The common meaning is <generated, &Generate&, &aposGenerate', "Generate”
3) The element name starts with an underscore or letter and can contain letters, numbers, periods, hyphens, underscores, colons and other Extended characters of the language. There cannot be spaces (separators, tabs, line feeds, carriage returns) in element names. Element names can be prefixed by name fields. For example:
4) Attribute name The rule is the same as the element name, and the attribute value is enclosed by single quotes or double quotes, and can be composed of strings other than XML reserved characters, such as:
Syntactic features: 1) An XML document consists of an XML description, multiple optional Document Description, multiple optional XML directives, multiple optional XML comments and a data body of the root element. In addition, there can be embedded The CDATA segment in the statement, such as:
<?xml …?> /*XML说明*/ <!DOCTYPE …> /*XML文档说明*/ <!-- … --> /*XML注释*/ <?xml-stylesheet …?> /*XML指令*/ <root> /*根数据元素*/ <child> …<![CDATA[…]]> </child> </root>
2) The XML description is opened by mark, which contains optional descriptions such as version and encoding, such as:
3) XML document description is opened by , such as: mydoc.dtd”> ;
4) XML instructions are opened by and reserved strings, and closed by ?>, such as:
5) XML comments are opened by , such as:
6) XML elements are opened by
7) The CDTATA segment is opened by and closed by ]]>, which is used to make the statements in it avoid XML parsing rules. For example:
Based on the above XML grammatical features, regular expressions for lexical analysis and syntactic analysis can be constructed Pushdown automaton structure.
XML lexical regular expression:
#define digit [1,2,…,9] /*Number character*/
#define letter [a,b,…,z,A,B,…, Z] /*Alphabetic characters*/
#define signs [~, ! , @, #, %, ^, &,*,(, ), ?, :, ;, “, ', ,, ., / ,-, _, +, =, |, /] /*Symbol character*/
#define ascii2 [0x80,…,0xFF] /*ASCII chart2 extended character*/
#define space [0x20, / t, /r, /n] /*Space character, tab character, carriage return character, line feed character*/
#define reserve [, &] /*XML reserved characters*/
1) The regular expression of the element name:
element_name -> (_ | letter | ascii2) (ε| _ | - | : | . | digit | letter | signs | ascii2)*
2) The regular expression of the element text:
element_text -> (ε| not reserve)*
3) The regular expression of the attribute name:
proper_name -> (_ | letter | ascii2) (ε| _ | - | : | . | digit | letter | signs | ascii2)*
4) Regular expression of attribute text:
proper_value -> (ε| not reserve)*
XML syntax structure:
xml_document -> xml_header (ε| xml_declare | xml_instruct | xml_comments)* xml_element xml_header -> [<?xml](space)*(proper_token)*(space)* [?>] xml_declare -> [<!]reserve_word(space)*(token)*(space)*[>] xml_instruct -> [<?]reserve_word(space)* (proper_token)* (space)*[?>] xml_comments -> [<!--](ε| digit | letter | signs | ascii2 | space)*[-- >] xml_element -> [<]element_name (space)*( ε| proper_token)*(space)*[/>] | [<]element_name(space)*( ε | proper_token)*(space)*[>] [ε| <![CDATA[ ]element_text[ε| ]]>] (ε | xml_element)*(space)*[</]element_name[>] proper_token -> proper_name(space)*[=](space)* [ε| <![CDATA[ ] [‘ | “]proper_value[‘ | “] [ε| ]]>] reserve_word -> [DOCTYPE | ELEMENT | NOTATION | …] token -> (ε| not reserve)*
Analyzing XML grammar requires constructing a pushdown automaton, its structure The definition is as follows:
1) STACK_DFA mata_xml_doc =
Q: {…} /*详见后面的状态集合*/ Σ: /*指向待解析的XML元素词串*/ σ: Q×Σ->Q /*状态转移函数,见状态转移列表*/ q: {NIL_SKIP} /*初始状态*/ Γ: {NIL_FAILED,NIL_SUCCEED} /*终结状态集合*/ S: {Q/*状态*/, N/*DOM节点*/>,<…>} /*下推栈*/
2) The stack top symbol set is used to reflect Type of current analysis node:
T:{NIL/*空*/, TG/*标记*/, NS/*元素*/, IS/*指令*/, DS/*声明*/, CD/*CDATA界段*/,CM/*注释*/}
3) The status set reflects the characteristics of a certain stage of analysis, corresponding to the top symbol of the stack:
NIL: NIL_FAILED /*失败*/ NIL_SKIP /*忽略*/ NIL_SUCCEED /*成功*/ CM: CM_BEGIN /*注释开始*/ CM_END /*注释结束*/ TG: TG_OPEN /*标记打开*/ TG_INT_CLOSE /*标记中断*/ TG_PRE_CLOSE /*标记准备关闭*/ TG_CLOSE /*标记关闭*/ NS: NS_NAME_BEGIN /*元素名开始*/ NS_NAME_END /*元素名结束*/ NS_KEY_BEGIN /*属性名开始*/ NS_KEY_END /*属性名结束*/ NS_ASIGN /*属性赋值*/ NS_VAL_BEGIN /*属性值开始*/ NS_VAL_END /*属性值结束*/ NS_TEXT_BEGIN /*元素文本开始*/ NS_TEXT_END /*元素文本结束*/ IS: IS_OPEN /*指令打开*/ IS_NAME_BEGIN /*指令名开始*/ IS_NAME_END /*指令名结束*/ IS_KEY_BEGIN /*指令键开始*/ IS_KEY_END /*指令键结束*/ IS_ASIGN /*赋值符*/ IS_VAL_BEGIN /*指令值开始*/ IS_VAL_END /*指令值结束*/ IS_CLOSE /*指令关闭*/ DS: DS_OPEN /*声明打开*/ DS_SKIP /*越过申明节*/ DS_CLOSE /*声明关闭*/ CD: CD_BEGIN /*CDATA界段开始*/ CD_END /*CDATA界段结束*/
The above is the detailed content of Web Programming-Detailed Explanation of XML Grammar Analysis. For more information, please follow other related articles on the PHP Chinese website!

This article explains how to use RSS feeds for efficient news aggregation and content curation. It details subscribing to feeds, using RSS readers (like Feedly and Inoreader), organizing feeds, and leveraging features for targeted content. The bene

This article explores integrating XML and Semantic Web technologies. The core issue is mapping XML's structured data to RDF triples for semantic interoperability. Best practices involve ontology definition, strategic mapping approaches, careful att

This article explains Atom Publishing Protocol (AtomPub) for web content management. It details using HTTP methods (GET, POST, PUT, DELETE) with Atom format for content creation, retrieval, updating, and deletion. The article also discusses AtomPub

This article details implementing content syndication using RSS feeds. It covers creating RSS feeds, identifying target websites, submitting feeds, and monitoring effectiveness. Challenges like limited control and rich media support are also discus

This article details using XML for data interoperability, focusing on healthcare and finance. It covers schema definition, XML document creation, data transformation, parsing, and exchange mechanisms. Key XML standards (HL7, DICOM, FinML, ISO 20022)

This article details securing RSS feeds against unauthorized access. It examines various methods including HTTP authentication, API keys with rate limiting, HTTPS, and content obfuscation (discouraged). Best practices involve IP restriction, revers

This article details creating custom XML vocabularies (schemas) for data consistency. It covers defining scope, identifying entities & attributes, designing XML structure, choosing a schema language (XSD or Relax NG), schema development, testing

This article explains how optimizing RSS feeds indirectly improves website SEO. It focuses on enhancing feed content (descriptions, keywords, metadata), structure (XML, formatting, encoding), and distribution to boost user engagement, content discov


Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

EditPlus Chinese cracked version
Small size, syntax highlighting, does not support code prompt function

ZendStudio 13.5.1 Mac
Powerful PHP integrated development environment

SublimeText3 Linux new version
SublimeText3 Linux latest version
