
Semantics of XML tags

黄舟
2017-02-25 14:11:07

[Abstract] XML document type definitions provide a mechanism for describing the syntax of an XML language in machine-readable form. There is currently no comparable mechanism for specifying the semantics of an XML vocabulary. As a result, there is no formal way to state what XML tags mean, and the facts and relationships that XML markup represents cannot be defined clearly, comprehensively, and normatively. This has serious practical and theoretical consequences. On the positive side, XML structures can be given arbitrary semantics and used in areas their original designers never foresaw. On the less positive side, content developers and software engineers must rely on informal natural-language documentation or, worse, guess at the intentions of the markup language designer. This process is time-consuming, labor-intensive, error-prone, and unverifiable, and unsatisfactory results occur even when the designer's documentation is done perfectly. Moreover, because the semantic nature of markup has received so little study, digital document processing, an engineering discipline, has no theoretical foundation. Some ongoing projects (XML Schema, RDF, the Semantic Web) have achieved some results, but none of them directly and comprehensively addresses the core issues of XML markup semantics. This article reviews the history of the concept of markup meaning, explains the motivation for giving XML a formal semantics, and introduces a research project on the topic: the BECHAMEL Markup Semantics Project.
[Keywords] SGML
Text markup systems such as the Standard Generalized Markup Language (SGML) and the Extensible Markup Language (XML) are now applied in every area of social, commercial, and cultural life. SGML/XML is a machine-readable technology for defining descriptive markup languages. Apart from some cases that require special treatment, such a language explicitly identifies the structure of a document and, with it, the document's underlying meaning. SGML/XML is developing rapidly, and its widespread use promises to support high-performance, interoperable document processing and publishing.
This promise has been partly fulfilled, and in some respects SGML/XML has exceeded expectations. However, the functionality, interoperability, diversity, and accessibility of SGML/XML document systems still need to improve. If this opportunity is missed, the consequences will be serious. Industry will bear high costs and lose many opportunities. Failures in safety-critical applications could lead to disasters. People with disabilities will be denied equal access to the cultural and commercial benefits of contemporary society. Moreover, long-standing problems keep reminding us that even the best current digital document models are flawed, or at least incomplete.
The root of these problems is that SGML/XML can give a document a meaningful structure, but it cannot represent the basic semantic relationships among document components and topics in a systematic, machine-processable way. SGML/XML supports machine-readable descriptions of "grammar", but it provides no mechanism for explaining what a given grammar means. There is therefore no way to formally express the intended meaning of an SGML/XML vocabulary. Current SGML/XML cannot even express very simple, basic semantic facts about a markup vocabulary. Such facts are usually intended by the markup language designer, but whether they are honored in practice depends on the users of the markup language and on software.
This lack of expressive power forces SGML/XML users to guess at semantic relationships that the markup language designer had in mind but did not formally express. Content developers must guess the designer's intent and act on those inferences when encoding content. They cannot state their inferences and intentions explicitly to other people or to the applications that process the encoded content. Software designers must also guess the markup language designer's intentions and build those guesses into software tools and applications. Sometimes second-order guesswork is required: the software designer guesses at the content developer's inferences about the markup language designer's intent.
Clearly, such guesses are incomplete, fallible, and unverifiable. Making and implementing them is also time-consuming and labor-intensive, and the resulting functionality and interoperability are poor. Supplementing an SGML/XML specification with ordinary natural-language documentation does not fully solve the problem. Such documentation can give content developers and software engineers useful hints, but there are currently no general conventions for documenting SGML/XML vocabularies. In any case, natural-language documentation is not machine-readable, and machine-readability is exactly what is at issue for SGML/XML markup systems.
The idea of machine-processable semantic descriptions for SGML and XML has not yet taken shape. This is a source of current engineering problems and an obstacle to future development. Related semantic research is scarce, although many scholars have begun to pay attention to the issue. Work on W3C XML Schema is relevant, but it covers only a small part of the problem (such as data types). The W3C's Semantic Web project is also relevant, but it aims at general XML-based knowledge representation technology. Our research focuses on the semantics of document markup as it is implicitly assumed in actual document processing systems. One might say that the essence of the Semantic Web is the design of semantic markup. In this article, however, we argue that solving the problems described above also requires careful consideration of what markup itself means.
The rest of this article proceeds as follows. It first explains the meaning of markup against its historical background, since markup has played an interesting role in the development of text processing methods. It then describes in detail which factors create the need for formal markup semantics and which factors determine the semantic requirements. Finally, it briefly introduces the BECHAMEL Markup Semantics Project, a multi-institution research project working to solve the problem of markup semantics.
2 Historical Background
Document "markup" can arguably be counted as part of all communication systems, including early writing, copying, publishing, and printing. With the development of digital text processing and typesetting, however, the use of markup became deliberate and widespread, and it became an important area of innovation in system development. The 1960s through the 1980s saw comprehensive and systematic development of document markup systems. This work focused on improving the effectiveness and functionality of digital typesetting and text processing. By the early 1980s, people were working on a theoretical framework for markup and using it to support the development of high-performance systems. Some of this work has been published, but most of it is recorded only in working documents and in various standards and products.
A view that emerged during this period is that a document, as an intellectual product, is better modeled as an ordered hierarchy of objects (such as chapters, paragraphs, and equations) than as a one-dimensional stream of characters. Such a character stream is typically interleaved with large amounts of formatting codes, structures describing layout (such as pages, columns, and printed lines), matrices of pixel values, and other representations specific to particular document processing and storage systems. The ordered-hierarchy model distinguishes two essentially different kinds of markup: markup that identifies editorial text objects (titles, chapters, and so on), and markup that describes layout requirements. The former has proved especially fruitful. Document elements such as titles, chapters, paragraphs, equations, and citations can be explicitly delimited by tags. The elements can then be processed indirectly through rules mapped to their element types. This separation of content from formatting makes possible the familiar economies of indirection and abstraction at a fundamental level. It has enormous and varied practical value in every aspect of document processing. More importantly, it seems to shed light on the question of what a document really is. Descriptive markup of this kind not only delimits the scope of an element but also carries the meaning that the document model is meant to reveal (for example, that this text is a chapter).
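The indirection described above can be sketched in Python with the standard `xml.etree.ElementTree` module: a document names its parts, and separately supplied rules decide how each element type is processed. The element names (`chapter`, `title`, `para`) and both style tables are hypothetical. They are chosen only to show that swapping the rule table changes the output without touching the document.

```python
import xml.etree.ElementTree as ET
from typing import Callable

# A descriptive document: it says WHAT each part is, not how it should look.
DOC = """<chapter>
  <title>Markup and Meaning</title>
  <para>Descriptive markup names the parts of a document.</para>
  <para>Formatting is supplied separately.</para>
</chapter>"""

Style = dict[str, Callable[[str], str]]

# Two hypothetical rule tables keyed by element type (toy "stylesheets").
PLAIN_STYLE: Style = {
    "title": lambda text: text.upper(),
    "para": lambda text: "    " + text,
}
HTML_STYLE: Style = {
    "title": lambda text: f"<h1>{text}</h1>",
    "para": lambda text: f"<p>{text}</p>",
}

def render(root: ET.Element, style: Style) -> list[str]:
    """Process each child by looking up the rule for its element type."""
    output = []
    for child in root:
        rule = style.get(child.tag, lambda text: text)  # unknown types pass through
        output.append(rule(child.text or ""))
    return output

root = ET.fromstring(DOC)
print(render(root, PLAIN_STYLE)[0])  # MARKUP AND MEANING
print(render(root, HTML_STYLE)[0])   # <h1>Markup and Meaning</h1>
```

The document is parsed once. Only the lookup table differs between the two renderings, which is the economy of indirection in miniature.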
In the 1980s, the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO) produced the influential SGML document markup metagrammar. It consolidated earlier theoretical and analytical work on markup and document structure. SGML provides a machine-readable form for defining descriptive markup languages. As a metagrammar, SGML does not itself define a markup language; rather, it specifies techniques for developing machine-readable markup languages. At the core of those techniques is a formal notation similar to Backus-Naur Form (BNF). It is accompanied by rules for declaring typed attributes and their values, and by other devices for further abstraction and indirection (see the discussions of how closely Document Type Definitions (DTDs) resemble BNF). Structurally, an SGML document is a tree with ordered branches and labeled nodes, and it is the formal product of its corresponding DTD.
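A DTD content model is essentially a regular grammar over the sequence of an element's children. As a rough sketch, consider the hypothetical declaration `<!ELEMENT chapter (title, para+)>`. Its model can be emulated with a Python regular expression over child element names. Note what the check does and does not establish. It confirms that the tree is a possible product of the grammar, but says nothing about what a `title` is.

```python
import re
import xml.etree.ElementTree as ET

# Emulates the hypothetical declaration  <!ELEMENT chapter (title, para+)>
# Each child name is followed by a comma, so "title,para,para,"
# encodes the child sequence title, para, para.
CHAPTER_MODEL = re.compile(r"title,(para,)+")

def child_sequence(elem: ET.Element) -> str:
    """Encode the names of an element's children as 'name,name,...,'."""
    return "".join(child.tag + "," for child in elem)

def conforms(elem: ET.Element, model: re.Pattern) -> bool:
    """True if the child sequence is generated by the content model."""
    return model.fullmatch(child_sequence(elem)) is not None

ok = ET.fromstring("<chapter><title>T</title><para>a</para><para>b</para></chapter>")
bad = ET.fromstring("<chapter><para>a</para><title>T</title></chapter>")
print(conforms(ok, CHAPTER_MODEL), conforms(bad, CHAPTER_MODEL))  # True False
```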
After years of analysis and practice, the basic ideas behind SGML are well understood. SGML combines industry-wide standardization at the metasyntax level with local innovation at the vocabulary level. Its distinctive mechanisms include a BNF-like metasyntax, typed attribute-value pairs, and entity references. Together, these allow applications and tools to be implemented efficiently. SGML markup languages also appeared to support, and even optimize, an ideal workflow for designing, implementing, and using document systems. From the mid-1980s to the early 1990s, a large number of SGML-based markup systems were developed.
SGML received a great deal of attention, and its ideas were sound and were successfully implemented in many fields. Even so, it saw little use during its first decade. Many factors contributed to this, but the most important was SGML's own complexity. In particular, SGML contains many complex optional features that conforming software need not implement, which made SGML software development very slow. Worse, a document could not be parsed at all without its DTD. Markup minimization (tag omission) meant that element boundaries could not be determined without knowledge of the document's grammar. SGML also has other features that made existing parsing tools for formal grammars inapplicable and prevented efficient parsing.
For online publishing and communication, SGML was applied in HTML (Hypertext Markup Language). The original version of HTML was loosely defined and had no formal syntax specification. When interest later turned to an SGML DTD for HTML, it proved difficult to design one that fit what had already become "correct" practice. More importantly, vendors freely added procedural tags alongside HTML's key descriptive tags, so developers and users came to ignore the distinction between descriptive and procedural markup. The descriptive part of HTML did not even reflect a document's hierarchical structure very well, and the specification provided no stylesheet language to support indirection. Finally, HTML made no use of SGML's mechanisms for extending the element set or substituting an alternative element set. HTML documents were therefore processed not by general SGML processors (which allow DTDs to be extended and replaced) but only by dedicated HTML formatters with hard-coded formatting rules for HTML tags.
The subsequent development of HTML can be seen as the process of turning the original, loosely defined HTML into a proper SGML application. Given enough time and resources to apply proven document system design principles, this transformation was achievable. However, the newly founded W3C came under great pressure to allow new element sets and to apply SGML on the Web, and SGML's shortcomings made it hard to exploit SGML and descriptive markup there. The main problems were SGML's many optional features, its complex formal grammar, and its reliance on the DTD to determine element boundaries.
The W3C therefore created a subset of SGML. Its goals were for HTML and related technologies to take full advantage of a metasyntax, for users to develop and share new domain-specific element sets more easily, for documents to be parsed into element trees without reference to a DTD, and for SGML tools and applications to evolve in step. The subset was intended to provide a relatively simple standard (with no optional features), a relatively simple syntax, and a way to process documents that have not been validated against a DTD. The result was XML. After about a year and a half of development, XML was published as a W3C Recommendation in 1998.
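The practical difference can be seen with any XML parser. The minimal sketch below uses Python's standard `xml.etree.ElementTree`. A well-formed document is parsed into an element tree with no DTD at all. By contrast, text whose end tags have been omitted is rejected outright, because XML requires every element to be explicitly closed. SGML's tag minimization allowed such omissions when a DTD licensed them. The sample documents are invented for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Well-formed XML: every element is explicitly closed, so no DTD is needed
# to find element boundaries.
WELL_FORMED = "<memo><to>Ana</to><body>No DTD needed.</body></memo>"

# Omitted </item> end tags: an SGML DTD could permit this, but XML cannot.
MINIMIZED = "<list><item>one<item>two</list>"

def parse_without_dtd(text: str) -> Optional[ET.Element]:
    """Return the root element, or None if the text is not well-formed XML."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        return None

tree = parse_without_dtd(WELL_FORMED)
print([child.tag for child in tree])   # ['to', 'body']
print(parse_without_dtd(MINIMIZED))    # None
```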
Since 1998, XML markup languages have grown explosively, and this momentum continues today. The reasons for this growth include:
(1) The need for new markup systems in specific fields. As networked electronic publishing grows in science, medicine, business, law, engineering, and their many subfields, new markup systems must be developed.
(2) Lower cost and complexity in developing new tools and applications. Parsing XML is simpler than parsing SGML.
(3) XML markup supports information processing and dissemination related to publishing, as well as applications unrelated to publishing.
Fortunately, we now have effective, easy-to-implement technology for creating high-performance markup languages, digital documents, and document processing and publishing systems that integrate with other information management programs. In particular, new system functionality increasingly depends on processing the intentions that underlie document structure. It must do so automatically, or at least without much manual intervention.
3 The Problem
Unfortunately, experience and feedback have made it soberingly clear that neither our understanding of how descriptive markup conveys meaning nor current technology lives up to our expectations.
In the 1980s, systematic work on document markup focused mainly on three areas:
(1) Conceptualization of a general document model.
(2) Development of formal specification techniques for the vocabulary and grammar of document markup languages, which define specific document classes and instantiate and represent the model.
(3) Development of markup languages (such as CALS, AAP, TEI, and HTML).
Using descriptive markup to identify the logical parts of a document seemed to make explicit a "meaning" that had previously existed only in latent form. It also seemed to express that meaning at least more clearly and unambiguously than procedural markup could, and in a form better suited to machine processing.
Many people describe XML documents as "self-describing data." There were some early dissenting voices (see Mamrak and, most importantly, Raymond and Tompa). Nevertheless, in the earliest stages of descriptive markup, document researchers' interest in further inquiry faded, and most seemed to feel no need for further laborious study of document representation. It was thought that a well-defined SGML markup language expressed the underlying meaning of document structure in a form fully and effectively usable for machine processing. One of the authors of this article helped write the sentence: "Finally, it should be clear that, among competing markup systems, descriptive markup is not just the best approach, but the best imaginable approach."
The experience of the 1990s showed that this confidence was somewhat naive. In practical terms the situation is much better today. However, repeated failures of interoperability and functionality show that SGML/XML has not really succeeded in giving documents' underlying meaning a computer-processable form. In an SGML/XML DTD, the meanings of elements and attributes are not specified with the same precision as their grammar. Much of that content is informal, and where inferences must be made there is often no single definite answer. Qualitatively, however, our understanding of documents is different from what it was before SGML, when the meaning of document structure had to be pieced together from relatively obscure clues.
The essential nature of a DTD explains this situation: a DTD specifies only a vocabulary and its grammar, and it does not represent semantic relationships among the terms. The DTD cannot determine whether the ordinary notion of a title is represented by <title>, or whether <title> corresponds to what we usually call a "title". It can only indicate that there is an element whose label is the string "title", and how that element may be combined with other elements, all of which are defined in the same way. Content developers and software designers who use the markup language must therefore infer the meaning of the <title> tag. They do so from the natural-language associations of the word "title" and from how the tag is used in context. Indeed, the original language designers may have had no way to define the meaning of <title> systematically and rigorously.
Of course, this somewhat overstates the actual situation. In a sense, the meaning of every tag can be expressed reasonably clearly in the natural-language documentation supplied by the markup language's developers. Even the best-documented DTDs in industry and academia, however, do not fundamentally solve the problem.
For software to reflect the semantic relationships in a markup language, two steps are needed. First, the language designer must state clearly how the parts of a document relate to one another. Then the software engineer must find, read, and correctly use that documentation, and design applications that exploit it. Neither step can be verified by machine, and the reliability of neither can be guaranteed. The need for manual involvement hinders the development of high-performance networked document processing and publishing systems.
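The point about `<title>` can be made concrete, because to a parser a tag name is just a label string. The short Python sketch below erases tag names and compares the remaining structure; the documents are invented for illustration. A document using `<title>` and one using an arbitrary label such as `<zq7>` come out identical. Whatever "title" means must therefore be supplied by people or hard-coded in software.

```python
import xml.etree.ElementTree as ET

A = "<doc><title>On Markup</title><p>Text.</p></doc>"
B = "<doc><zq7>On Markup</zq7><p>Text.</p></doc>"  # same shape, opaque label

def shape(elem: ET.Element) -> tuple:
    """Return the tree's structure and text with every tag name discarded."""
    return (tuple(shape(child) for child in elem), (elem.text or "").strip())

# Apart from the label strings themselves, the parser sees the same tree.
print(shape(ET.fromstring(A)) == shape(ET.fromstring(B)))  # True
```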
Therefore, we need a mechanism that lets markup language designers specify semantic relationships formally and in detail. Applications must be able to read and process these specifications and configure themselves without case-by-case human intervention.
Let us look at some specific semantic relationships. All of them have some potential practical value. At present, however, they cannot be exploited easily or systematically, because there is no standard machine-processable way to represent them. Indeed, many of these relationships are so important that software designers infer their presence in documents in ad hoc ways and build special-purpose systems to exploit them.
Class relationships. SGML/XML has no general construct for expressing class hierarchies or class membership among elements, attributes, or attribute values. Yet classes are among the most basic and useful constructs in mainstream software engineering today. We cannot say that a paragraph is a structural element (an isa relationship), or that all structural elements are editable elements (an ako relationship). Basic classification can sometimes be implemented with attribute-value pairs, in particular with "type" and "class" attributes. This technique is immature, however, and SGML and XML provide no better mechanism for controlling and constraining its use. In practice, many document type designers design with class hierarchies in mind. XML Schema does allow class relationships to be declared explicitly, but it gives no semantic account of how one complex type differs from another.
Inheritance relationships. In many markup languages (such as TEI and HTML 4.0), certain properties are inherited by contained elements, and in some cases by the contained text content as well.
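No DTD records which properties are inherited, so software must hard-code that knowledge. The following Python sketch assumes one explicitly stated rule: the `lang` attribute is inherited by contained elements unless a contained element sets its own value. The document and the function name `resolve_inherited` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Optional

DOC = """<text lang="de">
  <s>Guten Tag</s>
  <s>Sie sagte <q lang="en">hello</q></s>
</text>"""

def resolve_inherited(elem: ET.Element, attr: str = "lang",
                      inherited: Optional[str] = None) -> Iterator[tuple]:
    """Yield (tag, own text, effective value) for elem and its descendants.

    Assumed rule (not expressible in a DTD): a value set on an element applies
    to everything it contains unless a contained element overrides it.
    """
    value = elem.get(attr, inherited)          # a local value overrides the inherited one
    yield elem.tag, (elem.text or "").strip(), value
    for child in elem:
        yield from resolve_inherited(child, attr, value)

for row in resolve_inherited(ET.fromstring(DOC)):
    print(row)
```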
For example, suppose an element carries the attribute-value pair lang="de", indicating that its text is in German. Then all of the elements it contains are in German as well. A DTD, however, provides no formal way to specify which properties are inherited. Moreover, such inheritance is not fixed: it may be overridden by a redefinition on a contained element. Inheritance takes many forms. Some involve properties of elements, some involve properties of attributes, and some involve the text content of elements. For example, if markup indicates that a sentence is in German, then all the words in the sentence are German (barring special circumstances). Similarly, all words and phrases within text marked as deleted are deleted, and those within text marked as emphasized are emphasized. Marking part of the content as a paragraph means that all the words (or elements) in that part belong to the paragraph. A DTD can specify neither which properties are inherited nor the logic of that inheritance (including its exceptions). Software designers typically work out these relationships for a particular markup language and then implement them in the tools and applications they develop.
Context and reference. In many markup languages, an element of a given type may mean different things depending on its context, even though it carries the same generic identifier and has a fixed general meaning. For example, the meaning of text marked as <title> depends on the text's structural position. A <title> directly inside <document> gives the title of the document, while a <title> inside <chapter> gives the title of that chapter. There is no explicit criterion for determining which kind of title a given <title> is. The situation is more complicated still when a citation contains a <title> element, because then the title belongs to an entity outside the document.
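Software designers typically encode such contextual readings as ad hoc rules. The Python sketch below shows one hypothetical rule set, using the element names `document`, `chapter`, and `citation` from the example. A `<title>` names its parent element, except inside a `<citation>`, where it names an external work.

```python
import xml.etree.ElementTree as ET

DOC = """<document>
  <title>Markup Semantics</title>
  <chapter>
    <title>Background</title>
    <p>See <citation><title>Text Encoding Guidelines</title></citation>.</p>
  </chapter>
</document>"""

def title_owners(root: ET.Element) -> list[tuple[str, str]]:
    """Pair each <title> with what it is the title of, judged from its parent.

    Hypothetical rule: a title names its parent element, except that a title
    inside <citation> names a work outside the document.
    """
    owners = []
    for parent in root.iter():                 # document order
        for child in parent:
            if child.tag != "title":
                continue
            owner = "external work" if parent.tag == "citation" else parent.tag
            owners.append((child.text or "", owner))
    return owners

print(title_owners(ET.fromstring(DOC)))
```

The rule lives only in this code. Another program processing the same vocabulary would have to rediscover and re-implement it.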
Relationships like these cannot be represented in a DTD, although software designers can infer them, and such inference is necessary for effective automated text processing. (Giving each meaning its own generic identifier would solve only a small part of the problem. The property would still need to be treated explicitly as a binary relation, with a resolvable expression locating the object to which it applies.)
Shifts of referent. A similar but subtler situation arises when a single element carries several properties that appear to be predicated of the same thing. These properties must be interpreted carefully to determine what each one actually refers to. For example, suppose a particular element instance is marked as having three properties: it is a theorem, it is in German, and it is illegible. Do these simple, direct predications describe one and the same thing (the element instance)? Is this knowledge representation robust enough? What is actually meant is that the sentences are in German, the proposition they express is a theorem, and their physical inscription is illegible. Strictly speaking, no single object has all of these properties.
Full and partial synonymy. Full or partial synonymy between markup languages is an extremely important semantic relationship, and the lack of a mechanism for describing it causes serious heterogeneity problems. Within a single markup language, full synonyms can be eliminated. As the number of markup languages grows, however, full and partial synonymy become important relationships between languages that remain hard to express. We currently have no suitable computer-processable formal method for recording synonymy among the elements, attributes, and attribute values of different markup languages.
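Writing mappings down shows why partial synonymy is harder. This hypothetical Python sketch uses two invented vocabularies, A and B. Full synonyms between them map in both directions. A partial synonym is recorded as class inclusion: every A `chapterTitle` is a B `title`, but not conversely. It can therefore be used only when generalizing from A to B. The vocabularies and function names are assumptions for illustration.

```python
# Hypothetical vocabularies A and B.
FULL_SYNONYMS = {"para": "p", "emph": "em"}       # A term means exactly the B term
PARTIAL_SYNONYMS = {"chapterTitle": "title"}      # A term is a kind of the B term

def a_to_b(tag: str) -> str:
    """Map an A tag to B; generalizing via class inclusion is safe."""
    if tag in FULL_SYNONYMS:
        return FULL_SYNONYMS[tag]
    if tag in PARTIAL_SYNONYMS:
        return PARTIAL_SYNONYMS[tag]   # loses specificity but stays true
    raise KeyError(f"no mapping for {tag!r}")

def b_to_a(tag: str) -> str:
    """Map a B tag to A; only full synonyms can be reversed."""
    inverse = {b: a for a, b in FULL_SYNONYMS.items()}
    if tag in inverse:
        return inverse[tag]
    # A B <title> may or may not be a chapter title, so no safe mapping exists.
    raise KeyError(f"no safe mapping for {tag!r}")

print(a_to_b("chapterTitle"), b_to_a("em"))  # title emph
```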
SGML architectural forms (see below) can record most full synonymy. Partial synonymy, which is more common in practice, is difficult to record. Even representing partial synonymy through class inclusion relationships would go a long way toward solving the heterogeneity problem.
4 BECHAMEL Project
The BECHAMEL Markup Semantics Project began in the late 1990s. It is carried out by researchers including Sperberg-McQueen (W3C/MIT) and colleagues at two other institutions: the Department of Culture, Language and Information Technology of the Bergen University Research Foundation, and the Electronic Publishing Research Group of the Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. The project's name is formed from the names of the collaborators' cities (Bergen, Norway; Champaign, Illinois; Española, New Mexico).
The research objectives of the BECHAMEL project are as follows.
(1) Define the representation and inference problems that are central to document markup semantics. Develop a taxonomy and description of the problems that every semantics-aware document processing system must solve or confront.
(2) Study the properties and semantic relationships found in common markup languages. Evaluate the suitability of standard knowledge representation techniques (such as semantic networks, frames, logic, formal grammars, and production rules) for modeling them, considering their adequacy, elegance, simplicity, and computational efficiency.
(3) Develop and test formal, machine-readable representation frameworks for the semantics of markup languages.
(4) Explore applications of semantic representation, such as support for transcoding, information retrieval, and accessibility enhancement.
Our current focus is on supporting semantic inference over document instances in databases, because we believe this is the most promising place to apply knowledge representation techniques.
(5) Cooperate with digital library content-encoding projects in humanities computing, and with software tool developers, to test semantic representation schemes at scale.
An early Prolog workbench has been developed into a prototype knowledge representation platform for representing facts and inference rules about structured documents. The system lets analysts keep syntactic facts (such as generic identifiers and attribute values) separate from inferred facts about semantic entities and properties.
The system also provides an abstraction layer in which the meaning of markup can be expressed explicitly in machine-readable, executable form. On that basis, inferences can be drawn about document components, including components with problematic structures such as hierarchically overlapping ones. We have developed a set of predicates that mimic the node-navigation methods of the W3C Document Object Model. These predicates can also retrieve attribute values and related information from the document type definition. This allows a clear distinction between the syntactic information produced by the parser and the document semantics expressed by the analyst.
Preliminary results show how complex it is to identify the required semantic inferences and to resolve contextual ambiguity. The prototype reasoning system demonstrates that automated reasoning about markup is feasible. It also shows that Prolog rules can handle difficult cases such as non-monotonicity and contextual ambiguity. For further details, see the cited references.
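BECHAMEL's workbench is written in Prolog. The Python sketch below only illustrates the general idea of keeping parser-derived facts apart from inferred semantic facts. The fact names (`gi`, `parent`, `part_of`) and the single rule are hypothetical, not the project's actual predicates.

```python
import xml.etree.ElementTree as ET

def syntactic_facts(root: ET.Element) -> set:
    """Extract Prolog-style facts: ('gi', node, name) and ('parent', child, parent)."""
    numbering = {id(elem): n for n, elem in enumerate(root.iter())}
    facts = {("gi", numbering[id(elem)], elem.tag) for elem in root.iter()}
    for parent in root.iter():
        for child in parent:
            facts.add(("parent", numbering[id(child)], numbering[id(parent)]))
    return facts

def infer_part_of(facts: set) -> set:
    """Rule, in Prolog notation:  part_of(X, P) :- gi(P, p), ancestor(P, X).

    Inferred facts are returned separately from the syntactic facts.
    """
    gi = {node: name for kind, node, name in facts if kind == "gi"}
    parent_of = {child: par for kind, child, par in facts if kind == "parent"}
    inferred = set()
    for node in gi:
        up = parent_of.get(node)
        while up is not None:                  # walk up through the ancestors
            if gi[up] == "p":
                inferred.add(("part_of", node, up))
            up = parent_of.get(up)
    return inferred

root = ET.fromstring("<div><p>Some <q>quoted <emph>words</emph></q></p></div>")
print(sorted(infer_part_of(syntactic_facts(root))))
# [('part_of', 2, 1), ('part_of', 3, 1)]
```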
5 Semantic modeling of tags
The semantics of document markup consists of the abstract structures, properties, and relationships that users of a markup language understand the markup to convey. Tags and their syntax provide clues to these semantics. Knowledge representation techniques can build computational models of markup semantics by making these structures, relationships, and properties explicit.
<p>Consider the following fragment of an XML document:</p> <p><img src="https://img.php.cn//upload/image/197/946/771/1488002955228879.jpg" title="1488002955228879.jpg" alt="Semantics of XML tags"></p> <p>Readers familiar with the tag set will know that P stands for paragraph. The paragraph has a heading. The content following the heading element forms the body of the paragraph, which begins after the heading element and ends before the paragraph's end tag. Where the meaning and use of a tag are not immediately obvious, authors and readers can consult the documentation for the tag set:</p> <p><img src="https://img.php.cn//upload/image/526/100/980/1488002987952563.jpg" title="1488002987952563.jpg" alt="Semantics of XML tags"></p> <p>Such documentation is clearly written for human readers, and a document parser cannot extract its content from the data structure. As Figure 1 shows, the parse tree (the view available to stylesheet programmers) presents the heading, the quotation, and the text before and after the quotation as separate child nodes of the paragraph. What the parse tree cannot show is that the heading is a property of the paragraph as a whole. Nor can it show that the two text nodes are two parts of a single body, or that the quotation is embedded inside that body.
In fact, the data structure itself knows nothing of paragraphs, quotations, or anything about them. It is simply a graph of related nodes, one of which has a generic identifier whose value is "paragraph".
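The enrichment described here can be sketched as a function that derives facts the parse tree leaves implicit. The fragment is invented for illustration, since the original appears only as an image. The three interpretation rules in the docstring are assumptions. They stand in for the kind of knowledge that the tag set documentation conveys to human readers.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: a paragraph with a heading and an embedded quotation.
FRAGMENT = "<p><head>Tags</head>Markup was called <q>self-describing</q> by many.</p>"

def enrich_paragraph(p: ET.Element) -> dict:
    """Derive semantic facts that the flat parse tree does not show.

    Assumed interpretation rules:
      * a <head> child is a property of the whole paragraph, not body content;
      * all other content, however it is split into nodes, forms ONE body;
      * a <q> child is a quotation embedded in that body.
    """
    facts = {"heading": None, "body": "", "embedded_quotes": []}
    pieces = [p.text or ""]
    for child in p:
        if child.tag == "head":
            facts["heading"] = "".join(child.itertext())
        else:
            if child.tag == "q":
                facts["embedded_quotes"].append("".join(child.itertext()))
            pieces.append("".join(child.itertext()))
        pieces.append(child.tail or "")        # text after a child is body text
    facts["body"] = "".join(pieces).strip()
    return facts

p = ET.fromstring(FRAGMENT)
print([child.tag for child in p])   # ['head', 'q']  (all the parse tree shows)
print(enrich_paragraph(p))
```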
A program should be able to infer the correspondence between the document's meaning and the tags used, and exploit this knowledge when converting the tree from one form to another. Such transformations, however, rely on semantic reasoning done by the programmer rather than on explicitly encoded knowledge, whether they are written in XSLT, DSSSL, or a programming language such as C++.</p> <p><img src="https://img.php.cn//upload/image/767/427/454/1488003021428176.jpg" title="1488003021428176.jpg" alt="Semantics of XML tags"></p> <p>Figure 2 shows how the parse tree can be enriched with semantic knowledge. Knowledge representation techniques can encode whole-part relationships at a higher level, in a form better suited to computer processing. The figure uses a traditional semantic network representation. Other approaches are also being developed, including frame-based, rule-based, formal-grammar, and logic-based representations. The Semantic Web project (Part 8 of this article) may even provide suitable representations for markup languages themselves. The crux of the matter is establishing a hierarchy of abstractions, relationships, and constraints that conventional XML/SGML parsers cannot model or enforce.
Knowledge encoded in machine-readable files, as in DTDs and grammars, can be used to verify a document's semantic constraints and gives applications a more powerful document model. These more expressive representations strongly support the design and implementation of better document processing systems.
6 Applications
In recent years, several developments have made structured markup increasingly common. They emphasize the following aspects of information management.
Conversion and integration.
For SGML/XML developers, one of the most common tasks is writing transformations that convert documents from the syntax of one application to that of another, either to produce new kinds of document representations or to make documents easier to store in a database. Sometimes developers must integrate or reconcile large collections of digital documents encoded in mutually incompatible markup languages. Whatever the scale, the conventional solution is a transformation language that operates directly on the parse tree: the tree produced by parsing the source document is transformed into a tree instance of the target language, which is then serialized as a new document instance, graphics, or audio. <br>Information islands. This problem is closely related to conversion, but the goal is not to turn documents of one form into another. Instead, documents or document fragments stored in a distributed fashion must be presented to users through a common, transparent access interface. Although documents need not be converted wholesale from one markup language to another, the system must make their content appear seamlessly integrated, even when their encodings differ widely. <br>Accessibility. As authoring tools increasingly adopt structured markup, visually impaired users benefit when accessing digital documents. Declarative markup lets people reading with a screen reader or braille display draw inferences from mnemonic tag names rather than from visual cues. Currently, however, such applications depend on the user, or on interface software, to draw structural inferences from tag names or grammar alone. Whether tags respect their syntactic constraints, and are used with the meaning and usage described in the tag set documentation, depends entirely on the conscientiousness of the document author. 
Unfortunately, authors often misuse tags; a notorious example is using heading tags on web pages purely to achieve a particular layout. <br>Safety-critical processing. Part of the motivation for more expressive schema languages (such as the W3C's XML Schema language) is the recognition that markup errors, misuse, and abuse can have consequences far more serious than badly formatted output. Declarative markup is now used not only in e-commerce but also in safety-critical domains such as medical records and aviation. Developers in these domains must ensure not only that documents are syntactically valid, but also that they conform to the protocols required for their safe processing, storage, transmission, and presentation. <br> 7 Advantages of markup semantics <br>Results so far from the BECHAMEL project indicate that markup semantics can address these problems in the following ways. <br>Declarative, machine-readable semantic descriptions. At present, designers of structured markup languages express the meaning of tags and their appropriate use in natural-language prose. A formal system of markup semantics would allow ontological relationships to be stated explicitly, in a form that programs can process automatically. <br>Hypothesis testing. Where a tag set has no formal semantics, a system able to interpret markup semantics provides an environment for testing conjectures. A user unfamiliar with a markup language can posit properties and rules that he or she believes are applied consistently across a document collection; the document processing software then retrieves the elements that do, or do not, conform to those hypothesized rules. <br>Enforcement of semantic constraints. 
A parser with semantic capabilities could perform syntactic validation just as a conventional validating parser does, and could also test conjectures while semantics are being discovered or written. Such a parser could also enforce semantic constraints; this works like hypothesis testing, except that the constraints are known and normative. <br>Better, more expressive APIs. Applications that transform or render SGML and XML documents already rely on markup semantics, but the higher-level properties and relationships involved become apparent only when the program is executed. Formal, machine-readable semantics would enrich application programming interfaces and speed up software development, and as markup languages evolve, such software would be easier and safer to maintain. <br> 8 Related work <br>Many other document processing technologies, standards, and research initiatives respond to the challenges above. We review the main existing approaches below. <br>Semantic Web. The Semantic Web refers to a set of interrelated research and standardization efforts which, like the present work, center on markup and knowledge representation techniques. Its core is the W3C Resource Description Framework (RDF), though it also encompasses other technologies such as ISO Topic Maps. The Semantic Web is broad in scope and ambitious in aim: it seeks to extend markup languages with general-purpose knowledge representation techniques in order to "promote the comprehensive development of human knowledge." Semantic Web research and standardization therefore differs from the approach taken here: rather than describing the semantics of one particular domain, it aims at semantic markup of knowledge in every domain. Our research, by contrast, focuses specifically on "document markup semantics" rather than "general semantic markup". 
Advances in Semantic Web technology should nevertheless make it possible to encode the semantics of tags in Semantic Web markup languages. <br>The W3C Document Object Model. The DOM is an application programming interface to the hierarchical data structure produced by parsing an XML document. We hope to design a system that offers interfaces to markup semantics comparable to those the DOM provides for markup syntax, ultimately yielding a "semantic DOM" that complements the W3C's syntactic DOM. <br>W3C XML Schema. XML Schema is an XML-based language for constraining XML documents that can replace traditional DTDs. Its development was motivated by limitations of DTDs similar to those that prompted the BECHAMEL project. XML Schema lets document designers define complex data types, much as in a high-level programming language. To encode all the relationships and constraints found in tag set documentation, however, a still more expressive formalism than the current XML Schema is needed. <br>Architectural forms of the Hypermedia/Time-based Structuring Language (HyTime). Architectural forms grew out of the recognition that different markup applications often encode semantically equivalent structures in stylistically different ways. Architectural forms let designers of a document class map their own specific element types onto more general architectural forms, which are easier to map between applications. These mappings do represent a limited form of semantic knowledge, and they help with the conversion and integration challenges described above. Part of the BECHAMEL project's aim is to build a model that can express a wider range of semantic relationships than architectural forms can. <br></p>
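The idea behind architectural forms can be imitated in a few lines of Python with the standard xml.etree.ElementTree module. This is a sketch under stated assumptions, not HyTime's actual syntax: each element is assumed to carry an attribute, here called arch (a hypothetical name chosen for this example), naming the general form it maps to, and the two vocabularies (para/title versus paragraph/heading) are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical attribute naming each element's architectural form.
ARCH_ATTR = "arch"

# Two hypothetical vocabularies that encode the same structure differently:
# <para>/<title> in one, <paragraph>/<heading> in the other.
DOC_A = ('<article><para arch="P"><title arch="HEAD">Intro</title>'
         'Body text.</para></article>')
DOC_B = ('<report><paragraph arch="P"><heading arch="HEAD">Intro</heading>'
         'Body text.</paragraph></report>')


def architectural_view(elem):
    """Describe an element tree in terms of architectural forms only.

    Returns a list of (form, children) tuples. Elements carrying ARCH_ATTR
    are renamed to their form; elements without it (the <article> and
    <report> wrappers here) are skipped, and their mapped children are
    spliced into the parent's list. Text content is ignored in this sketch.
    """
    children = []
    for child in elem:
        children.extend(architectural_view(child))
    form = elem.get(ARCH_ATTR)
    if form is None:
        return children
    return [(form, children)]


if __name__ == "__main__":
    view_a = architectural_view(ET.fromstring(DOC_A))
    view_b = architectural_view(ET.fromstring(DOC_B))
    print(view_a)            # [('P', [('HEAD', [])])]
    print(view_a == view_b)  # True
```

Both documents reduce to the same architectural view, which is what makes such mappings useful for conversion and integration. The view records only that element types are interchangeable; nothing in it says, for example, that HEAD is a property of P, which is the kind of relationship a richer semantic model would need to express.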