Parsing XML

Barbara Streisand
2024-09-27 20:32:29

HTML is the most common markup language for web development. HTML and XML are close relatives: both descend from SGML, and XHTML is a reformulation of HTML as an XML application. HTML is not, however, a superset or extension of XML. What is cool is that web browsers, alongside their HTML parser, also ship with an XML parser, so XML parsing capabilities are available under the hood.

Why Think About XML At All

HTML is the ubiquitous markup language of internet developers. The audience of this blog, software engineers, likely only needs HTML. Yet my media company deals with many authors of the non-technical variety, and I have got to say... authors think about their content wayyy differently than HTML gives credit for.

The beauty of XML is its generic structure, which allows for custom parsing and handling. This flexibility has been beautifully exemplified in HTML, but the use case of allowing custom definitions is better handled by XML.

XML is a data-carrying language. HTML, by contrast, is a markup language with a fixed vocabulary of elements that comes with standardized graphical user interface rendering. To see what I mean by this, open an XML file in a browser: https://alexason.com/uploads/library.xml

As you will see, modern browsers render the file complete with its element tags. Also take note that the browser recognizes the file as XML and typically displays it as a formatted tree of data rather than as a laid-out page. In this way, XML is more like JSON: it describes data, not presentation.


Parsing XML

Although browsers do not render custom XML elements as page content the way they render HTML, you can parse XML in JavaScript with the browser's built-in DOMParser API.

See a gist of this in action:
const xmlString = `
  <story>
    <styles>
      <titleStyle>
        <color>#4A90E2</color>
      </titleStyle>
      <paragraphStyle>
        <color>#333333</color>
      </paragraphStyle>
    </styles>
    <title>Elena and the Embrace of Holiness</title>
    <paragraph>In the heart of the village, where the sun kissed the earth...</paragraph>
    <!-- More paragraphs here -->
  </story>`;

const parser = new DOMParser();
const xmlDocument = parser.parseFromString(xmlString, "text/xml");
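
// On malformed input, parseFromString does not throw. It returns a document
// that contains a <parsererror> element instead.
const parserError = xmlDocument.getElementsByTagName("parsererror");
if (parserError.length > 0) {
  // Handle error
  console.error("Error parsing XML:", parserError[0].textContent);
} else {
  // Successfully parsed: xmlDocument is a DOM Document that can be queried
  console.log("Parsed XML Document:", xmlDocument);
  const title = xmlDocument.getElementsByTagName("title")[0].textContent;
  // The first <color> in document order is the one inside <titleStyle>
  const titleColor = xmlDocument.getElementsByTagName("color")[0].textContent;
  console.log(title, titleColor);
}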
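
Because the parsed result is an ordinary DOM tree, it can also be walked generically instead of looking up one tag at a time. The following is a minimal sketch, not a definitive implementation: `elementToObject` is a hypothetical helper name (it is not part of DOMParser), and it assumes leaf elements hold only text and that attributes can be ignored.

```javascript
// Hypothetical helper: recursively convert a DOM Element into a plain object.
// Leaf elements become their trimmed text; repeated tag names become arrays.
// Attributes and mixed content are ignored in this sketch.
function elementToObject(element) {
  const children = Array.from(element.children);
  if (children.length === 0) {
    return element.textContent.trim();
  }
  const result = {};
  for (const child of children) {
    const name = child.tagName;
    const value = elementToObject(child);
    if (Object.prototype.hasOwnProperty.call(result, name)) {
      // Second or later occurrence of this tag: collect the values in an array
      if (!Array.isArray(result[name])) {
        result[name] = [result[name]];
      }
      result[name].push(value);
    } else {
      result[name] = value;
    }
  }
  return result;
}

// DOMParser only exists in browsers, so the usage is guarded.
if (typeof DOMParser !== "undefined") {
  const sample =
    "<story><title>Elena</title><paragraph>One</paragraph><paragraph>Two</paragraph></story>";
  const doc = new DOMParser().parseFromString(sample, "text/xml");
  console.log(elementToObject(doc.documentElement));
}
```

A plain object like this is easier to hand to the rest of an application than a live Document, which is one way to treat XML as the JSON-like data carrier described earlier.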


Real Use Case

The example shown demonstrates what is possible with XML, yet the use case of rendering and styling content is better handled by HTML. While the format resembles HTML, using XML as a stand-in for HTML is probably not the best use of XML.

An HTML developer I know, Israel, writes XML like this. He uses the data format to recreate HTML, then uses JavaScript to turn it into HTML. While this is possible given the flexibility of XML, if the only use case is the browser, I'll tell you what I tell Israel: "Just write HTML!"

Join Israel and the HTML Devs at Salvation.

Where to use XML

XML is a great format for intermediate representation. As mentioned, the immediate use case at my company is translating many different authors' (book authors, manuscript writers) representations of their work into a standardized format. The task is to turn Word documents, PDFs, plaintext, and spoken words into one common data format.

XML can do that, and it is used in exactly this way in software such as Calibre and Manuskript.
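
As a rough illustration of the intermediate-representation idea, here is a minimal sketch that writes an already-normalized manuscript out as XML. Everything in it is an assumption made for the example: the `{ title, paragraphs }` shape, the helper names `escapeXml` and `manuscriptToXml`, and the reuse of the `<story>` vocabulary from the parsing example. It is not how Calibre or Manuskript store their data.

```javascript
// Escape the characters that have special meaning in XML text and attributes.
// The ampersand must be replaced first so later entities are not double-escaped.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical normalized shape: { title: string, paragraphs: string[] }.
// Source-specific converters (Word, PDF, plaintext) are assumed to have
// produced this object already; they are out of scope for this sketch.
function manuscriptToXml(manuscript) {
  const lines = ["<story>", `  <title>${escapeXml(manuscript.title)}</title>`];
  for (const paragraph of manuscript.paragraphs) {
    lines.push(`  <paragraph>${escapeXml(paragraph)}</paragraph>`);
  }
  lines.push("</story>");
  return lines.join("\n");
}

const storyXml = manuscriptToXml({
  title: "Elena & the Embrace of Holiness",
  paragraphs: ["In the heart of the village...", "Is 1 < 2?"],
});
console.log(storyXml);
```

Escaping matters here because author text routinely contains characters such as `&` and `<`, which would otherwise make the output fail to parse with DOMParser.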


This has been a look at XML. It is a widely recognized format, compatible with many readers and conversion tools. Given its ease of parsing, its status as a W3C Recommendation, and its ubiquity, XML is a safe choice for long-term data storage.

If you're interested in tools for data science and storage, be sure to follow this Dev.to account. Add a reaction for more content like this.
