XML Matters: Beyond the DOM (Tips and Tricks for Using the DOM with Ease)
The Document Object Model (DOM) is one of the most commonly used tools for manipulating XML and HTML data, but its potential is rarely fully exploited. By taking advantage of the DOM and making it easier to use, you get a powerful tool for XML applications, including dynamic Web applications.
This article introduces a guest columnist, friend and colleague Dethe Elza. Dethe has extensive experience in developing Web applications using XML, and I would like to thank him for his help in introducing me to XML programming using DOM and ECMAScript. Stay tuned to this column for more from Dethe.
—— David Mertz
DOM is one of the standard APIs for processing XML and HTML. It's often criticized for being memory-intensive, slow, and verbose. Still, it's the best choice for many applications, and it's certainly much simpler than XML's other major API, SAX. DOM is gradually appearing in tools such as web browsers, SVG browsers, OpenOffice, and others.
DOM is great because it is a standard, is widely implemented, and is built into other standards. As a standard, its handling of data is programming language agnostic (which may or may not be a strength, but at least it makes the way we handle data consistent). The DOM is now not only built into web browsers, but is also part of many XML-based specifications. Since it is already part of your arsenal, even if you only use it occasionally, I think it's time to take full advantage of what it gives us.
After working with the DOM for a while, you'll see patterns develop: things you want to do over and over again. Shortcuts help you work with the verbose DOM and create self-explanatory, elegant code. Here's a collection of tips and tricks that I use frequently, along with some JavaScript examples.
insertAfter and prependChild
The first trick is "there is no trick". The DOM has two methods for adding child nodes to a container node (usually an Element, but also a Document or DocumentFragment): appendChild(node) and insertBefore(node, referenceNode). It seems like something is missing. What if I want to insert a node after a reference node, or prepend a child node (making the new node the first in the list)? For many years, my solution was to write the following functions:
Listing 1. The wrong way to insert after and prepend
function insertAfter(parent, node, referenceNode) {
    if (referenceNode.nextSibling) {
        parent.insertBefore(node, referenceNode.nextSibling);
    } else {
        parent.appendChild(node);
    }
}

function prependChild(parent, node) {
    if (parent.firstChild) {
        parent.insertBefore(node, parent.firstChild);
    } else {
        parent.appendChild(node);
    }
}
Actually, the checks in Listing 1 are unnecessary: the insertBefore() function is already defined to fall back to appendChild() when the reference node is null. So instead of the functions above, you can use the ones in Listing 2, or skip them and just use the built-in functions:
Listing 2. The right way to insert after and prepend
function insertAfter(parent, node, referenceNode) {
    parent.insertBefore(node, referenceNode.nextSibling);
}

function prependChild(parent, node) {
    parent.insertBefore(node, parent.firstChild);
}
If you are new to DOM programming, it is worth pointing out that although you can hold multiple references to a node, the node can exist in only one location in the DOM tree. So if you want to insert it elsewhere in the tree, there is no need to remove it first; it is removed from its old position automatically. This is convenient when reordering nodes: you simply insert them into their new positions.
Using this mechanism, if you want to exchange the positions of two adjacent nodes (node1 immediately followed by node2), you can use one of the following solutions:
node1.parentNode.insertBefore(node2, node1);
or
node1.parentNode.insertBefore(node1.nextSibling, node1);
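The two behaviors described above (a null reference node means "append", and inserting a node that is already in the tree moves it) can be tried outside a browser with a small stand-in. This is only a sketch: TinyNode and names() are hypothetical helpers invented for this illustration, and they model just enough of insertBefore() to show the effect. In a real page you would call the same methods on actual DOM nodes.

```javascript
// TinyNode is a hypothetical stand-in for a DOM node, so that this sketch
// runs outside a browser. It models only the insertBefore() behavior
// described above; it is not a real DOM implementation.
class TinyNode {
  constructor(name) {
    this.nodeName = name;
    this.parentNode = null;
    this.childNodes = [];
  }

  get nextSibling() {
    if (!this.parentNode) return null;
    const siblings = this.parentNode.childNodes;
    return siblings[siblings.indexOf(this) + 1] || null;
  }

  insertBefore(node, referenceNode) {
    // A node can live in only one place, so detach it from its old parent.
    if (node.parentNode) {
      const old = node.parentNode.childNodes;
      old.splice(old.indexOf(node), 1);
    }
    // A null reference node means "append at the end".
    const index = referenceNode
      ? this.childNodes.indexOf(referenceNode)
      : this.childNodes.length;
    if (index < 0) throw new Error("reference node is not a child of this node");
    this.childNodes.splice(index, 0, node);
    node.parentNode = this;
    return node;
  }
}

// Hypothetical helper: child names in order, as a comma-separated string.
function names(parent) {
  return parent.childNodes.map((child) => child.nodeName).join(",");
}

const list = new TinyNode("list");
const node1 = new TinyNode("node1");
const node2 = new TinyNode("node2");
const node3 = new TinyNode("node3");
[node1, node2, node3].forEach((n) => list.insertBefore(n, null));
const initialOrder = names(list); // "node1,node2,node3"

// Swap the adjacent nodes node1 and node2 without removing anything first.
node1.parentNode.insertBefore(node2, node1);
const orderAfterSwap = names(list); // "node2,node1,node3"

// insertAfter() from Listing 2, with the last child as the reference node:
// node3.nextSibling is null, so node2 moves to the end of the list.
function insertAfter(parent, node, referenceNode) {
  parent.insertBefore(node, referenceNode.nextSibling);
}
insertAfter(list, node2, node3);
const orderAfterMove = names(list); // "node1,node3,node2"

console.log(initialOrder, orderAfterSwap, orderAfterMove);
```

Note that no removeChild() call appears anywhere: each move is a single insertBefore().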
What else can you do with DOM?
DOM is widely used in Web pages. If you visit the bookmarklets site (see Related topics), you'll find many creative short scripts that can rearrange pages, extract links, hide images or Flash ads, and more.
However, Internet Explorer does not define the Node interface constants (which are used to identify node types), so in DOM scripts for the web you should first make sure the constants exist and define them yourself if they are missing.
Listing 3. Making sure Node is defined
if (!window['Node']) {
    window.Node = new Object();
    Node.ELEMENT_NODE = 1;
    Node.ATTRIBUTE_NODE = 2;
    Node.TEXT_NODE = 3;
    Node.CDATA_SECTION_NODE = 4;
    Node.ENTITY_REFERENCE_NODE = 5;
    Node.ENTITY_NODE = 6;
    Node.PROCESSING_INSTRUCTION_NODE = 7;
    Node.COMMENT_NODE = 8;
    Node.DOCUMENT_NODE = 9;
    Node.DOCUMENT_TYPE_NODE = 10;
    Node.DOCUMENT_FRAGMENT_NODE = 11;
    Node.NOTATION_NODE = 12;
}
Listing 4 shows how to extract all the text contained in a node and its descendants:
Listing 4. innerText
function innerText(node) {
    // is this a text or CDATA node?
    if (node.nodeType == 3 || node.nodeType == 4) {
        return node.data;
    }
    var i;
    var returnValue = [];
    for (i = 0; i < node.childNodes.length; i++) {
        returnValue.push(innerText(node.childNodes[i]));
    }
    return returnValue.join('');
}
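To see what innerText() returns, it can be run against a hand-built tree. The sketch below repeats the function from Listing 4 and uses plain objects as stand-ins for DOM nodes; the el() and text() helpers and the sample tree are assumptions made up for this illustration, carrying only the three properties the function reads (nodeType, data, childNodes).

```javascript
// innerText() as in Listing 4: 3 is TEXT_NODE and 4 is CDATA_SECTION_NODE.
function innerText(node) {
  if (node.nodeType == 3 || node.nodeType == 4) {
    return node.data;
  }
  var returnValue = [];
  for (var i = 0; i < node.childNodes.length; i++) {
    returnValue.push(innerText(node.childNodes[i]));
  }
  return returnValue.join('');
}

// Hypothetical stand-ins for DOM nodes (plain objects, not a real DOM).
function el(children) {
  return { nodeType: 1, childNodes: children };
}
function text(data) {
  return { nodeType: 3, data: data, childNodes: [] };
}
var comment = { nodeType: 8, data: 'note', childNodes: [] };

// Roughly: <p>Hello <b>DOM</b><!--note-->!</p>
var p = el([text('Hello '), el([text('DOM')]), comment, text('!')]);

var result = innerText(p);
console.log(result); // Hello DOM!
```

The comment node contributes nothing because it is neither a text nor a CDATA node and has no children.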
Shortcut
People often complain that the DOM is too verbose and that simple tasks require a lot of code. For example, if you wanted to create a <div> element containing text in response to a button click, the code might look like this:
Listing 5. The "long road" to creating a <div>
function handle_button() {
    var parent = document.getElementById('myContainer');
    var div = document.createElement('div');
    div.className = 'myDivCSSClass';
    div.id = 'myDivId';
    div.style.position = 'absolute';
    div.style.left = '300px';
    div.style.top = '200px';
    var text = "This is the first text of the rest of this code";
    var textNode = document.createTextNode(text);
    div.appendChild(textNode);
    parent.appendChild(div);
}
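If you create nodes this way frequently, typing all that code will wear you out quickly. There must be a better solution, and there is! The following utility helps you create an element, set its attributes and styles, and add a text child node. All parameters except name are optional.
Listing 6. The elem() shortcut function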
function elem(name, attrs, style, text) {
    var e = document.createElement(name);
    var key; // declared locally so the loops below do not create a global
    if (attrs) {
        for (key in attrs) {
            if (key == 'class') {
                e.className = attrs[key];
            } else if (key == 'id') {
                e.id = attrs[key];
            } else {
                e.setAttribute(key, attrs[key]);
            }
        }
    }
    if (style) {
        for (key in style) {
            e.style[key] = style[key];
        }
    }
    if (text) {
        e.appendChild(document.createTextNode(text));
    }
    return e;
}
With this shortcut, you can create the <div> element from Listing 5 much more concisely. Note that the attrs and style parameters are given as JavaScript object literals.
Listing 7. The easy way to create a <div>
function handle_button() {
    var parent = document.getElementById('myContainer');
    // 'class' is quoted because it is a reserved word in older JavaScript engines
    parent.appendChild(elem('div',
        {'class': 'myDivCSSClass', id: 'myDivId'},
        {position: 'absolute', left: '300px', top: '200px'},
        'This is the first text of the rest of this code'));
}
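When you want to create a large number of complex DHTML objects quickly, a utility like this can save you a lot of time. The pattern here is that if you have a particular DOM structure that you need to create frequently, you write a utility to create it. This not only reduces the amount of code you write, but also cuts down on repetitive cut-and-paste code (a major source of errors) and makes your intent clearer when the code is read.
What's next?
The DOM often makes it hard to tell which node comes next in document order. Here are some utilities that help you move forward and backward between nodes:
Listing 8. nextNode and prevNode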
// return next node in document order
function nextNode(node) {
    if (!node) return null;
    if (node.firstChild) {
        return node.firstChild;
    } else {
        return nextWide(node);
    }
}

// helper function for nextNode()
function nextWide(node) {
    if (!node) return null;
    if (node.nextSibling) {
        return node.nextSibling;
    } else {
        return nextWide(node.parentNode);
    }
}

// return previous node in document order
function prevNode(node) {
    if (!node) return null;
    if (node.previousSibling) {
        return previousDeep(node.previousSibling);
    }
    return node.parentNode;
}

// helper function for prevNode()
function previousDeep(node) {
    if (!node) return null;
    while (node.childNodes.length) {
        node = node.lastChild;
    }
    return node;
}
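To make "document order" concrete, the functions from Listing 8 can be stepped through a small tree. In this sketch the tree is built from plain objects by a hypothetical makeNode() helper (not part of the DOM) that fills in only the properties Listing 8 reads; walk() is likewise just an illustrative helper. The functions themselves are repeated unchanged so that the block runs on its own.

```javascript
// Listing 8, repeated so the sketch is self-contained.
function nextNode(node) {
  if (!node) return null;
  return node.firstChild ? node.firstChild : nextWide(node);
}
function nextWide(node) {
  if (!node) return null;
  return node.nextSibling ? node.nextSibling : nextWide(node.parentNode);
}
function prevNode(node) {
  if (!node) return null;
  if (node.previousSibling) return previousDeep(node.previousSibling);
  return node.parentNode;
}
function previousDeep(node) {
  if (!node) return null;
  while (node.childNodes.length) node = node.lastChild;
  return node;
}

// Hypothetical helper: builds a plain-object stand-in for a DOM node and
// links its children (parentNode, siblings, firstChild, lastChild).
function makeNode(name, children) {
  var node = {
    nodeName: name, parentNode: null, childNodes: children || [],
    firstChild: null, lastChild: null, previousSibling: null, nextSibling: null
  };
  for (var i = 0; i < node.childNodes.length; i++) {
    var child = node.childNodes[i];
    child.parentNode = node;
    child.previousSibling = node.childNodes[i - 1] || null;
    child.nextSibling = node.childNodes[i + 1] || null;
  }
  node.firstChild = node.childNodes[0] || null;
  node.lastChild = node.childNodes[node.childNodes.length - 1] || null;
  return node;
}

// Hypothetical helper: follow step() from start and collect the node names.
function walk(start, step) {
  var names = [];
  for (var node = start; node; node = step(node)) names.push(node.nodeName);
  return names.join(' ');
}

// root
//  +- a
//  |   +- a1
//  |   +- a2
//  +- b
var a1 = makeNode('a1'), a2 = makeNode('a2');
var a = makeNode('a', [a1, a2]);
var b = makeNode('b');
var root = makeNode('root', [a, b]);

var forward = walk(root, nextNode);
var backward = walk(b, prevNode);
console.log(forward);  // root a a1 a2 b
console.log(backward); // b a2 a1 a root
```

Stepping forward from a2 shows why nextWide() exists: a2 has no children and no next sibling, so the search climbs to its parent a and continues with a's next sibling, b.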
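Using the DOM with ease
Sometimes you want to traverse the DOM, calling a function on each node or returning a value from each node. In fact, these ideas are so common that DOM Level 2 includes an extension called DOM Traversal and Range, which defines objects and APIs for iterating over all the nodes in a DOM, for applying a function to every node, and for selecting a range in the DOM. Because these functions are not defined in Internet Explorer (at least not yet), you can use nextNode() to do something similar.
The idea here is to create some simple, general-purpose tools and then assemble them in different ways to achieve the desired effect. If you are familiar with functional programming, this will look familiar. The Beyond JS library (see Related topics) takes this idea much further.
Listing 9. Functional DOM utilities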
// return an Array of all nodes, starting at startNode and
// continuing through the rest of the DOM tree
function listNodes(startNode) {
    var list = new Array();
    var node = startNode;
    while (node) {
        list.push(node);
        node = nextNode(node);
    }
    return list;
}

// The same as listNodes(), but works backwards from startNode.
// Note that this is not the same as running listNodes() and
// reversing the list.
function listNodesReversed(startNode) {
    var list = new Array();
    var node = startNode;
    while (node) {
        list.push(node);
        node = prevNode(node);
    }
    return list;
}

// apply func to each node in nodeList, return new list of results
function map(list, func) {
    var result_list = new Array();
    for (var i = 0; i < list.length; i++) {
        result_list.push(func(list[i]));
    }
    return result_list;
}

// apply test to each node, return a new list of nodes for which
// test(node) returns true
function filter(list, test) {
    var result_list = new Array();
    for (var i = 0; i < list.length; i++) {
        if (test(list[i])) result_list.push(list[i]);
    }
    return result_list;
}
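Listing 9 contains four basic tools. The listNodes() and listNodesReversed() functions could be extended to take an optional length, similar to the slice() method of Array; I leave that as an exercise for you. Also note that the map() and filter() functions are completely generic and work on any list (not just lists of nodes). Now I'll show you a few ways to combine them.
Listing 10. Using the functional utilities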
// A list of all the element names in document order
function isElement(node) {
    return node.nodeType == Node.ELEMENT_NODE;
}
function nodeName(node) {
    return node.nodeName;
}
var elementNames = map(filter(listNodes(document), isElement), nodeName);

// All the text from the document (ignores CDATA)
function isText(node) {
    return node.nodeType == Node.TEXT_NODE;
}
function nodeValue(node) {
    return node.nodeValue;
}
var allText = map(filter(listNodes(document), isText), nodeValue);
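Because map() and filter() from Listing 9 only index into a list and call a function, the same composition works on ordinary arrays. The sketch below repeats the two functions and applies them to made-up sample data (the sample array, getId(), isEven(), and square() are all assumptions for illustration), using the numeric value 1 for ELEMENT_NODE so that it runs without a browser.

```javascript
// map() and filter() as in Listing 9.
function map(list, func) {
  var result_list = [];
  for (var i = 0; i < list.length; i++) {
    result_list.push(func(list[i]));
  }
  return result_list;
}
function filter(list, test) {
  var result_list = [];
  for (var i = 0; i < list.length; i++) {
    if (test(list[i])) result_list.push(list[i]);
  }
  return result_list;
}

// Made-up stand-ins for nodes: two elements (nodeType 1) and one text node.
var sample = [
  { nodeType: 1, id: 'header' },
  { nodeType: 3, id: undefined },
  { nodeType: 1, id: 'footer' }
];
function isElement(node) { return node.nodeType == 1; }
function getId(node) { return node.id; }

// Extract the IDs of all elements in the list.
var ids = map(filter(sample, isElement), getId);
console.log(ids); // [ 'header', 'footer' ]

// The same two tools on a list that has nothing to do with the DOM.
function isEven(n) { return n % 2 === 0; }
function square(n) { return n * n; }
var evenSquares = map(filter([1, 2, 3, 4], isEven), square);
console.log(evenSquares); // [ 4, 16 ]
```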
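You can use these utilities to extract IDs, modify styles, find certain kinds of nodes and remove them, and so on. Once the DOM Traversal and Range APIs are widely implemented, you can use them to modify the DOM tree without building lists first. They are not only powerful, they also work in ways similar to what I have emphasized above.
DOM danger zones
Note that the core DOM API does not let you parse XML data into a DOM or serialize a DOM as XML. These capabilities are defined in the DOM Level 3 extension "Load and Save", but they are not yet fully implemented, so don't count on them for now. Each platform (a browser or another specialized DOM application) has its own way of converting between DOM and XML, but cross-platform conversion is beyond the scope of this article.
The DOM is not a particularly safe tool, especially when you use the DOM API to create trees that cannot be serialized as XML. Never mix the DOM1 non-namespace APIs with the DOM2 namespace-aware APIs (for example, createElement and createElementNS) in the same program. If you use namespaces, try to declare all of them on the root element, and don't override namespace prefixes, or things get very confusing. In general, as long as you stick to convention, you won't trigger the edge cases that get you into trouble.
If you have been relying on Internet Explorer's innerText and innerHTML, you might try the elem() function instead. By building a few similar utilities, you gain much of the same convenience and inherit the benefits of cross-platform code. Mixing the two approaches is a very bad idea.
Some Unicode characters are not allowed in XML. DOM implementations let you add them, but the result cannot be serialized. These characters include most control characters and individual characters of a Unicode surrogate pair. You will only run into this if you try to include binary data in a document, but it is one more gotcha.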
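One way to guard against the character problem is to check text before handing it to createTextNode() or setAttribute(). The following is a minimal sketch, assuming XML 1.0 and its Char production (tab, line feed, carriage return, U+0020 to U+D7FF, U+E000 to U+FFFD, and U+10000 to U+10FFFF); the function name isXmlSafe() is hypothetical and not part of any DOM API.

```javascript
// Hypothetical helper: returns true if every code point in str is allowed
// by the XML 1.0 Char production, so the text can be serialized as XML.
function isXmlSafe(str) {
  // for...of iterates by code point; an unpaired surrogate shows up as a
  // single code unit in the range U+D800 to U+DFFF and is rejected below.
  for (const ch of str) {
    const cp = ch.codePointAt(0);
    const allowed =
      cp === 0x9 || cp === 0xA || cp === 0xD ||
      (cp >= 0x20 && cp <= 0xD7FF) ||
      (cp >= 0xE000 && cp <= 0xFFFD) ||
      (cp >= 0x10000 && cp <= 0x10FFFF);
    if (!allowed) return false;
  }
  return true;
}

console.log(isXmlSafe('plain text\n'));   // true
console.log(isXmlSafe('bell\u0007'));     // false: control character
console.log(isXmlSafe('\uD83D'));         // false: half of a surrogate pair
console.log(isXmlSafe('\uD83D\uDE00'));   // true: a complete surrogate pair
```

Binary data that fails such a check is usually encoded (for example as base64) before being placed in a document, rather than inserted as raw characters.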
Conclusion
I have introduced many things that DOM can do, but DOM (and JavaScript) can do much more than that. Study and explore these examples to see how they can be used to solve problems that may require client scripts, templates, or specialized APIs.
DOM has its limitations and shortcomings, but it also has many advantages: it is built into many applications; it works the same way whether you use Java technology, Python, or JavaScript; it is much easier to use than SAX; and with the patterns above, it is both simple and powerful. An increasing number of applications are beginning to support DOM, including Mozilla-based applications, OpenOffice, and Blast Radius' XMetaL. More and more specifications require and extend DOM (for example, SVG), so the DOM is always around you. You'd be wise to use this widely deployed tool.