Package io.sf.carte.doc.dom


This package provides an implementation of the Document Object Model (DOM) Level 3 Core Specification that can be used for XML or HTML documents, albeit with a few deviations from the specification.

The following behavior is believed to be more user-friendly for a developer who is handling an HTML document, but it is non-conformant:

  1. On elements and attributes, Node.getLocalName() returns the tag name instead of null when the node was created with a DOM Level 1 method such as Document.createElement(). In HTML documents, all elements implicitly belong to the HTML namespace unless they have a different one.
  2. As all the HTML elements have an implicit namespace and the idea is to handle HTML and XHTML in the same way, DOMElement.getTagName() does not return an upper-cased name.
  3. Entity references are allowed as a last-resort solution in case an entity is unknown. No known current parser uses them, though. This limited support for entity references may be dropped in future versions.
  4. The class list obtained by getClassList() is not read-only: changes to it are reflected in the attribute, and vice-versa.
  5. Calling normalize() on a STYLE element sets its text content to the contents of the associated style sheet.
  6. The order of the element attributes is as specified, while other implementations like the one shipped with most JDKs (Xerces-j) do not enforce any particular order.
  7. By default, attributes that are not specified are not set, so their default value (if any) is omitted.
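
For contrast with items 1 and 2, the following sketch shows what a specification-conformant implementation does, using the DOM that ships with the JDK rather than this package. It relies only on the standard javax.xml.parsers and org.w3c.dom APIs; the class name LocalNameContrast and its helper method are hypothetical and exist only for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class: uses the JDK's bundled DOM, not io.sf.carte.doc.dom.
final class LocalNameContrast {

    // Creates an empty document with the JDK's default DocumentBuilder.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        // DOM Level 1 creation method: no namespace information is attached.
        Element level1 = doc.createElement("DIV");
        // DOM Level 2 creation method: namespace-aware.
        Element level2 = doc.createElementNS("http://www.w3.org/1999/xhtml", "div");

        System.out.println(level1.getLocalName()); // null, as the specification requires
        System.out.println(level1.getTagName());   // DIV (the name is kept as given)
        System.out.println(level2.getLocalName()); // div
    }
}
```

With this package, by comparison, the first call would be expected to return the tag name instead of null, as described in item 1.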

Traversing the DOM

There are several alternative procedures to retrieve the child nodes of a parent node. The most straightforward is also the fastest: get the first (or last) child, and then iterate through the next (or previous) siblings:

 DOMNode node = parentNode.getFirstChild();
 while (node != null) {
        someNodeProcessing(node); // do something with that node
        node = node.getNextSibling();
 }
 

or, if you are used to for loops:

 for (DOMNode node = parentNode.getFirstChild(); node != null; node = node.getNextSibling()) {
        someNodeProcessing(node); // do something with that node
 }
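
The sibling-chain pattern is not specific to this package. As a self-contained sketch, the same loop is shown here against the standard org.w3c.dom.Node interface and the parser bundled with the JDK, so it can be run without any extra dependency. The class SiblingTraversal and its methods are hypothetical names used only for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class: uses the JDK's bundled DOM, not io.sf.carte.doc.dom.
final class SiblingTraversal {

    // Parses an XML string with the JDK's default parser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Collects the names of all child nodes by walking the sibling chain.
    static List<String> childNames(Node parent) {
        List<String> names = new ArrayList<>();
        for (Node node = parent.getFirstChild(); node != null; node = node.getNextSibling()) {
            names.add(node.getNodeName());
        }
        return names;
    }

    public static void main(String[] args) {
        Document doc = parse("<ul><li>one</li><!-- note --><li>two</li></ul>");
        System.out.println(childNames(doc.getDocumentElement())); // [li, #comment, li]
    }
}
```

Note that the loop visits every child node, including comments and text, not only elements.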
 

The iterators are also fast:

 Iterator<DOMNode> it = parentNode.iterator();
 while (it.hasNext()) {
        DOMNode node = it.next();
        someNodeProcessing(node); // do something with that node
 }
 

There are several different iterators, like the elementIterator:

 Iterator<DOMElement> it = parentNode.elementIterator();
 while (it.hasNext()) {
        DOMElement element = it.next();
        someElementProcessing(element); // do something with that element
 }
 

or the typeIterator:

 Iterator<Node> it = parentNode.typeIterator(Node.PROCESSING_INSTRUCTION_NODE);
 while (it.hasNext()) {
        ProcessingInstruction pi = (ProcessingInstruction) it.next();
        someProcessing(pi); // do something with that processing instruction
 }
 

There is also the old NodeList interface, which this library implements in the more modern flavours of DOMNodeList and ElementList:

 NodeList list = parentNode.getChildNodes();
 for (int i = 0; i < list.getLength(); i++) {
        Node node = list.item(i);
        someNodeProcessing(node); // do something with that node
 }
 

This is a less efficient way to examine the child nodes, but it is still useful. Using the list as an Iterable is more efficient (be sure to declare it as DOMNodeList or ElementList):

 DOMNodeList list = parentNode.getChildNodes();
 for (DOMNode node : list) {
        someNodeProcessing(node);
 }
 
or:
 ElementList list = parentNode.getChildren();
 for (DOMElement element : list) {
        someElementProcessing(element);
        // Let's do something with the attributes
        AttributeNamedNodeMap attributes = element.getAttributes();
        for (Attr attr : attributes) {
                attr.setValue("foo");
        }
 }
 

To iterate across the document (as opposed to just the child nodes), there is the createNodeIterator(Node, int, NodeFilter) method. For example:

 NodeIterator it = document.createNodeIterator(document, NodeFilter.SHOW_ELEMENT, null);
 while (it.hasNext()) {
        DOMElement element = (DOMElement) it.next();
        someElementProcessing(element); // do something with that element
 }
 

This library's version of NodeIterator implements ListIterator.
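
Code that is written against the standard org.w3c.dom.traversal.NodeIterator interface cannot use hasNext() and next(); there, the end of the iteration is signalled by nextNode() returning null, and the standard factory method takes a fourth boolean argument. The following sketch shows that standard pattern with the DOM bundled with the JDK, assuming (as is the case for the JDK's default builder) that the Document implements DocumentTraversal. The class StandardNodeIterator and its methods are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.xml.sax.InputSource;

// Hypothetical demo class: uses the JDK's bundled DOM, not io.sf.carte.doc.dom.
final class StandardNodeIterator {

    // Parses an XML string with the JDK's default parser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Lists all element names in document order with a standard NodeIterator.
    static List<String> elementNames(Document doc) {
        // Assumption: the Document implements DocumentTraversal (true for the JDK default).
        DocumentTraversal traversal = (DocumentTraversal) doc;
        NodeIterator it = traversal.createNodeIterator(doc, NodeFilter.SHOW_ELEMENT, null, true);
        List<String> names = new ArrayList<>();
        Node node;
        while ((node = it.nextNode()) != null) { // null marks the end of the iteration
            names.add(node.getNodeName());
        }
        it.detach(); // release the iterator once it is no longer needed
        return names;
    }

    public static void main(String[] args) {
        Document doc = parse("<a><b><c/></b>text<d/></a>");
        System.out.println(elementNames(doc)); // [a, b, c, d]
    }
}
```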

Finally, there is the TreeWalker, which can be created with the createTreeWalker(Node, int, NodeFilter) method:

 TreeWalker tw = document.createTreeWalker(document, NodeFilter.SHOW_ELEMENT, null);
 DOMNode node;
 while ((node = tw.nextNode()) != null) {
        someNodeProcessing(node);
 }
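
Besides nextNode(), a TreeWalker can also move along the tree structure as seen through its filter. The sketch below uses the standard org.w3c.dom.traversal.TreeWalker from the JDK's bundled DOM (with the standard four-argument factory method, not this package's three-argument one) to list only the element children of a node, skipping the text nodes in between. The class ElementChildWalker and its methods are hypothetical names, and the cast to DocumentTraversal is an assumption that holds for the JDK's default builder.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;

// Hypothetical demo class: uses the JDK's bundled DOM, not io.sf.carte.doc.dom.
final class ElementChildWalker {

    // Parses an XML string with the JDK's default parser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Lists the names of the element children of the given root node.
    static List<String> elementChildren(Node root) {
        // Assumption: the owner Document implements DocumentTraversal.
        DocumentTraversal traversal = (DocumentTraversal) root.getOwnerDocument();
        TreeWalker tw = traversal.createTreeWalker(root, NodeFilter.SHOW_ELEMENT, null, true);
        List<String> names = new ArrayList<>();
        // firstChild() and nextSibling() only see nodes accepted by whatToShow.
        for (Node node = tw.firstChild(); node != null; node = tw.nextSibling()) {
            names.add(node.getNodeName());
        }
        return names;
    }

    public static void main(String[] args) {
        Document doc = parse("<a>t<b><c/></b>u<d/></a>");
        System.out.println(elementChildren(doc.getDocumentElement())); // [b, d]
    }
}
```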
 

Serializing the DOM

The class DOMWriter can be used to pretty-print a document or a subtree. To do that, it takes into account the default values of the display CSS property for the elements, according to the user agent's default style sheet. It also allows replacing a specified subset of code points with the proper entity references when serializing a Text node.