io.sf.carte.doc.dom (CSS4J ecosystem API)

package io.sf.carte.doc.dom

This package provides an implementation of the Document Object Model (DOM) Level 3 Core Specification that can be used for XML or HTML documents, albeit with a few deviations from the specification.

The following behaviors are believed to be more user-friendly for a developer who is handling an HTML document, but they are non-conformant:

On elements and attributes, Node.getLocalName() returns the tag name instead of null when the node was created with a DOM Level 1 method such as Document.createElement(). In HTML documents, all elements implicitly have the HTML namespace unless they have a different one.
As all HTML elements have an implicit namespace and the idea is to handle HTML and XHTML in the same way, DOMElement.getTagName() does not return an upper-cased name.
Entity references are allowed as a last resort when an entity is unknown, although no current parser is known to produce them. This limited support for entity references may be dropped in future versions.
The class list obtained by getClassList() is not read-only: changes to it are reflected in the attribute, and vice-versa.
Calling normalize() on a STYLE element sets its text content to the contents of the associated style sheet.
The order of the element attributes is the order in which they were specified, while other implementations, like the one shipped with most JDKs (Xerces-J), do not enforce any particular order.
By default, attributes that are not specified are not set, so any default value is omitted.
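
For comparison, the first deviation can be checked against a conformant implementation. The following is a minimal sketch that uses only the DOM implementation bundled with the JDK, not this package; the class and method names are hypothetical. Under the standard behavior, an element created with the Level 1 method reports a null local name, whereas this package returns the tag name as described above.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;

// Hypothetical helper class: shows the standard (JDK) behavior for contrast.
final class LocalNameBaseline {

    /** Local name of an element created with the DOM Level 1 method. */
    static String level1LocalName(String tagName) {
        return newDocument().createElement(tagName).getLocalName();
    }

    /** Local name of an element created with the namespace-aware method. */
    static String level2LocalName(String namespaceURI, String qualifiedName) {
        return newDocument().createElementNS(namespaceURI, qualifiedName).getLocalName();
    }

    /** Creates an empty document with the JDK's default DocumentBuilder. */
    private static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Level 1 creation: the specification mandates a null local name.
        System.out.println(level1LocalName("div"));
        // Namespace-aware creation: the local name is available.
        System.out.println(level2LocalName("http://www.w3.org/1999/xhtml", "div"));
    }
}
```

Code that must run on both implementations should therefore not rely on getLocalName() being null to detect Level 1 nodes.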

Traversing the DOM

There are several alternative procedures to retrieve the child nodes of a parent node. The most straightforward is also the fastest: get the first (or last) child, and then iterate through the next (or previous) siblings:

 DOMNode node = getFirstChild();
 while (node != null) {
        someNodeProcessing(node); // do something with that node
        node = node.getNextSibling();
 }

or, if you are used to for loops:

 for (DOMNode node = getFirstChild(); node != null; node = node.getNextSibling()) {
        someNodeProcessing(node); // do something with that node
 }
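
The sibling-based loops above rely only on methods declared by org.w3c.dom.Node, so the same pattern can be tried without this library. The following self-contained sketch assumes the JDK's bundled parser; the class and method names are hypothetical, and it walks the children in both directions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class illustrating sibling-based child traversal.
final class SiblingTraversal {

    /** Collects the names of the direct children, first to last. */
    static List<String> childNames(Node parent) {
        List<String> names = new ArrayList<>();
        for (Node node = parent.getFirstChild(); node != null; node = node.getNextSibling()) {
            names.add(node.getNodeName());
        }
        return names;
    }

    /** Collects the names of the direct children, last to first. */
    static List<String> childNamesReversed(Node parent) {
        List<String> names = new ArrayList<>();
        for (Node node = parent.getLastChild(); node != null; node = node.getPreviousSibling()) {
            names.add(node.getNodeName());
        }
        return names;
    }

    /** Parses a small XML string with the JDK parser and returns its root element. */
    static Node parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Node root = parseRoot("<root><a/>text<b/></root>");
        System.out.println(childNames(root));
        System.out.println(childNamesReversed(root));
    }
}
```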

The iterators are also fast:

 Iterator<DOMNode> it = parentNode.iterator();
 while (it.hasNext()) {
        DOMNode node = it.next();
        someNodeProcessing(node); // do something with that node
 }

There are several different iterators, like the elementIterator:

 Iterator<DOMElement> it = parentNode.elementIterator();
 while (it.hasNext()) {
        DOMElement element = it.next();
        someElementProcessing(element); // do something with that element
 }

or the typeIterator:

 Iterator<Node> it = parentNode.typeIterator(Node.PROCESSING_INSTRUCTION_NODE);
 while (it.hasNext()) {
        ProcessingInstruction pi = (ProcessingInstruction) it.next();
        someProcessing(pi); // do something with that processing instruction
 }

The old NodeList interface is also available, implemented in this library by the more modern DOMNodeList and ElementList:

 NodeList list = parentNode.getChildNodes();
 for (int i = 0; i < list.getLength(); i++) {
        Node node = list.item(i);
        someNodeProcessing(node); // do something with that node
 }

Indexed access is a less efficient way to examine the child nodes, but it is still useful. Iterating over the list as an Iterable is more efficient (be sure to declare it as DOMNodeList or ElementList):

 DOMNodeList list = parentNode.getChildNodes();
 for (DOMNode node : list) {
        someNodeProcessing(node);
 }

or:

 ElementList list = parentNode.getChildren();
 for (DOMElement element : list) {
        someElementProcessing(element);
        // Let's do something with the attributes
        AttributeNamedNodeMap attributes = element.getAttributes();
        for (Attr attr : attributes) {
                attr.setValue("foo");
        }
 }

To iterate across the document (as opposed to just the child nodes), there is the createNodeIterator(Node, int, NodeFilter) method. For example:

 NodeIterator it = document.createNodeIterator(document, NodeFilter.SHOW_ELEMENT, null);
 while (it.hasNext()) {
        DOMElement element = (DOMElement) it.next();
        someElementProcessing(element); // do something with that element
 }

This library's version of NodeIterator implements ListIterator.

Finally, a TreeWalker can be created with the createTreeWalker(Node, int, NodeFilter) method:

 TreeWalker tw = document.createTreeWalker(document, NodeFilter.SHOW_ELEMENT, null);
 DOMNode node;
 while ((node = tw.nextNode()) != null) {
        someNodeProcessing(node);
 }
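
To see the document-order traversal in isolation, the same loop can be run against the W3C Traversal API that ships with the JDK. This is only a sketch for comparison, not this package's API: it assumes that the JDK's default document implements org.w3c.dom.traversal.DocumentTraversal, whose createTreeWalker method takes a fourth boolean argument that the three-argument method above does not have. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;

// Hypothetical helper class: TreeWalker traversal with the JDK's W3C API.
final class TreeWalkerBaseline {

    /** Returns the element names of the given XML text in document order. */
    static List<String> elementNames(String xml) {
        Document document;
        try {
            document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        // Assumption: the document supports the optional Traversal feature.
        if (!(document instanceof DocumentTraversal)) {
            throw new UnsupportedOperationException("DOM Traversal is not supported");
        }
        // Show elements only, no custom filter, do not expand entity references.
        TreeWalker tw = ((DocumentTraversal) document).createTreeWalker(
                document, NodeFilter.SHOW_ELEMENT, null, false);
        List<String> names = new ArrayList<>();
        Node node;
        while ((node = tw.nextNode()) != null) {
            names.add(node.getNodeName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(elementNames("<root><a><b/></a><c/></root>"));
    }
}
```

The walker starts at the document node, which is not an element, so the first node returned by nextNode() is the root element, followed by its descendants in document order.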

Serializing the DOM

The class DOMWriter can be used to pretty-print a document or a subtree. To do that, it takes into account the default values of the display CSS property for the elements, according to the user agent's default style sheet. It can also replace a specified subset of code points with the appropriate entity references when serializing a Text node.
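
The idea of replacing a chosen subset of code points with entity references can be sketched independently of DOMWriter. The following is an illustration under stated assumptions, not the class's implementation or API: the EntityEscaper class, its escape method and the code-point-to-entity-name map are all hypothetical.

```java
import java.util.Map;

// Hypothetical helper class: replaces selected code points with entity references.
final class EntityEscaper {

    /**
     * Returns the text with every code point found in the map replaced by
     * the entity reference for the mapped name; other code points are kept.
     */
    static String escape(String text, Map<Integer, String> entities) {
        StringBuilder sb = new StringBuilder(text.length() + 16);
        // Iterate by code point so that supplementary characters stay intact.
        text.codePoints().forEach(cp -> {
            String name = entities.get(cp);
            if (name != null) {
                sb.append('&').append(name).append(';');
            } else {
                sb.appendCodePoint(cp);
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Only the code points listed here are replaced; the caller decides
        // which ones (for example, '&' is left alone unless it is mapped).
        Map<Integer, String> entities = Map.of(0x3C, "lt", 0xA9, "copy");
        System.out.println(escape("a < b \u00a9", entities));
    }
}
```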

Related Packages

Module                   Package                  Description
io.sf.carte.css4j        io.sf.carte.doc          Basic classes and interfaces used by documents.
io.sf.carte.css4j        io.sf.carte.doc.agent    User agent classes.
io.sf.carte.css4j        io.sf.carte.doc.color    Color classes for use in CSS, HTML, SVG, etc.
io.sf.carte.css4j.dom4j  io.sf.carte.doc.dom4j    Built on top of the DOM4J package, provides XHTML parsing with built-in support for CSS style sheets.
io.sf.carte.css4j        io.sf.carte.doc.geom     Interfaces and classes related to W3C's Geometry Interfaces Module.

Class                                 Description
AttributeNamedNodeMap                 Attribute NamedNodeMap.
CSSDOMImplementation                  CSS-enabled DOM implementation.
DOMDocument                           Implementation of a DOM Document.
DOMElement                            A bare DOM element node.
DOMNode                               DOM Node.
DOMNodeList                           DOMNode-specific version of NodeList.
DOMWriter                             Serializes a node and its subtree.
ElementList                           DOMElement-specific ExtendedNodeList.
ExtendedNamedNodeMap<T extends Node>  Extended NamedNodeMap.
ExtendedNodeList<T extends Node>      This library's iterable version of the old NodeList.
HTMLDocument                          HTML Document.
HTMLElement                           HTML-specific element nodes.
NodeFilter                            Filter the nodes returned by an iterator, see ParentNode.iterator(NodeFilter), ParentNode.iterator(int, NodeFilter) and DOMDocument.createNodeIterator(Node, int, NodeFilter).
NodeIterator                          Iterates over the document nodes according to a set of parameters.
NodeListIterator                      A ListIterator that has Node arguments but returns DOMNode references.
NonDocumentTypeChildNode              Based on W3C's NonDocumentTypeChildNode interface.
ParentNode                            Based on W3C's ParentNode interface.
TreeWalker                            Traverse the document's nodes according to a set of parameters.
XMLDocumentBuilder                    Generic DocumentBuilder for XML documents.
