Package io.sf.carte.doc.dom
The following behavior is believed to be more user-friendly from the point of view of a developer who is handling an HTML document, but is non-conformant:
- On elements and attributes, Node.getLocalName() returns the tag name instead of null when the node was created with a DOM Level 1 method such as Document.createElement(). In HTML documents, all the elements implicitly have the HTML namespace unless they have a different one.
- As all the HTML elements have an implicit namespace and the idea is to handle HTML and XHTML in the same way, DOMElement.getTagName() does not return an upper-cased name.
- Entity references are allowed as a last-resort solution in case an entity is unknown. No known current parser uses that, though. This limited support for entity references may be dropped in future versions.
- The class list obtained by getClassList() is not read-only: changes to it are reflected in the attribute, and vice-versa.
- Calling normalize() on a STYLE element sets its text content to the contents of the associated style sheet.
- The order of the element attributes is as specified, while other implementations like the one shipped with most JDKs (Xerces-j) do not enforce any particular order.
- By default, not-specified attributes are not set, omitting the default value if any.
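For comparison, the following sketch shows the conformant behavior that the first item departs from. It uses only the DOM implementation bundled with the JDK, not this library, and the class name LocalNameDemo is purely illustrative. With the stock implementation, an element created through Document.createElement() reports a null local name.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class LocalNameDemo {

    /** Creates an element with the DOM Level 1 method on the JDK's stock DOM. */
    static Element createLevel1Element(String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            return doc.createElement(tagName);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element p = createLevel1Element("p");
        // Stock DOM: the local name of a DOM Level 1 node is null.
        System.out.println("localName = " + p.getLocalName());
        // The tag name is returned exactly as it was given.
        System.out.println("tagName   = " + p.getTagName());
    }
}
```

Code written against the stock DOM often has to null-check getLocalName() and fall back to getTagName(); according to the list above, that fallback is unnecessary with the nodes of this package.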
Traversing the DOM
There are several alternative procedures to retrieve the child nodes of a parent node. The most straightforward is also the fastest: get the first (or last) child, and then iterate through the next (or previous) siblings:
DOMNode node = getFirstChild();
while (node != null) {
    someNodeProcessing(node); // do something with that node
    node = node.getNextSibling();
}
or, if you are used to for loops:
for (DOMNode node = getFirstChild(); node != null; node = node.getNextSibling()) {
    someNodeProcessing(node); // do something with that node
}
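The same sibling walk can be tried without this library, as a self-contained sketch on the JDK's stock DOM. It uses org.w3c.dom.Node in place of DOMNode, and the class and method names (SiblingTraversalDemo, parse, childNames) are illustrative only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class SiblingTraversalDemo {

    /** Parses an XML string with the JDK parser. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Collects the node names of the direct children, first to last. */
    static List<String> childNames(Node parent) {
        List<String> names = new ArrayList<>();
        for (Node node = parent.getFirstChild(); node != null; node = node.getNextSibling()) {
            names.add(node.getNodeName());
        }
        return names;
    }

    public static void main(String[] args) {
        Document doc = parse("<ul><li/><!--note--><li/></ul>");
        System.out.println(childNames(doc.getDocumentElement()));
    }
}
```

The walk visits every child node, including the comment, so any filtering by node type has to be done inside the loop.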
The iterators are also fast:
Iterator<DOMNode> it = parentNode.iterator();
while (it.hasNext()) {
    DOMNode node = it.next();
    someNodeProcessing(node); // do something with that node
}
There are several different iterators, like the elementIterator:
Iterator<DOMElement> it = parentNode.elementIterator();
while (it.hasNext()) {
    DOMElement element = it.next();
    someElementProcessing(element); // do something with that element
}
or the typeIterator:
Iterator<Node> it = parentNode.typeIterator(Node.PROCESSING_INSTRUCTION_NODE);
while (it.hasNext()) {
    ProcessingInstruction pi = (ProcessingInstruction) it.next();
    someProcessing(pi); // do something with that processing instruction
}
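To give a rough picture of what a type-restricted iteration yields, the sketch below selects child nodes by node type on the JDK's stock DOM, which has no typeIterator method. It says nothing about how this library implements its iterators, and the names TypeFilterDemo and childrenOfType are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class TypeFilterDemo {

    /** Parses an XML string and returns its root element. */
    static Node parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the direct children of parent whose node type equals nodeType. */
    static List<Node> childrenOfType(Node parent, short nodeType) {
        List<Node> result = new ArrayList<>();
        for (Node node = parent.getFirstChild(); node != null; node = node.getNextSibling()) {
            if (node.getNodeType() == nodeType) {
                result.add(node);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Node root = parseRoot("<root><?target data?><a/><?other x?></root>");
        for (Node pi : childrenOfType(root, Node.PROCESSING_INSTRUCTION_NODE)) {
            // For a processing instruction, the node name is its target.
            System.out.println(pi.getNodeName());
        }
    }
}
```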
Finally, there is the old NodeList interface, which this library implements in the more modern flavours of DOMNodeList and ElementList:
NodeList list = parentNode.getChildNodes();
for (int i = 0; i < list.getLength(); i++) {
    Node node = list.item(i);
    someNodeProcessing(node); // do something with that node
}
This is a less efficient way to examine the child nodes, but still useful.
Using the list as an Iterable is more efficient (be sure to use DOMNodeList or ElementList):
DOMNodeList list = parentNode.getChildNodes();
for (DOMNode node : list) {
    someNodeProcessing(node);
}
or:
ElementList list = parentNode.getChildren();
for (DOMElement element : list) {
    someElementProcessing(element);
    // Let's do something with the attributes
    AttributeNamedNodeMap attributes = element.getAttributes();
    for (Attr attr : attributes) {
        attr.setValue("foo");
    }
}
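The standard org.w3c.dom.NodeList is not an Iterable, so code written against the stock DOM needs an adapter before it can use a for-each loop. The following is a minimal sketch of such an adapter, assuming only the JDK; NodeListIterableDemo, iterable and countChildren are illustrative names, and the adapter still calls item(int) internally, so it brings only the syntax, not the efficiency, of this library's lists.

```java
import java.io.StringReader;
import java.util.Iterator;
import java.util.NoSuchElementException;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NodeListIterableDemo {

    /** Wraps a standard NodeList so that it can be used in a for-each loop. */
    static Iterable<Node> iterable(NodeList list) {
        return () -> new Iterator<Node>() {
            private int index = 0;

            @Override
            public boolean hasNext() {
                return index < list.getLength();
            }

            @Override
            public Node next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return list.item(index++);
            }
        };
    }

    /** Counts the children of the root element of the given XML text. */
    static int countChildren(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        int count = 0;
        for (Node node : iterable(doc.getDocumentElement().getChildNodes())) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countChildren("<r><a/><b/><c/></r>"));
    }
}
```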
To iterate across the document (as opposed to just the child nodes), there is the createNodeIterator(Node, int, NodeFilter) method. For example:
NodeIterator it = document.createNodeIterator(document, NodeFilter.SHOW_ELEMENT, null);
while (it.hasNext()) {
    DOMElement element = (DOMElement) it.next();
    someElementProcessing(element); // do something with that element
}
This library's version of NodeIterator implements ListIterator.
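As a point of contrast, the standard W3C NodeIterator is not a java.util iterator: it is obtained through DocumentTraversal with a four-argument method and is advanced with nextNode() until it returns null. The sketch below runs on the JDK's stock DOM only; it assumes that the JDK's Document implements DocumentTraversal, which the code checks at run time, and the class name StockNodeIteratorDemo is illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.xml.sax.InputSource;

class StockNodeIteratorDemo {

    /** Lists the names of all elements of the document, in document order. */
    static List<String> elementNames(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        if (!(doc instanceof DocumentTraversal)) {
            throw new UnsupportedOperationException("DOM Traversal is not supported");
        }
        // Standard signature: root, whatToShow, filter, entityReferenceExpansion.
        NodeIterator it = ((DocumentTraversal) doc)
                .createNodeIterator(doc, NodeFilter.SHOW_ELEMENT, null, true);
        List<String> names = new ArrayList<>();
        Node node;
        while ((node = it.nextNode()) != null) {
            names.add(node.getNodeName());
        }
        it.detach(); // release the iterator once it is no longer needed
        return names;
    }

    public static void main(String[] args) {
        System.out.println(elementNames("<a><b><c/></b><d/></a>"));
    }
}
```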
And finally, there is the TreeWalker, which can be created with the createTreeWalker(Node, int, NodeFilter) method:
TreeWalker tw = document.createTreeWalker(document, NodeFilter.SHOW_ELEMENT, null);
DOMNode node;
while ((node = tw.nextNode()) != null) {
    someNodeProcessing(node);
}
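The examples above pass a null filter. To illustrate what a filter can do, here is a sketch on the JDK's stock DOM, using the standard org.w3c.dom.traversal.NodeFilter rather than this package's NodeFilter, whose details are not shown here. The filter rejects script elements, which in a tree walk also hides their descendants. TreeWalkerFilterDemo and visibleElements are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;

class TreeWalkerFilterDemo {

    /** Lists element names in document order, leaving out script subtrees. */
    static List<String> visibleElements(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        // FILTER_REJECT drops the node and, for a TreeWalker, its whole subtree.
        NodeFilter noScripts = n -> "script".equals(n.getNodeName())
                ? NodeFilter.FILTER_REJECT
                : NodeFilter.FILTER_ACCEPT;
        TreeWalker tw = ((DocumentTraversal) doc)
                .createTreeWalker(doc, NodeFilter.SHOW_ELEMENT, noScripts, false);
        List<String> names = new ArrayList<>();
        Node node;
        while ((node = tw.nextNode()) != null) {
            names.add(node.getNodeName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(
                visibleElements("<body><p><b/></p><script><x/></script><div/></body>"));
    }
}
```

Returning FILTER_SKIP instead of FILTER_REJECT would hide the script element itself but still visit its children.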
Serializing the DOM
The class DOMWriter can be used to pretty-print a document or a subtree. To do that, it takes into account the default values of the display CSS property for the elements, according to the user agent's default style sheet. It also allows replacing a specified subset of codepoints with the proper entity references when serializing a Text node.
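The codepoint replacement can be pictured with a small standalone sketch. It is not DOMWriter's implementation and does not use its API: the class EntityEscapeDemo, the method escape and the codepoint-to-entity map are all assumptions made for illustration.

```java
import java.util.Map;

class EntityEscapeDemo {

    /**
     * Replaces each codepoint that has an entry in the map with the
     * corresponding entity reference, leaving all other text untouched.
     */
    static String escape(String text, Map<Integer, String> entities) {
        StringBuilder sb = new StringBuilder(text.length() + 16);
        text.codePoints().forEach(cp -> {
            String name = entities.get(cp);
            if (name != null) {
                sb.append('&').append(name).append(';');
            } else {
                sb.appendCodePoint(cp);
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical subset: the less-than sign and the copyright sign.
        Map<Integer, String> subset = Map.of((int) '<', "lt", 0xA9, "copy");
        System.out.println(escape("a < b \u00a9 2024", subset));
    }
}
```

Working on codepoints rather than on char values keeps characters outside the Basic Multilingual Plane intact, because a surrogate pair is never split.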
Class Description
- Attribute NamedNodeMap.
- CSS-enabled DOM implementation.
- Implementation of a DOM Document.
- A bare DOM element node.
- DOM Node.
- DOMNode-specific version of NodeList.
- Serializes a node and its subtree.
- DOMElement-specific ExtendedNodeList.
- ExtendedNamedNodeMap<T extends Node>: ExtendedNamedNodeMap.
- ExtendedNodeList<T extends Node>: This library's iterable version of the old NodeList.
- HTML Document.
- HTML-specific element nodes.
- Filter the nodes returned by an iterator, see ParentNode.iterator(NodeFilter), ParentNode.iterator(int, NodeFilter) and DOMDocument.createNodeIterator(Node, int, NodeFilter).
- Iterates over the document nodes according to a set of parameters.
- A ListIterator that has Node arguments but returns DOMNode references.
- Based on W3C's NonDocumentTypeChildNode interface.
- Based on W3C's ParentNode interface.
- Traverse the document's nodes according to a set of parameters.
- Generic DocumentBuilder for XML documents.