Package io.sf.carte.doc.dom
The following behavior is believed to be more user-friendly for a developer who is handling an HTML document, but is non-conformant:
- On elements and attributes, Node.getLocalName() returns the tag name instead of null when the node was created with a DOM Level 1 method such as Document.createElement(). In HTML documents, all elements implicitly have the HTML namespace unless they have a different one.
- As all HTML elements have an implicit namespace and the idea is to handle HTML and XHTML in the same way, DOMElement.getTagName() does not return an upper-cased name.
- Entity references are allowed as a last-resort solution in case an entity is unknown. No known current parser uses that, though. This limited support for entity references may be dropped in future versions.
- The class list obtained by getClassList() is not read-only: changes to it are reflected in the attribute, and vice versa.
- Calling normalize() on a STYLE element sets its text content to the contents of the associated style sheet.
- The order of the element attributes is as specified, while other implementations like the one shipped with most JDKs (Xerces-j) do not enforce any particular order.
- By default, attributes that are not specified are not set, omitting the default value, if any.
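For contrast, the conformant behavior described in the first item can be observed with the DOM implementation bundled with the JDK. The following is a minimal sketch that relies only on the standard javax.xml.parsers and org.w3c.dom APIs; the class LocalNameContrast and its helper methods are hypothetical names introduced for illustration and are not part of this package.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class (not part of io.sf.carte.doc.dom).
class LocalNameContrast {

    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Creates an empty document with the JDK's default DOM implementation.
    static Document newDocument() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // DOM Level 1 creation: no namespace information is attached to the element.
    static String localNameLevel1(String tagName) {
        Element element = newDocument().createElement(tagName);
        return element.getLocalName();
    }

    // DOM Level 2 creation: the namespace URI and the local name are recorded.
    static String localNameLevel2(String tagName) {
        Element element = newDocument().createElementNS(XHTML_NS, tagName);
        return element.getLocalName();
    }

    public static void main(String[] args) {
        System.out.println("createElement:   " + localNameLevel1("p"));
        System.out.println("createElementNS: " + localNameLevel2("p"));
    }
}
```

With the JDK implementation the first call yields null, whereas, according to the list above, this library would return the tag name in both cases.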
Traversing the DOM
There are several alternative procedures to retrieve the child nodes of a parent node. The most straightforward is also the fastest: get the first (or last) child, and then iterate through the next (or previous) siblings:
    DOMNode node = parentNode.getFirstChild();
    while (node != null) {
        someNodeProcessing(node); // do something with that node
        node = node.getNextSibling();
    }

or, if you are used to for loops:

    for (DOMNode node = parentNode.getFirstChild(); node != null; node = node.getNextSibling()) {
        someNodeProcessing(node); // do something with that node
    }
The iterators are also fast:
    Iterator<DOMNode> it = parentNode.iterator();
    while (it.hasNext()) {
        DOMNode node = it.next();
        someNodeProcessing(node); // do something with that node
    }

There are several different iterators, like the elementIterator:

    Iterator<DOMElement> it = parentNode.elementIterator();
    while (it.hasNext()) {
        DOMElement element = it.next();
        someElementProcessing(element); // do something with that element
    }

or the typeIterator:

    Iterator<Node> it = parentNode.typeIterator(Node.PROCESSING_INSTRUCTION_NODE);
    while (it.hasNext()) {
        ProcessingInstruction pi = (ProcessingInstruction) it.next();
        someProcessing(pi); // do something with that processing instruction
    }
There is also the old NodeList interface, which in this library is implemented in the more modern flavours of DOMNodeList and ElementList:

    NodeList list = parentNode.getChildNodes();
    for (int i = 0; i < list.getLength(); i++) {
        Node node = list.item(i);
        someNodeProcessing(node); // do something with that node
    }

This is a less efficient way to examine the child nodes, but it is still useful.
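The sibling walk and the index-based NodeList loop visit the same nodes in the same order, so either can replace the other when efficiency matters. The following is a minimal sketch of that equivalence that uses only the standard org.w3c.dom interfaces with the JDK's bundled implementation; the class ChildTraversalSketch and its methods are hypothetical, and the sketch does not measure performance.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical demo class built on the standard org.w3c.dom interfaces only.
class ChildTraversalSketch {

    // Builds a <ul> element with three children: an element, a text node and a comment.
    static Element sampleParent() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element ul = doc.createElement("ul");
            doc.appendChild(ul);
            ul.appendChild(doc.createElement("li"));
            ul.appendChild(doc.createTextNode("some text"));
            ul.appendChild(doc.createComment("a comment"));
            return ul;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // First child, then next siblings.
    static List<String> bySiblings(Node parent) {
        List<String> names = new ArrayList<>();
        for (Node node = parent.getFirstChild(); node != null; node = node.getNextSibling()) {
            names.add(node.getNodeName());
        }
        return names;
    }

    // Index-based access through the NodeList; the length is read once up front.
    static List<String> byNodeList(Node parent) {
        List<String> names = new ArrayList<>();
        NodeList list = parent.getChildNodes();
        int length = list.getLength();
        for (int i = 0; i < length; i++) {
            names.add(list.item(i).getNodeName());
        }
        return names;
    }

    public static void main(String[] args) {
        Element parent = sampleParent();
        System.out.println(bySiblings(parent));
        System.out.println(byNodeList(parent));
    }
}
```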
Using the node list as an Iterable is more efficient (be sure to declare it as DOMNodeList or ElementList):

    DOMNodeList list = parentNode.getChildNodes();
    for (DOMNode node : list) {
        someNodeProcessing(node);
    }

or:

    ElementList list = parentNode.getChildren();
    for (DOMElement element : list) {
        someElementProcessing(element);
        // Let's do something with the attributes
        AttributeNamedNodeMap attributes = element.getAttributes();
        for (Attr attr : attributes) {
            attr.setValue("foo");
        }
    }
To iterate across the document (as opposed to just the child nodes), there is the createNodeIterator(Node, int, NodeFilter) method. For example:

    NodeIterator it = document.createNodeIterator(document, NodeFilter.SHOW_ELEMENT, null);
    while (it.hasNext()) {
        DOMElement element = (DOMElement) it.next();
        someElementProcessing(element); // do something with that element
    }
This library's version of NodeIterator implements ListIterator.
And finally the TreeWalker, which can be created with the createTreeWalker(Node, int, NodeFilter) method:

    TreeWalker tw = document.createTreeWalker(document, NodeFilter.SHOW_ELEMENT, null);
    DOMNode node;
    while ((node = tw.nextNode()) != null) {
        someNodeProcessing(node);
    }
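A walk like the one above visits the elements in document order (depth first, parents before children). The following is a minimal sketch of that order, hand-rolled with the first-child and next-sibling accessors of the standard org.w3c.dom interfaces; the class DocumentOrderSketch and its methods are hypothetical and do not show how this library's TreeWalker is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical demo class built on the standard org.w3c.dom interfaces only.
class DocumentOrderSketch {

    // Builds <html><head><title/></head><body><p/></body></html>.
    static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element html = doc.createElement("html");
            Element head = doc.createElement("head");
            Element body = doc.createElement("body");
            doc.appendChild(html);
            html.appendChild(head);
            html.appendChild(body);
            head.appendChild(doc.createElement("title"));
            body.appendChild(doc.createElement("p"));
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the names of the elements found at or under root, in document order.
    static List<String> elementNames(Node root) {
        List<String> names = new ArrayList<>();
        collect(root, names);
        return names;
    }

    // Pre-order visit: record the node itself, then descend into its children.
    private static void collect(Node node, List<String> names) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            names.add(node.getNodeName());
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, names);
        }
    }

    public static void main(String[] args) {
        System.out.println(elementNames(sampleDocument()));
    }
}
```

When the root is the Document node, as here, the collected names match the sequence of elements that repeated nextNode() calls would return with the SHOW_ELEMENT mask.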
Serializing the DOM
The class DOMWriter can be used to pretty-print a document or a subtree. To do that, it takes into account the default values of the display CSS property for the elements, according to the user agent's default style sheet. It can also replace a specified subset of codepoints with the proper entity references when serializing a Text node.
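The idea of replacing selected codepoints with entity references can be illustrated independently of DOMWriter. The following is a minimal sketch, assuming the caller supplies a map from codepoint to entity name; the class EntityReplacementSketch and the method replaceCodePoints are hypothetical and are not the DOMWriter API.

```java
import java.util.Map;

// Hypothetical helper (not the DOMWriter API): replaces selected codepoints
// of a text with named entity references.
class EntityReplacementSketch {

    // entityNames maps a Unicode codepoint to the name of its entity, e.g. 0xA9 -> "copy".
    static String replaceCodePoints(String text, Map<Integer, String> entityNames) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        text.codePoints().forEach(cp -> {
            String name = entityNames.get(cp);
            if (name != null) {
                // The codepoint is in the configured subset: write &name; instead.
                out.append('&').append(name).append(';');
            } else {
                // Any other codepoint is copied unchanged.
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        Map<Integer, String> entities = Map.of(0xA9, "copy", 0x3C, "lt");
        System.out.println(replaceCodePoints("a < b \u00A9", entities));
    }
}
```

Iterating over codepoints rather than char values keeps supplementary characters (those outside the Basic Multilingual Plane) intact.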
Class                                   Description
AttributeNamedNodeMap                   Attribute NamedNodeMap.
CSSDOMImplementation                    CSS-enabled DOM implementation.
DOMDocument                             Implementation of a DOM Document.
DOMElement                              A bare DOM element node.
DOMNode                                 DOM Node.
DOMNodeList                             DOMNode-specific version of NodeList.
DOMWriter                               Serializes a node and its subtree.
ElementList                             DOMElement-specific ExtendedNodeList.
ExtendedNamedNodeMap<T extends Node>    ExtendedNamedNodeMap.
ExtendedNodeList<T extends Node>        This library's iterable version of the old NodeList.
HTMLDocument                            HTMLDocument.
HTMLElement                             HTML-specific element nodes.
NodeFilter                              Filter the nodes returned by an iterator, see ParentNode.iterator(NodeFilter), ParentNode.iterator(int, NodeFilter) and DOMDocument.createNodeIterator(Node, int, NodeFilter).
NodeIterator                            Iterates over the document nodes according to a set of parameters.
NodeListIterator                        A ListIterator that has Node arguments but returns DOMNode references.
NonDocumentTypeChildNode                Based on W3C's NonDocumentTypeChildNode interface.
ParentNode                              Based on W3C's ParentNode interface.
TreeWalker                              Traverse the document's nodes according to a set of parameters.
XMLDocumentBuilder                      Generic DocumentBuilder for XML documents.