Package io.sf.carte.doc.dom
The following behavior is believed to be more user-friendly from the point of view of a developer who is handling an HTML document, but is non-conformant:
- On elements and attributes, Node.getLocalName() returns the tag name instead of null when the node was created with a DOM Level 1 method such as Document.createElement(). In HTML documents, all elements implicitly have the HTML namespace unless they have a different one.
- As all HTML elements have an implicit namespace and the idea is to handle HTML and XHTML in the same way, DOMElement.getTagName() does not return an upper-cased name.
- The methods Element.setIdAttribute, Element.setIdAttributeNS and Element.setIdAttributeNode are now deprecated by W3C, but they do work in this implementation. In HTML documents, only case changes to the 'id' attribute (like 'ID' or 'Id') are allowed, and any change has Document-wide effects (according to the HTML specification, there is only one ID attribute in HTML).
- Entity references are allowed as a last-resort solution in case an entity is unknown. No known current parser uses that, though. This limited support for entity references may be dropped in future versions.
- The class list obtained by getClassList() is not read-only: changes to it are reflected in the attribute, and vice-versa.
- Calling normalize() on a STYLE element sets its text content to the contents of the associated style sheet.
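For comparison, the following sketch shows the conformant behavior that the first item departs from. It uses the DOM implementation bundled with the JDK (through javax.xml.parsers), not this library, and the class name LocalNameDemo is hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;

final class LocalNameDemo {

    /** Creates an empty document with the JDK's default, namespace-aware builder. */
    static Document newDocument() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Local name of an element created with the DOM Level 1 method. */
    static String level1LocalName() {
        return newDocument().createElement("p").getLocalName();
    }

    /** Local name of an element created with the namespace-aware DOM Level 2 method. */
    static String level2LocalName() {
        return newDocument()
                .createElementNS("http://www.w3.org/1999/xhtml", "p")
                .getLocalName();
    }

    public static void main(String[] args) {
        System.out.println(level1LocalName()); // null
        System.out.println(level2LocalName()); // p
    }
}
```

With this library, according to the list above, the first call would be expected to return the tag name rather than null.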
Traversing the DOM
There are several alternative procedures to retrieve the child nodes of a parent node. The most straightforward is also the fastest: get the first (or last) child, and then iterate through the next (or previous) siblings:
    DOMNode node = getFirstChild();
    while (node != null) {
        someNodeProcessing(node); // do something with that node
        node = node.getNextSibling();
    }

or, if you prefer a for loop:

    for (DOMNode node = getFirstChild(); node != null; node = node.getNextSibling()) {
        someNodeProcessing(node); // do something with that node
    }
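The same first-child/next-sibling pattern works with any W3C DOM. Below is a self-contained sketch that runs against the JDK's bundled DOM instead of this library; SiblingTraversalDemo, childNames and sampleList are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class SiblingTraversalDemo {

    /** Collects the node names of the direct children, in document order. */
    static List<String> childNames(Node parent) {
        List<String> names = new ArrayList<>();
        for (Node node = parent.getFirstChild(); node != null; node = node.getNextSibling()) {
            names.add(node.getNodeName());
        }
        return names;
    }

    /** Builds a small ul element with two li children and a text node in between. */
    static Element sampleList() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element ul = doc.createElement("ul");
            doc.appendChild(ul);
            ul.appendChild(doc.createElement("li"));
            ul.appendChild(doc.createTextNode("between"));
            ul.appendChild(doc.createElement("li"));
            return ul;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(childNames(sampleList())); // [li, #text, li]
    }
}
```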
The iterators are also fast:
    Iterator<DOMNode> it = parentNode.iterator();
    while (it.hasNext()) {
        DOMNode node = it.next();
        someNodeProcessing(node); // do something with that node
    }
There are several different iterators, such as the elementIterator:

    Iterator<DOMElement> it = parentNode.elementIterator();
    while (it.hasNext()) {
        DOMElement element = it.next();
        someElementProcessing(element); // do something with that element
    }
or the typeIterator:

    Iterator<Node> it = parentNode.typeIterator(Node.PROCESSING_INSTRUCTION_NODE);
    while (it.hasNext()) {
        ProcessingInstruction pi = (ProcessingInstruction) it.next();
        someProcessing(pi); // do something with that processing instruction
    }
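To make the idea of a type-filtered iterator concrete, here is a minimal sketch of how such an iterator could be written over a plain W3C DOM. It is not this library's implementation; TypeFilteredChildren and its demo method are hypothetical, and the example runs on the JDK's bundled DOM.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;

final class TypeFilteredChildren implements Iterator<Node> {
    private final short type;
    private Node next;

    TypeFilteredChildren(Node parent, short type) {
        this.type = type;
        this.next = firstMatch(parent.getFirstChild());
    }

    /** Skips forward through the siblings until a node of the wanted type is found. */
    private Node firstMatch(Node candidate) {
        while (candidate != null && candidate.getNodeType() != type) {
            candidate = candidate.getNextSibling();
        }
        return candidate;
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public Node next() {
        if (next == null) {
            throw new NoSuchElementException();
        }
        Node current = next;
        next = firstMatch(current.getNextSibling());
        return current;
    }

    /** Returns the targets of the processing instructions that are children of a sample root. */
    static List<String> demo() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement("root");
        doc.appendChild(root);
        root.appendChild(doc.createProcessingInstruction("first", "a=1"));
        root.appendChild(doc.createElement("item"));
        root.appendChild(doc.createProcessingInstruction("second", "b=2"));

        List<String> targets = new ArrayList<>();
        Iterator<Node> it = new TypeFilteredChildren(root, Node.PROCESSING_INSTRUCTION_NODE);
        while (it.hasNext()) {
            targets.add(((ProcessingInstruction) it.next()).getTarget());
        }
        return targets;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [first, second]
    }
}
```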
Finally, there is the old NodeList interface, which in this library is implemented in the more modern flavours of DOMNodeList and ElementList:

    NodeList list = parentNode.getChildNodes();
    for (int i = 0; i < list.getLength(); i++) {
        Node node = list.item(i);
        someNodeProcessing(node); // do something with that node
    }
This is a less efficient way to examine the child nodes, but it is still useful. Using the list as an Iterable is more efficient (be sure to use DOMNodeList or ElementList):

    DOMNodeList list = parentNode.getChildNodes();
    for (DOMNode node : list) {
        someNodeProcessing(node);
    }

or:

    ElementList list = parentNode.getChildren();
    for (DOMElement element : list) {
        someElementProcessing(element);
        // Let's do something with the attributes
        AttributeNamedNodeMap attributes = element.getAttributes();
        for (Attr attr : attributes) {
            attr.setValue("foo");
        }
    }
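When only a plain org.w3c.dom.NodeList is available (for example, from another DOM implementation), a small adapter can provide the same for-each convenience. The sketch below is an assumption-laden illustration: IterableNodeList is a hypothetical class, not part of this library, and it still uses index-based access internally, so it offers the syntax but not necessarily the efficiency of DOMNodeList or ElementList.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class IterableNodeList implements Iterable<Node> {
    private final NodeList list;

    IterableNodeList(NodeList list) {
        this.list = list;
    }

    @Override
    public Iterator<Node> iterator() {
        return new Iterator<Node>() {
            private int index = 0;

            @Override
            public boolean hasNext() {
                return index < list.getLength();
            }

            @Override
            public Node next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return list.item(index++);
            }
        };
    }

    /** Iterates the children of a sample element with a for-each loop. */
    static List<String> demo() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element ul = doc.createElement("ul");
        doc.appendChild(ul);
        ul.appendChild(doc.createElement("li"));
        ul.appendChild(doc.createElement("li"));

        List<String> names = new ArrayList<>();
        for (Node node : new IterableNodeList(ul.getChildNodes())) {
            names.add(node.getNodeName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [li, li]
    }
}
```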
To iterate across the document (as opposed to just the child nodes), there is the createNodeIterator(Node, int, NodeFilter) method. For example:

    NodeIterator it = document.createNodeIterator(document, NodeFilter.SHOW_ELEMENT, null);
    while (it.hasNext()) {
        DOMElement element = (DOMElement) it.next();
        someElementProcessing(element); // do something with that element
    }
This library's version of NodeIterator implements ListIterator.
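By contrast, the standard W3C NodeIterator (org.w3c.dom.traversal) has no hasNext(): it is advanced with nextNode(), which returns null at the end, and the standard createNodeIterator takes a fourth boolean argument for entity reference expansion. The following sketch shows that standard style using the JDK's bundled DOM; it assumes that the JDK's Document implementation supports DocumentTraversal, and StandardNodeIteratorDemo is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;

final class StandardNodeIteratorDemo {

    /** Builds html > body > (p, div > span) in memory. */
    static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element html = doc.createElement("html");
            Element body = doc.createElement("body");
            Element div = doc.createElement("div");
            doc.appendChild(html);
            html.appendChild(body);
            body.appendChild(doc.createElement("p"));
            body.appendChild(div);
            div.appendChild(doc.createElement("span"));
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Lists all element names in document order with a standard NodeIterator. */
    static List<String> elementNames() {
        Document doc = sampleDocument();
        if (!(doc instanceof DocumentTraversal)) {
            throw new IllegalStateException("This DOM does not support traversal");
        }
        NodeIterator it = ((DocumentTraversal) doc)
                .createNodeIterator(doc, NodeFilter.SHOW_ELEMENT, null, true);
        List<String> names = new ArrayList<>();
        for (Node node = it.nextNode(); node != null; node = it.nextNode()) {
            names.add(node.getNodeName());
        }
        it.detach(); // release the iterator once it is no longer needed
        return names;
    }

    public static void main(String[] args) {
        System.out.println(elementNames()); // [html, body, p, div, span]
    }
}
```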
And finally there is the TreeWalker, which can be created with the createTreeWalker(Node, int, NodeFilter) method:

    TreeWalker tw = document.createTreeWalker(document, NodeFilter.SHOW_ELEMENT, null);
    DOMNode node;
    while ((node = tw.nextNode()) != null) {
        someNodeProcessing(node);
    }
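Unlike an iterator, a tree walker can also move to children, siblings and parents of its current node. The sketch below illustrates this with the standard W3C TreeWalker on the JDK's bundled DOM (whose createTreeWalker takes a fourth boolean argument); whether this library's TreeWalker exposes exactly the same navigation methods is an assumption, and TreeWalkerDemo is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;

final class TreeWalkerDemo {

    /** Walks html > body > (p, div) in several directions and records each stop. */
    static List<String> walk() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element html = doc.createElement("html");
        Element body = doc.createElement("body");
        doc.appendChild(html);
        html.appendChild(body);
        body.appendChild(doc.createElement("p"));
        body.appendChild(doc.createElement("div"));

        TreeWalker tw = ((DocumentTraversal) doc)
                .createTreeWalker(doc, NodeFilter.SHOW_ELEMENT, null, true);
        List<String> visited = new ArrayList<>();
        visited.add(tw.nextNode().getNodeName());    // document order: html
        visited.add(tw.firstChild().getNodeName());  // down: body
        visited.add(tw.firstChild().getNodeName());  // down: p
        visited.add(tw.nextSibling().getNodeName()); // sideways: div
        visited.add(tw.parentNode().getNodeName());  // up: body
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(walk()); // [html, body, p, div, body]
    }
}
```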
Serializing the DOM
The class DOMWriter can be used to pretty-print a document or a subtree. To do that, it takes into account the default values of the display CSS property for the elements, according to the user agent's default style sheet. It can also replace a specified subset of code points with the proper entity references when serializing a Text node.
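The code point replacement can be pictured with the following minimal sketch. It does not use DOMWriter or reproduce its API: EntityEscaper, escape and the code-point-to-name map are hypothetical, and serve only to show what replacing a chosen subset of code points with entity references means for a piece of text.

```java
import java.util.HashMap;
import java.util.Map;

final class EntityEscaper {

    /**
     * Replaces each code point found in the map with the entity reference
     * built from the mapped name, and copies every other code point unchanged.
     */
    static String escape(String text, Map<Integer, String> entities) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        text.codePoints().forEach(cp -> {
            String name = entities.get(cp);
            if (name != null) {
                out.append('&').append(name).append(';');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    /** Escapes a sample string containing no-break spaces, '<' and U+2264. */
    static String demo() {
        Map<Integer, String> entities = new HashMap<>();
        entities.put((int) '<', "lt");
        entities.put(0xA0, "nbsp");
        entities.put(0x2264, "le");
        return escape("a\u00a0<\u00a0b \u2264 c", entities);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // a&nbsp;&lt;&nbsp;b &le; c
    }
}
```

Working on code points rather than char values keeps characters outside the Basic Multilingual Plane intact.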
Class descriptions
- Attribute NamedNodeMap.
- CSS-enabled DOM implementation.
- Implementation of a DOM Document.
- A bare DOM element node.
- DOM Node.
- DOMNode-specific version of NodeList.
- Serializes a node and its subtree.
- DOMElement-specific ExtendedNodeList.
- ExtendedNamedNodeMap<T extends Node>: ExtendedNamedNodeMap.
- ExtendedNodeList<T extends Node>: This library's iterable version of the old NodeList.
- HTMLDocument.
- HTML-specific element nodes.
- Filter the nodes returned by an iterator, see ParentNode.iterator(NodeFilter), ParentNode.iterator(int, NodeFilter) and DOMDocument.createNodeIterator(Node, int, NodeFilter).
- Iterates over the document nodes according to a set of parameters.
- A ListIterator that has Node arguments but returns DOMNode references.
- Based on W3C's NonDocumentTypeChildNode interface.
- Based on W3C's ParentNode interface.
- Traverse the document's nodes according to a set of parameters.
- Generic DocumentBuilder for XML documents.