- All Implemented Interfaces:
EntityResolver, EntityResolver2
Has common W3C DTDs/entities built-in and loads others via the supplied SYSTEM URL, provided that all of the following conditions are met:
- The URL protocol is http or https.
- Either the MIME type is valid for a DTD or entity, or the filename ends with .dtd, .ent or .mod.
- The whitelist is either disabled (the resolver was constructed with enableWhitelist set to false and no host has been added since) or contains the host from the URL.
If the whitelist is enabled (as it is with the default constructor), any attempt to download data from a remote URL whose host is not in the whitelist throws an exception. You can use this to detect whether your documents reference a DTD resource that is not bundled with this resolver.
If the constructor with a false argument was used, the whitelist can still be enabled by adding a hostname via addHostToWhiteList(String).
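The whitelist rules above can be modelled as a small piece of state. The following is a minimal sketch, not this class's actual code: HostWhitelist and its methods are hypothetical names, and case-insensitive host matching is an assumption.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical model of the whitelist semantics described above; not the library's implementation.
final class HostWhitelist {
    private final Set<String> hosts = new HashSet<>();
    private boolean enabled;

    // enabled == true mirrors the default constructor: an empty, enabled whitelist.
    HostWhitelist(boolean enabled) {
        this.enabled = enabled;
    }

    // Adding a host always enables the whitelist, even if it started out disabled.
    void addHost(String fqdn) {
        hosts.add(fqdn.toLowerCase(Locale.ROOT)); // assumption: hosts compared case-insensitively
        enabled = true;
    }

    boolean isEnabled() {
        return enabled;
    }

    // A remote fetch is allowed when the whitelist is disabled or lists the host.
    boolean allowsFetchFrom(String host) {
        return !enabled || hosts.contains(host.toLowerCase(Locale.ROOT));
    }
}
```

With new HostWhitelist(true), every remote host is rejected until addHost(...) is called, which matches the behaviour described for the default constructor.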
Although this resolver should protect you from most information leaks (see SSRF attacks) and also from jar: decompression bombs, DoS attacks based on entity expansion or recursion, such as the 'billion laughs' attack, may still be possible and must be prevented at the XML parser. Be sure to use a properly configured, recent version of your parser.
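Because expansion limits have to come from the parser, here is a sketch using the JDK's built-in SAX parser. Setting XMLConstants.FEATURE_SECURE_PROCESSING asks the JDK implementation to enforce its processing limits (including the entity expansion limit), so a nine-level billion-laughs document should be rejected with a SAXException. ExpansionGuard is a hypothetical helper; other parsers expose different settings.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper showing parser-side protection against entity expansion attacks.
final class ExpansionGuard {
    // Build a SAX parser with the JDK's secure-processing limits switched on.
    static SAXParser newHardenedParser() throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        return factory.newSAXParser();
    }

    // Returns true if parsing the document fails with a SAXException.
    static boolean isRejected(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        SAXParser parser = newHardenedParser();
        try {
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return false;
        } catch (SAXException e) {
            return true;
        }
    }

    // Nine levels of tenfold expansion: about 10^9 "lol" strings if left unchecked.
    static String billionLaughs() {
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\"?>\n<!DOCTYPE lolz [\n");
        sb.append("<!ENTITY lol0 \"lol\">\n");
        for (int i = 1; i <= 9; i++) {
            sb.append("<!ENTITY lol").append(i).append(" \"");
            for (int j = 0; j < 10; j++) {
                sb.append("&lol").append(i - 1).append(';');
            }
            sb.append("\">\n");
        }
        sb.append("]>\n<lolz>&lol9;</lolz>");
        return sb.toString();
    }
}
```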
- Constructor Summary
DefaultEntityResolver() - Construct a resolver with the whitelist enabled.
DefaultEntityResolver(boolean enableWhitelist) - Construct a resolver with the whitelist enabled or disabled according to enableWhitelist.
- Method Summary
void addHostToWhiteList(String fqdn) - Add the given host to a whitelist for remote DTD fetching.
protected void connect(URLConnection con) - Connect the given URLConnection.
InputSource getExternalSubset(String name, String baseURI) - Allows applications to provide an external subset for documents that don't explicitly define one.
protected boolean isInvalidPath(String path) - Determine whether the given path is considered invalid for a DTD.
protected boolean isInvalidProtocol(String protocol) - Is the given protocol not supported by this resolver?
protected boolean isValidContentType(String conType) - Is the given string a valid DTD/entity content-type?
protected boolean isWhitelistedHost(String host) - Is the given host whitelisted?
protected boolean isWhitelistEnabled() - Is the whitelist enabled?
protected URLConnection openConnection(URL url) - Open a connection to the given URL.
protected boolean registerSystemIdFilename(String systemId, String filename) - Register an internal classpath filename from which the DTD for a SystemId is retrieved.
InputSource resolveEntity(String publicId, String systemId) - Allow the application to resolve external entities.
InputSource resolveEntity(String name, String publicId, String baseURI, String systemId) - Allows applications to map references to external entities into input sources.
InputSource resolveEntity(DocumentType dtDecl) - Resolve external entities according to the given DocumentType.
void setClassLoader(ClassLoader loader) - Set the class loader to be used to read the built-in DTDs.
- Constructor Details
- DefaultEntityResolver
public DefaultEntityResolver()
Construct a resolver with the whitelist enabled.
- DefaultEntityResolver
public DefaultEntityResolver(boolean enableWhitelist)
Construct a resolver with the whitelist enabled or disabled according to enableWhitelist.
- Parameters:
enableWhitelist - can be false to allow connecting to any host to retrieve DTDs or entities, or true to enable the (empty) whitelist so that no network connections are allowed until a host is added to it.
- Method Details
- addHostToWhiteList
Add the given host to a whitelist for remote DTD fetching. If the whitelist is enabled, only http or https URLs will be allowed.
- Parameters:
fqdn - the fully qualified domain name to add to the whitelist.
- getExternalSubset
Allows applications to provide an external subset for documents that don't explicitly define one. Documents with DOCTYPE declarations that omit an external subset can thus augment the declarations available for validation, entity processing, and attribute processing (normalization, defaulting, and reporting types including ID). This augmentation is reported through the startDTD() method as if the document text had originally included the external subset; this callback is made before any internal subset data or errors are reported.
This method can also be used with documents that have no DOCTYPE declaration. When the root element is encountered but no DOCTYPE declaration has been seen, this method is invoked. If it returns a value for the external subset, that root element is declared to be the root element, giving the effect of splicing a DOCTYPE declaration at the end of the prolog of a document that could not otherwise be valid. The sequence of parser callbacks in that case logically resembles this:
    ... comments and PIs from the prolog (as usual)
    startDTD ("rootName", source.getPublicId (), source.getSystemId ());
    startEntity ("[dtd]");
    ... declarations, comments, and PIs from the external subset
    endEntity ("[dtd]");
    endDTD ();
    ... then the rest of the document (as usual)
    startElement (..., "rootName", ...);
Note that the InputSource gets no further resolution. Also, this method will never be used by a (non-validating) processor that is not including external parameter entities.
Uses for this method include facilitating data validation when interoperating with XML processors that would always require undesirable network accesses for external entities, or which for other reasons adopt a "no DTDs" policy.
Warning: returning an external subset modifies the input document. By providing definitions for general entities, it can make a malformed document appear to be well formed.
- Specified by:
getExternalSubset in interface EntityResolver2
- Parameters:
name - Identifies the document root element. This name comes from a DOCTYPE declaration (where available) or from the actual root element.
baseURI - The document's base URI, serving as an additional hint for selecting the external subset. This is always an absolute URI, unless it is null because the XMLReader was given an InputSource without one.
- Returns:
an InputSource object describing the new external subset to be used by the parser. If no specific subset could be determined, an input source describing the HTML5 entities is returned.
- Throws:
SAXException - if either the provided arguments or the input source were invalid or not allowed.
IOException - if an I/O problem was found while loading the input source.
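A minimal sketch of the fallback behaviour described above, written against the JDK's org.xml.sax.ext.DefaultHandler2 (which implements EntityResolver2). The class FallbackSubsetResolver, its public identifier and the two entity declarations are illustrative assumptions; they stand in for the much larger set of HTML5 entities the real resolver supplies.

```java
import java.io.StringReader;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical resolver that always supplies a small fallback external subset.
class FallbackSubsetResolver extends DefaultHandler2 {
    // Tiny stand-in for an HTML entity set; a real subset would be far larger.
    static final String FALLBACK_SUBSET =
            "<!ENTITY nbsp \"&#160;\">\n<!ENTITY copy \"&#169;\">\n";

    @Override
    public InputSource getExternalSubset(String name, String baseURI) {
        // 'name' is the root element name; this sketch ignores it and always
        // returns the same subset, like the "no specific subset" case above.
        InputSource source = new InputSource(new StringReader(FALLBACK_SUBSET));
        source.setPublicId("-//Example//ENTITIES Fallback//EN"); // hypothetical identifier
        return source;
    }
}
```

Because a subset is always returned, references such as &nbsp; become resolvable even in documents without a DOCTYPE, which is exactly the effect the warning above describes.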
- registerSystemIdFilename
Register an internal classpath filename from which the DTD identified by a SystemId is retrieved.
- Parameters:
systemId - the SystemId.
filename - the internal filename. Must point to a resource with UTF-8 encoding.
- Returns:
true if the new SystemId was successfully registered, false if it was already registered.
- Throws:
IllegalArgumentException - if the filename is considered invalid by isInvalidPath(String).
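One way to model the registration contract (true on first registration, false for a duplicate, IllegalArgumentException for a rejected path) is a map with putIfAbsent. This is a sketch under assumptions: the class SystemIdRegistry is hypothetical, and it restates the extension rule documented for isInvalidPath(String) rather than calling the real method.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry mapping SystemIds to classpath resource names.
final class SystemIdRegistry {
    private final Map<String, String> filenames = new HashMap<>();

    boolean register(String systemId, String filename) {
        if (isInvalidPath(filename)) {
            throw new IllegalArgumentException("Invalid DTD path: " + filename);
        }
        // putIfAbsent returns null only when no mapping existed before.
        return filenames.putIfAbsent(systemId, filename) == null;
    }

    String lookup(String systemId) {
        return filenames.get(systemId);
    }

    // Same rule as documented for isInvalidPath(String): .dtd, .ent or .mod.
    static boolean isInvalidPath(String path) {
        return path == null
                || !(path.endsWith(".dtd") || path.endsWith(".ent") || path.endsWith(".mod"));
    }
}
```

A second registration for the same SystemId keeps the first filename, so built-in mappings cannot be silently replaced in this sketch.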
- resolveEntity
public InputSource resolveEntity(String name, String publicId, String baseURI, String systemId) throws SAXException, IOException
Allows applications to map references to external entities into input sources. This method is only called for external entities which have been properly declared. It provides more flexibility than the EntityResolver interface, supporting implementations of more complex catalogue schemes such as the one defined by the OASIS XML Catalogs specification.
Parsers configured to use this resolver method will call it to determine the input source to use for any external entity being included because of a reference in the XML text. That excludes the document entity, and any external entity returned by getExternalSubset(). When a (non-validating) processor is configured not to include a class of entities (parameter or general) through use of feature flags, this method is not invoked for such entities.
If no valid input source could be determined, this method throws a SAXException instead of returning null, as other implementations would do. If you need to retrieve a DTD that is not directly provided by this resolver, you have to whitelist its host first using addHostToWhiteList(String). Also make sure that either the systemId URL ends with a valid extension, or that the URL is served with a valid DTD media type.
Note that the entity naming scheme used here is the same one used in the LexicalHandler, or in the ContentHandler.skippedEntity() method.
- Specified by:
resolveEntity in interface EntityResolver2
- Parameters:
name - Identifies the external entity being resolved. Either "[dtd]" for the external subset, or a name starting with "%" to indicate a parameter entity, or else the name of a general entity. This is never null when invoked by a SAX2 parser.
publicId - The public identifier of the external entity being referenced (normalized as required by the XML specification), or null if none was supplied.
baseURI - The URI with respect to which relative system IDs are interpreted. This is always an absolute URI, unless it is null (likely because the XMLReader was given an InputSource without one). This URI is defined by the XML specification to be the one associated with the "<" starting the relevant declaration.
systemId - The system identifier of the external entity being referenced; either a relative or absolute URI.
- Returns:
an InputSource object describing the new input source to be used by the parser. This implementation never returns null if systemId is non-null.
- Throws:
SAXException - if either the provided arguments or the input source were invalid or not allowed.
IOException - if an I/O problem was found while forming the URL to the input source, or when connecting to it.
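A minimal sketch of two points in this contract: a relative systemId is interpreted against baseURI, and a miss raises a SAXException rather than returning null. The class SystemIdLookup and the map of bundled resources are hypothetical; the real resolver also consults the whitelist and may fetch remote DTDs.

```java
import java.net.URI;
import java.util.Map;
import org.xml.sax.SAXException;

// Hypothetical helper illustrating the lookup contract; not the library's code.
final class SystemIdLookup {
    // Resolve a possibly relative systemId against the declaration's base URI.
    static String absolutize(String baseURI, String systemId) {
        if (baseURI == null) {
            return systemId; // nothing to resolve against
        }
        return URI.create(baseURI).resolve(systemId).toString();
    }

    // Find the bundled resource for a systemId, throwing instead of returning null.
    static String bundledResourceFor(Map<String, String> bundled, String baseURI, String systemId)
            throws SAXException {
        if (systemId == null) {
            throw new SAXException("No systemId supplied");
        }
        String absolute = absolutize(baseURI, systemId);
        String resource = bundled.get(absolute);
        if (resource == null) {
            throw new SAXException("Not bundled: " + absolute);
        }
        return resource;
    }
}
```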
- isInvalidPath
Determine whether the given path is considered invalid for a DTD. To be valid, the path must end with .dtd, .ent or .mod.
- Parameters:
path - the path to check.
- Returns:
true if the path is invalid for a DTD, false otherwise.
- isWhitelistEnabled
protected boolean isWhitelistEnabled()
Is the whitelist enabled?
- Returns:
true if the whitelist is enabled.
- isInvalidProtocol
Is the given protocol not supported by this resolver? Only http and https are valid.
- Parameters:
protocol - the protocol.
- Returns:
true if this resolver considers the given protocol invalid.
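The protocol rule and the path rule can be restated in a few lines. This is a hypothetical sketch (UrlRules is not part of the library) that assumes the extension is checked on the URI's path component, so a query string does not hide it. A failing path check is not fatal on its own: per the class description, a valid content type can substitute for the extension.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical restatement of the protocol and path rules; not the library's code.
final class UrlRules {
    // Only http and https are accepted (cf. isInvalidProtocol).
    static boolean isInvalidProtocol(String protocol) {
        return !"http".equals(protocol) && !"https".equals(protocol);
    }

    // The path must end with .dtd, .ent or .mod (cf. isInvalidPath).
    static boolean isInvalidPath(String path) {
        return path == null
                || !(path.endsWith(".dtd") || path.endsWith(".ent") || path.endsWith(".mod"));
    }

    // Apply both rules to a systemId; the query string is not part of the path.
    static boolean passesUrlRules(String systemId) {
        try {
            URI uri = new URI(systemId);
            String scheme = uri.getScheme();
            return scheme != null
                    && !isInvalidProtocol(scheme.toLowerCase(Locale.ROOT))
                    && !isInvalidPath(uri.getPath());
        } catch (URISyntaxException e) {
            return false;
        }
    }
}
```

Rejecting every scheme except http and https is also what keeps jar: URLs, and thus jar: decompression bombs, out of reach.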
- isWhitelistedHost
Is the given host whitelisted?
- Parameters:
host - the host to test.
- Returns:
true if the given host is whitelisted.
- openConnection
Open a connection to the given URL.
- Parameters:
url - the URL to connect to.
- Returns:
the connection.
- Throws:
IOException - if an I/O error occurred while opening the connection.
- connect
Connect the given URLConnection.
- Parameters:
con - the URLConnection.
- Throws:
IOException - if a problem occurred while connecting.
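Since openConnection and connect are protected hooks, one plausible use is tightening network settings before any data is fetched. The sketch below is a stand-in class, not a subclass of the real resolver; TimeoutConnector and its timeout value are hypothetical, and overriding the real hooks this way assumes they are not final.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical stand-in for the openConnection/connect hooks.
class TimeoutConnector {
    static final int TIMEOUT_MILLIS = 10_000; // illustrative value

    // Create the connection object and configure it; no network I/O happens yet.
    protected URLConnection openConnection(URL url) throws IOException {
        URLConnection con = url.openConnection();
        con.setConnectTimeout(TIMEOUT_MILLIS);
        con.setReadTimeout(TIMEOUT_MILLIS);
        con.setUseCaches(false);
        return con;
    }

    // The actual network connection is established here.
    protected void connect(URLConnection con) throws IOException {
        con.connect();
    }
}
```

Keeping configuration in openConnection and the network call in connect means the timeouts are already in place when the connection is made.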
- isValidContentType
Is the given string a valid DTD/entity content-type?
- Parameters:
conType - the content-type.
- Returns:
true if it is a valid DTD/entity content-type.
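A hedged sketch of a content-type check. The accepted set is an assumption, not taken from this class: it uses the DTD and external parsed entity media types registered for XML (application/xml-dtd, application/xml-external-parsed-entity, text/xml-external-parsed-entity), ignores parameters such as charset and compares case-insensitively. DtdContentTypes is a hypothetical name.

```java
import java.util.Locale;

// Hypothetical content-type check; the accepted media types are assumptions.
final class DtdContentTypes {
    static boolean isValidContentType(String conType) {
        if (conType == null) {
            return false;
        }
        // Drop parameters such as "; charset=utf-8" and compare case-insensitively.
        int semi = conType.indexOf(';');
        String type = (semi >= 0 ? conType.substring(0, semi) : conType)
                .trim().toLowerCase(Locale.ROOT);
        return type.equals("application/xml-dtd")
                || type.equals("application/xml-external-parsed-entity")
                || type.equals("text/xml-external-parsed-entity");
    }
}
```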
- resolveEntity
Allow the application to resolve external entities. The parser will call this method before opening any external entity except the top-level document entity. Such entities include the external DTD subset and external parameter entities referenced within the DTD (in either case, only if the parser reads external parameter entities), and external general entities referenced within the document element (if the parser reads external general entities). The application may request that the parser locate the entity itself, that it use an alternative URI, or that it use data provided by the application (as a character or byte input stream).
If no valid input source could be determined, this method throws a SAXException instead of returning null, as other implementations would do. If you need to retrieve a DTD that is not directly provided by this resolver, you have to whitelist its host first using addHostToWhiteList(String). Also make sure that either the systemId URL ends with a valid extension, or that the URL is served with a valid DTD media type.
- Specified by:
resolveEntity in interface EntityResolver
- Parameters:
publicId - The public identifier of the external entity being referenced, or null if none was supplied.
systemId - The system identifier of the external entity being referenced.
- Returns:
an InputSource object describing the new input source. This implementation never returns null if systemId is non-null.
- Throws:
SAXException - if either the provided arguments or the input source were invalid or not allowed.
IOException - if an I/O problem was found while forming the URL to the input source, or when connecting to it.
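The sketch below shows how any EntityResolver is installed on a JDK SAX XMLReader, and how a resolver that throws, instead of returning null, stops the parse. InMemoryResolver and its map of DTD texts are hypothetical stand-ins for this class's bundled DTDs.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical in-memory resolver following the "throw, never return null" contract.
final class InMemoryResolver implements EntityResolver {
    private final Map<String, String> dtds;

    InMemoryResolver(Map<String, String> dtds) {
        this.dtds = dtds;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) throws SAXException {
        String text = systemId == null ? null : dtds.get(systemId);
        if (text == null) {
            throw new SAXException("Refusing to resolve " + systemId);
        }
        InputSource source = new InputSource(new StringReader(text));
        source.setPublicId(publicId);
        source.setSystemId(systemId);
        return source;
    }

    // Parse a document with the given resolver installed and return its character data.
    static String parseText(String xml, EntityResolver resolver)
            throws ParserConfigurationException, SAXException, IOException {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        StringBuilder text = new StringBuilder();
        reader.setEntityResolver(resolver);
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return text.toString();
    }
}
```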
- resolveEntity
Resolve external entities according to the given DocumentType.
If no valid input source could be determined, this method throws a SAXException instead of returning null, as other implementations would do. If you need to retrieve a DTD that is not directly provided by this resolver, you have to whitelist its host first using addHostToWhiteList(String). Also make sure that either the systemId URL ends with a valid extension, or that the URL is served with a valid DTD media type.
- Parameters:
dtDecl - the DocumentType.
- Returns:
an InputSource object describing the new input source.
- Throws:
SAXException - if either the provided arguments or the input source were invalid or not allowed.
IOException - if an I/O problem was found while forming the URL to the input source, or when connecting to it.
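A DocumentType already carries the public and system identifiers, so one plausible reading of this overload is a thin delegation to the two-argument method. The helper DoctypeResolution below is hypothetical and takes the resolver as a parameter; it keeps the "throw instead of null" rule.

```java
import java.io.IOException;
import org.w3c.dom.DocumentType;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: resolve a DOM DocumentType through a SAX EntityResolver.
final class DoctypeResolution {
    static InputSource resolve(DocumentType dtDecl, EntityResolver resolver)
            throws SAXException, IOException {
        if (dtDecl == null) {
            throw new SAXException("No DocumentType supplied");
        }
        // Delegate using the identifiers from the DOCTYPE declaration.
        InputSource source = resolver.resolveEntity(dtDecl.getPublicId(), dtDecl.getSystemId());
        if (source == null) {
            throw new SAXException("Could not resolve " + dtDecl.getSystemId());
        }
        return source;
    }
}
```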
- setClassLoader
Set the class loader to be used to read the built-in DTDs.
- Parameters:
loader - the class loader.
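This sketch shows what a configurable class loader is used for: built-in DTDs are read as classpath resources through whichever loader was set, and decoded as UTF-8 (the encoding registerSystemIdFilename requires). BuiltinDtdLoader, its open method and the resource names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import org.xml.sax.InputSource;

// Hypothetical loader for built-in DTDs; resource names are supplied by the caller.
final class BuiltinDtdLoader {
    private ClassLoader loader = BuiltinDtdLoader.class.getClassLoader();

    void setClassLoader(ClassLoader loader) {
        this.loader = loader;
    }

    // Open a bundled resource as a UTF-8 character stream.
    InputSource open(String resourceName, String systemId) throws IOException {
        InputStream in = loader.getResourceAsStream(resourceName);
        if (in == null) {
            throw new IOException("Built-in DTD not found: " + resourceName);
        }
        Reader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        InputSource source = new InputSource(reader);
        source.setSystemId(systemId);
        return source;
    }
}
```

Setting a loader is typically useful in containers where the resolver's own class loader cannot see the resources that hold the DTDs.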