Module io.sf.carte.tokenproducer
Package io.sf.carte.uparser
Class TokenProducer3<E extends Exception>
java.lang.Object
io.sf.carte.uparser.TokenProducer3<E>
- Direct Known Subclasses:
TokenProducer
A simple parser that produces tokens from a String or Reader, and processes
them through a user-provided handler that may throw checked exceptions of
type E.
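The role of the E type parameter can be illustrated with a minimal, self-contained sketch. The WordHandler interface and the produceWords method below are hypothetical stand-ins written for this illustration (they are not the library's TokenHandler3 or parse methods); they only show how a checked exception type declared by the handler propagates through the producer's signature.

```java
final class CheckedHandlerSketch {

    // Hypothetical one-method handler; the real handler interface has more callbacks.
    interface WordHandler<E extends Exception> {
        void word(int index, CharSequence word) throws E;
    }

    // Reports each run of letters or digits to the handler and returns the run count.
    // Whatever checked exception type the handler declares is re-declared here as E.
    static <E extends Exception> int produceWords(String text, WordHandler<E> handler) throws E {
        int count = 0;
        int start = -1;
        for (int i = 0; i <= text.length(); i++) {
            boolean inWord = i < text.length() && Character.isLetterOrDigit(text.charAt(i));
            if (inWord && start < 0) {
                start = i; // a word begins here
            } else if (!inWord && start >= 0) {
                handler.word(start, text.subSequence(start, i));
                count++;
                start = -1;
            }
        }
        return count;
    }
}
```

A caller whose handler throws no checked exception can bind E to RuntimeException and needs no try/catch; a handler that throws, say, IOException forces the caller to handle or declare it.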
The tokens produced are:
- Words. These contain letters, digits, format characters (Unicode Cf),
connector punctuation, certain symbols and any codepoint in allowInWords
or accepted by the supplied CharacterCheck object.
- Quoted text (within single and double quotes).
- Grouping characters: {}[]().
- Separators.
- Escaped characters: anything after a backslash, unless it is within a quoted text.
- Control characters.
- Other characters.
- Comments (if a comment-supporting method is used).
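The word rule in the list above can be approximated with the standard java.lang.Character classification methods. The following is only a sketch under stated assumptions: the isWordChar helper is hypothetical, the "certain symbols" that the parser also accepts are not specified in this page and are therefore left out, and the real implementation may differ.

```java
final class WordCharSketch {

    // Hypothetical helper: approximates the documented word rule for one codepoint.
    static boolean isWordChar(int codePoint, int[] allowInWords) {
        if (Character.isLetterOrDigit(codePoint)) {
            return true;
        }
        int type = Character.getType(codePoint);
        // Unicode general categories Cf (format) and Pc (connector punctuation)
        if (type == Character.FORMAT || type == Character.CONNECTOR_PUNCTUATION) {
            return true;
        }
        // Extra codepoints explicitly allowed by the caller
        if (allowInWords != null) {
            for (int allowed : allowInWords) {
                if (allowed == codePoint) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

With this sketch, a hyphen is rejected unless it is listed in allowInWords, while an underscore (connector punctuation) and U+200D (a format character) are accepted.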
A moderate level of control over the parsing can be achieved with a
TokenControl object.
The constructors that do not take a characterCountLimit argument process
Reader streams of up to one gigabyte (0x40000000) in size, throwing a
SecurityException if they exceed that limit. Use the other constructors
if you need to process larger files (or want a smaller limit to be
enforced).
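The effect of a character count limit can be pictured with a plain java.io wrapper. This is a sketch of the idea only, not the library's mechanism: LimitedReaderSketch and its drain helper are hypothetical names, and the sketch simply counts the characters read and throws a SecurityException once the count passes the limit.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class LimitedReaderSketch extends FilterReader {

    private final long limit;
    private long count;

    LimitedReaderSketch(Reader in, long limit) {
        super(in);
        this.limit = limit;
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        if (c != -1) {
            advance(1);
        }
        return c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            advance(n);
        }
        return n;
    }

    // Adds to the running total and rejects the stream once the limit is passed.
    private void advance(int n) {
        count += n;
        if (count > limit) {
            throw new SecurityException("Character count limit exceeded: " + limit);
        }
    }

    // Reads the whole text through the limited reader and returns the characters read.
    static long drain(String text, long limit) {
        char[] buf = new char[256];
        long total = 0;
        try (Reader reader = new LimitedReaderSketch(new StringReader(text), limit)) {
            int n;
            while ((n = reader.read(buf)) != -1) {
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```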
-
Nested Class Summary
Modifier and Type   Class                                               Description
static interface    TokenProducer3.CharacterCheck                       Check whether a character codepoint would be a valid addition to a word.
static interface    TokenProducer3.SequenceParser<E extends Exception>  Basic access to the current sequence in this tokenizer.
-
Field Summary
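Modifier and Type   Field                         Description
static final int    CHAR_ASTERISK                 *
static final int    CHAR_CIRCUMFLEX_ACCENT        ^
static final int    CHAR_COLON                    :
static final int    CHAR_COMMA                    ,
static final int    CHAR_COMMERCIAL_AT            @
static final int    CHAR_DOLLAR                   $
static final int    CHAR_EQUALS                   =
static final int    CHAR_EXCLAMATION              !
static final int    CHAR_FULL_STOP                .
static final int    CHAR_GREATER_THAN             >
static final int    CHAR_HYPHEN_MINUS             -
static final int    CHAR_LEFT_CURLY_BRACKET       {
static final int    CHAR_LEFT_PAREN               (
static final int    CHAR_LEFT_SQ_BRACKET          [
static final int    CHAR_LESS_THAN                <
static final int    CHAR_LOW_LINE                 _
static final int    CHAR_NUMBER_SIGN              #
static final int    CHAR_PERCENT_SIGN             %
static final int    CHAR_PLUS                     +
static final int    CHAR_QUESTION_MARK            ?
static final int    CHAR_RIGHT_CURLY_BRACKET      }
static final int    CHAR_RIGHT_PAREN              )
static final int    CHAR_RIGHT_SQ_BRACKET         ]
static final int    CHAR_SEMICOLON                ;
static final int    CHAR_SLASH                    /
static final int    CHAR_TILDE                    ~
static final int    CHAR_VERTICAL_LINE            |
static final byte   ERR_LASTCHAR_BACKSLASH
static final byte   ERR_UNEXPECTED_CONTROL
static final byte   ERR_UNEXPECTED_END_COMMENTED
static final byte   ERR_UNEXPECTED_END_QUOTED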
-
Constructor Summary
Constructor   Description
TokenProducer3(TokenHandler3<E> handler)
    Instantiate a TokenProducer object with the given handler.
TokenProducer3(TokenHandler3<E> handler, int characterCountLimit)
    Instantiate a TokenProducer object with the given handler and processing limit.
TokenProducer3(TokenHandler3<E> handler, int[] allowInWords)
    Construct a TokenProducer object with the given handler and an array of codepoints allowed in words.
TokenProducer3(TokenHandler3<E> handler, int[] allowInWords, int characterCountLimit)
    Construct a TokenProducer object with the given handler and an array of codepoints allowed in words.
TokenProducer3(TokenHandler3<E> handler, TokenProducer3.CharacterCheck charCheck)
    Instantiate a TokenProducer object with the given handler and CharacterCheck.
TokenProducer3(TokenHandler3<E> handler, TokenProducer3.CharacterCheck charCheck, int characterCountLimit)
    Instantiate a TokenProducer object with the given handler and CharacterCheck.
-
Method Summary
Modifier and Type   Method   Description
void  parse(Reader reader)
    Tokenize the contents of the given Reader without any comment handling.
void  parse(Reader reader, int bufferCapacity)
    Tokenize the contents of a Reader without any comment handling, using a buffer with the given initial capacity.
void  parse(Reader reader, String commentOpen, String commentClose)
    Tokenize the given reader, with a single comment layout.
void  parse(String string)
    Tokenize a string, without any comment handling.
void  parse(String string, String commentOpen, String commentClose)
    Tokenize a string, accounting for the given comment syntax.
void  parseMultiComment(Reader reader, String[] opening, String[] closing)
    Tokenize the given reader, with multiple comment layouts.
void  setAcceptEofEndingQuoted(boolean accept)
    If set, quoted strings ending with an EOF (End Of File) are processed through the relevant quoted method, although an error is reported in any case.
void  setAcceptNewlineEndingQuote(boolean accept)
    If set, quoted strings ending with an unescaped newline (instead of the closing quote) are processed through the relevant quoted method, although an error is reported in any case.
void  setHandleAllSeparators(boolean handleAllSeparators)
    Set the handling of consecutive separators like whitespace or tabs.
-
Field Details
-
ERR_UNEXPECTED_END_QUOTED
public static final byte ERR_UNEXPECTED_END_QUOTED
-
ERR_LASTCHAR_BACKSLASH
public static final byte ERR_LASTCHAR_BACKSLASH
-
ERR_UNEXPECTED_END_COMMENTED
public static final byte ERR_UNEXPECTED_END_COMMENTED
-
ERR_UNEXPECTED_CONTROL
public static final byte ERR_UNEXPECTED_CONTROL
-
CHAR_EXCLAMATION
public static final int CHAR_EXCLAMATION
!
-
CHAR_NUMBER_SIGN
public static final int CHAR_NUMBER_SIGN
#
-
CHAR_DOLLAR
public static final int CHAR_DOLLAR
$
-
CHAR_PERCENT_SIGN
public static final int CHAR_PERCENT_SIGN
%
-
CHAR_LEFT_PAREN
public static final int CHAR_LEFT_PAREN
(
-
CHAR_RIGHT_PAREN
public static final int CHAR_RIGHT_PAREN
)
-
CHAR_ASTERISK
public static final int CHAR_ASTERISK
*
-
CHAR_PLUS
public static final int CHAR_PLUS
+
-
CHAR_COMMA
public static final int CHAR_COMMA
,
-
CHAR_HYPHEN_MINUS
public static final int CHAR_HYPHEN_MINUS
-
-
CHAR_FULL_STOP
public static final int CHAR_FULL_STOP
.
-
CHAR_SLASH
public static final int CHAR_SLASH
/
-
CHAR_COLON
public static final int CHAR_COLON
:
-
CHAR_SEMICOLON
public static final int CHAR_SEMICOLON
;
-
CHAR_LESS_THAN
public static final int CHAR_LESS_THAN
<
-
CHAR_EQUALS
public static final int CHAR_EQUALS
=
-
CHAR_GREATER_THAN
public static final int CHAR_GREATER_THAN
>
-
CHAR_QUESTION_MARK
public static final int CHAR_QUESTION_MARK
?
-
CHAR_COMMERCIAL_AT
public static final int CHAR_COMMERCIAL_AT
@
-
CHAR_LEFT_SQ_BRACKET
public static final int CHAR_LEFT_SQ_BRACKET
[
-
CHAR_RIGHT_SQ_BRACKET
public static final int CHAR_RIGHT_SQ_BRACKET
]
-
CHAR_CIRCUMFLEX_ACCENT
public static final int CHAR_CIRCUMFLEX_ACCENT
^
-
CHAR_LOW_LINE
public static final int CHAR_LOW_LINE
_
-
CHAR_LEFT_CURLY_BRACKET
public static final int CHAR_LEFT_CURLY_BRACKET
{
-
CHAR_VERTICAL_LINE
public static final int CHAR_VERTICAL_LINE
|
-
CHAR_RIGHT_CURLY_BRACKET
public static final int CHAR_RIGHT_CURLY_BRACKET
}
-
CHAR_TILDE
public static final int CHAR_TILDE
~
-
Constructor Details
-
TokenProducer3
public TokenProducer3(TokenHandler3<E> handler)
Instantiate a TokenProducer object with the given handler.
Parameters:
handler - the token handler.
-
TokenProducer3
public TokenProducer3(TokenHandler3<E> handler, int characterCountLimit)
Instantiate a TokenProducer object with the given handler and processing limit.
Parameters:
handler - the token handler.
characterCountLimit - the character count limit.
-
TokenProducer3
public TokenProducer3(TokenHandler3<E> handler, TokenProducer3.CharacterCheck charCheck)
Instantiate a TokenProducer object with the given handler and CharacterCheck.
Parameters:
handler - the token handler.
charCheck - the character checker object.
-
TokenProducer3
public TokenProducer3(TokenHandler3<E> handler, TokenProducer3.CharacterCheck charCheck, int characterCountLimit)
Instantiate a TokenProducer object with the given handler and CharacterCheck.
Parameters:
handler - the token handler.
charCheck - the character checker object.
characterCountLimit - the character count limit.
-
TokenProducer3
public TokenProducer3(TokenHandler3<E> handler, int[] allowInWords)
Construct a TokenProducer object with the given handler and an array of codepoints allowed in words.
Parameters:
handler - the token handler.
allowInWords - the array of codepoints allowed in words.
-
TokenProducer3
public TokenProducer3(TokenHandler3<E> handler, int[] allowInWords, int characterCountLimit)
Construct a TokenProducer object with the given handler and an array of codepoints allowed in words.
Parameters:
handler - the token handler.
allowInWords - the array of codepoints allowed in words.
characterCountLimit - the character count limit.
-
-
Method Details
-
setHandleAllSeparators
public void setHandleAllSeparators(boolean handleAllSeparators)
Set the handling of consecutive separators like whitespace or tabs.
Default is true.
Parameters:
handleAllSeparators - if set to true, all separator characters (including consecutive ones) will trigger a TokenHandler3.separator(int, int) method call. Otherwise only single separations between the other types of tokens will be taken into account.
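The difference between the two settings can be sketched without the library. In the hypothetical helper below, whitespace stands in for the separator class (an assumption; the parser's actual separator set is not listed on this page), and the returned list holds the indices at which a separator callback would fire. In the collapsed mode the sketch reports the first separator of each run, which is one possible reading of "single separations".

```java
import java.util.ArrayList;
import java.util.List;

final class SeparatorSketch {

    // Returns the indices where a separator event would be reported.
    static List<Integer> separatorEvents(String text, boolean handleAllSeparators) {
        List<Integer> events = new ArrayList<>();
        boolean previousWasSeparator = false;
        for (int i = 0; i < text.length();) {
            int cp = text.codePointAt(i);
            boolean separator = Character.isWhitespace(cp); // assumed separator test
            // Report every separator, or only the first one of each consecutive run
            if (separator && (handleAllSeparators || !previousWasSeparator)) {
                events.add(i);
            }
            previousWasSeparator = separator;
            i += Character.charCount(cp);
        }
        return events;
    }
}
```

For the input "a  b\t c" (two spaces, then a tab and a space), the sketch reports indices 1, 2, 4 and 5 when handleAllSeparators is true, and only 1 and 4 when it is false.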
-
setAcceptNewlineEndingQuote
public void setAcceptNewlineEndingQuote(boolean accept)
If set, quoted strings ending with an unescaped newline (instead of the closing quote) are processed through the relevant quoted method, although an error is reported in any case. Otherwise, only the error is reported.
It is set to false by default.
Parameters:
accept - true to process quoted strings that end with an unescaped newline, false otherwise.
-
setAcceptEofEndingQuoted
public void setAcceptEofEndingQuoted(boolean accept)
If set, quoted strings ending with an EOF (End Of File) are processed through the relevant quoted method, although an error is reported in any case. Otherwise, only the error is reported.
It is set to false by default.
Parameters:
accept - true to process quoted strings that end with an EOF, false otherwise.
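The two acceptance flags described above can be pictured with a small stand-alone scanner. Everything in this sketch is an assumption made for illustration: the scan method, the event strings, and the order in which the error and the quoted text are reported are hypothetical, and the input is expected to start with the opening quote character.

```java
import java.util.ArrayList;
import java.util.List;

final class QuoteEndSketch {

    // Scans one quoted string starting at index 0 and lists the events that would be reported.
    static List<String> scan(String text, boolean acceptNewline, boolean acceptEof) {
        List<String> events = new ArrayList<>();
        char quote = text.charAt(0);
        StringBuilder buf = new StringBuilder();
        for (int i = 1; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\\' && i + 1 < text.length()) {
                // Keep the escape sequence as-is; an escaped newline does not end the string
                buf.append(c).append(text.charAt(++i));
            } else if (c == quote) {
                events.add("quoted:" + buf);
                return events;
            } else if (c == '\n') {
                events.add("error:newline"); // the error is reported in any case
                if (acceptNewline) {
                    events.add("quoted:" + buf);
                }
                return events;
            } else {
                buf.append(c);
            }
        }
        events.add("error:eof"); // input ended before the closing quote
        if (acceptEof) {
            events.add("quoted:" + buf);
        }
        return events;
    }
}
```

With both flags at their default of false, an unterminated string produces only the error event; enabling the matching flag adds the quoted event with the text collected so far.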
-
parse
public void parse(String string) throws E
Tokenize a string, without any comment handling.
Parameters:
string - the string to parse.
Throws:
E
NullPointerException - if the string is null.
-
parse
public void parse(String string, String commentOpen, String commentClose) throws E
Tokenize a string, accounting for the given comment syntax.
Parameters:
string - the string to parse.
commentOpen - the token that opens a comment, for example /*. The token must not repeat the same character at its beginning. For example, <!-- is a valid token but <<-- is not.
commentClose - the token closing a comment, for example */
Throws:
E
NullPointerException - if any of the string arguments are null.
-
parse
public void parse(Reader reader) throws E, IOException
Tokenize the contents of the given Reader without any comment handling.
A buffer with an initial default capacity of 256 will be used.
Parameters:
reader - the Reader whose contents are to be parsed.
Throws:
IOException - if an I/O problem occurs with the Reader.
E
-
parse
public void parse(Reader reader, int bufferCapacity) throws E, IOException
Tokenize the contents of a Reader without any comment handling, using a buffer with the given initial capacity.
Parameters:
reader - the Reader whose contents are to be parsed.
bufferCapacity - the initial buffer capacity.
Throws:
IOException - if an I/O problem occurs with the Reader.
E
-
parse
public void parse(Reader reader, String commentOpen, String commentClose) throws E, IOException
Tokenize the given reader, with a single comment layout.
Parameters:
reader - the reader to parse.
commentOpen - the token that opens a comment. The token must not repeat the same character at its beginning. For example, <!-- is a valid token but <<-- is not.
commentClose - the token that closes a comment.
Throws:
IOException - if an I/O problem occurs.
NullPointerException - if any of the string arguments are null.
E
-
parseMultiComment
public void parseMultiComment(Reader reader, String[] opening, String[] closing) throws E, IOException
Tokenize the given reader, with multiple comment layouts.
Parameters:
reader - the reader to parse.
opening - the array of tokens that open a comment. A token must not repeat the same character at its beginning. For example, <!-- is a valid token but <<-- is not.
closing - the array of tokens that close the comment opened with the opening at the same index.
Throws:
IOException - if an I/O problem was found parsing the reader.
IllegalArgumentException - if the opening and closing arrays do not have the same length.
E
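The constraints on the comment arrays can be checked up front with a few lines of plain Java. The helpers below are hypothetical and only sketch the documented rules: the opening and closing arrays are paired by index and must have equal length, and an opening token should not start with a repeated character (read here as "the first two characters differ", which is an interpretation of the rule, not a statement about the parser's internals).

```java
final class CommentLayoutSketch {

    // True if the token does not repeat its first character, e.g. "<!--" but not "<<--".
    static boolean isAcceptableOpening(String token) {
        return token.length() < 2 || token.charAt(0) != token.charAt(1);
    }

    // Mirrors the documented IllegalArgumentException for mismatched array lengths.
    static void checkLayouts(String[] opening, String[] closing) {
        if (opening.length != closing.length) {
            throw new IllegalArgumentException(
                "Opening and closing arrays must have the same length.");
        }
    }

    // Returns the closing token paired with the given opening token, or null if unknown.
    static String closingFor(String[] opening, String[] closing, String token) {
        checkLayouts(opening, closing);
        for (int i = 0; i < opening.length; i++) {
            if (opening[i].equals(token)) {
                return closing[i];
            }
        }
        return null;
    }
}
```

For example, with opening tokens "/*" and "<!--" paired with closing tokens "*/" and "-->", the closing token for "<!--" is "-->".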
-