A simple parser that produces tokens from a String or Reader and processes
them through a user-provided handler.
This parser is intended for handlers that throw only runtime (unchecked)
exceptions. If your use case requires dealing with checked exceptions, use
TokenProducer3 instead.
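The split between a checked-exception parser and a runtime-only one can be pictured with a generic exception parameter, similar in spirit to the `E extends Exception` parameter that appears on TokenProducer3.SequenceParser. The following is only a minimal sketch of that pattern: the names WordHandler3, WordHandler2, WordProducer3 and WordProducer2 are hypothetical and are not part of this library, and the whitespace splitting is a stand-in for real tokenization.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical handler that may throw a checked exception of type E.
interface WordHandler3<E extends Exception> {
    void word(int index, CharSequence word) throws E;
}

// Hypothetical runtime-only variant: E is fixed to RuntimeException,
// so implementations and callers never deal with checked exceptions.
interface WordHandler2 extends WordHandler3<RuntimeException> {
}

// Hypothetical producer that propagates whatever the handler throws.
class WordProducer3<E extends Exception> {
    private final WordHandler3<E> handler;

    WordProducer3(WordHandler3<E> handler) {
        this.handler = handler;
    }

    // Splits on whitespace only; the real tokenizer is far richer.
    void parse(String text) throws E {
        int i = 0;
        int len = text.length();
        while (i < len) {
            while (i < len && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int start = i;
            while (i < len && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i > start) {
                handler.word(start, text.subSequence(start, i));
            }
        }
    }
}

// Runtime-only specialization: parse() needs no try/catch at the call site.
class WordProducer2 extends WordProducer3<RuntimeException> {
    WordProducer2(WordHandler2 handler) {
        super(handler);
    }
}

class HandlerSketchDemo {
    static List<String> collect(String text) {
        List<String> words = new ArrayList<>();
        new WordProducer2((index, word) -> words.add(word.toString())).parse(text);
        return words;
    }
}
```

With the exception type fixed to RuntimeException, a call such as `HandlerSketchDemo.collect("a bc  d")` compiles without any `throws` clause or `try` block, which is the convenience the runtime-only class offers.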
Tokenization Overview
The tokens produced are:
- Words. They contain letters, digits, format characters (Unicode Cf),
connector punctuation, certain symbols, and any codepoint in allowInWords
or accepted by the supplied CharacterCheck object.
- Quoted text (within single or double quotes).
- Grouping characters: {}[]().
- Separators.
- Escaped characters: anything after a backslash, unless it is within quoted text.
- Control characters.
- Other characters.
- Comments (if a comment-supporting method is used).
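The word rule in the list above can be approximated with the standard java.lang.Character classification methods. This is a rough sketch, not the library's implementation: the class WordCharSketch and its method are hypothetical, the "certain symbols" mentioned above are deliberately left out because the text does not say which ones they are, and a CharacterCheck object is not modeled.

```java
// Hypothetical helper approximating the documented word-character rule.
final class WordCharSketch {

    private WordCharSketch() {
    }

    // Returns true if the codepoint would plausibly belong to a word:
    // a letter, a digit, a format character (Cf), a connector
    // punctuation (Pc), or a codepoint listed in allowInWords.
    // Assumption: the unspecified "certain symbols" are NOT covered here.
    static boolean isWordChar(int codePoint, int[] allowInWords) {
        if (Character.isLetterOrDigit(codePoint)) {
            return true;
        }
        int type = Character.getType(codePoint);
        if (type == Character.FORMAT || type == Character.CONNECTOR_PUNCTUATION) {
            return true;
        }
        if (allowInWords != null) {
            for (int allowed : allowInWords) {
                if (allowed == codePoint) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Under this sketch, a hyphen is rejected by default and accepted only when it is passed in the allowInWords array, which mirrors the purpose of that constructor argument.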
A moderate level of control over the parsing can be achieved with a
TokenControl object.
The constructors that do not take a characterCountLimit argument process
Reader streams of up to one gigabyte (0x40000000) in size, throwing a
SecurityException if the stream exceeds that limit. Use the other
constructors if you need to process larger files or want to enforce a
smaller limit.
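The limit described above can be illustrated with a small counting loop over a Reader. This is a sketch under assumptions, not the library's code: the class LimitSketch and its method are hypothetical, and it assumes that the limit is counted in Java char units and that exceeding it (rather than reaching it) triggers the exception.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical illustration of a character count limit on a Reader.
final class LimitSketch {

    // The default limit quoted in the text: 0x40000000 = 1,073,741,824.
    static final int DEFAULT_LIMIT = 0x40000000;

    private LimitSketch() {
    }

    // Reads the stream to its end and returns the number of chars read,
    // throwing SecurityException as soon as the count exceeds the limit.
    static int countWithLimit(Reader reader, int characterCountLimit)
            throws IOException {
        int count = 0;
        while (reader.read() != -1) {
            count++;
            if (count > characterCountLimit) {
                throw new SecurityException(
                        "Character count limit exceeded: " + characterCountLimit);
            }
        }
        return count;
    }
}
```

A caller that expects untrusted input would typically pass a much smaller limit than the default, so that oversized input fails early instead of consuming resources.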
-
Nested Class Summary
Nested classes/interfaces inherited from class io.sf.carte.uparser.TokenProducer3
TokenProducer3.CharacterCheck, TokenProducer3.SequenceParser<E extends Exception>
-
Field Summary
Fields inherited from class io.sf.carte.uparser.TokenProducer3
CHAR_ASTERISK, CHAR_CIRCUMFLEX_ACCENT, CHAR_COLON, CHAR_COMMA, CHAR_COMMERCIAL_AT, CHAR_DOLLAR, CHAR_EQUALS, CHAR_EXCLAMATION, CHAR_FULL_STOP, CHAR_GREATER_THAN, CHAR_HYPHEN_MINUS, CHAR_LEFT_CURLY_BRACKET, CHAR_LEFT_PAREN, CHAR_LEFT_SQ_BRACKET, CHAR_LESS_THAN, CHAR_LOW_LINE, CHAR_NUMBER_SIGN, CHAR_PERCENT_SIGN, CHAR_PLUS, CHAR_QUESTION_MARK, CHAR_RIGHT_CURLY_BRACKET, CHAR_RIGHT_PAREN, CHAR_RIGHT_SQ_BRACKET, CHAR_SEMICOLON, CHAR_SLASH, CHAR_TILDE, CHAR_VERTICAL_LINE, ERR_LASTCHAR_BACKSLASH, ERR_UNEXPECTED_CONTROL, ERR_UNEXPECTED_END_COMMENTED, ERR_UNEXPECTED_END_QUOTED
-
Constructor Summary
Constructor and description:
- TokenProducer(TokenHandler2 handler): Instantiate a TokenProducer object with the given handler.
- TokenProducer(TokenHandler2 handler, int characterCountLimit): Instantiate a TokenProducer object with the given handler and processing limit.
- TokenProducer(TokenHandler2 handler, int[] allowInWords): Construct a TokenProducer object with the given handler and an array of codepoints allowed in words.
- TokenProducer(TokenHandler2 handler, int[] allowInWords, int characterCountLimit): Construct a TokenProducer object with the given handler, an array of codepoints allowed in words, and processing limit.
- TokenProducer(TokenHandler2 handler, TokenProducer3.CharacterCheck charCheck): Instantiate a TokenProducer object with the given handler and CharacterCheck.
- TokenProducer(TokenHandler2 handler, TokenProducer3.CharacterCheck charCheck, int characterCountLimit): Instantiate a TokenProducer object with the given handler, CharacterCheck, and processing limit.
-
Method Summary
Methods inherited from class io.sf.carte.uparser.TokenProducer3
parse, parse, parse, parse, parse, parseMultiComment, setAcceptEofEndingQuoted, setAcceptNewlineEndingQuote, setHandleAllSeparators
-
Constructor Details
-
TokenProducer
public TokenProducer(TokenHandler2 handler)
Instantiate a TokenProducer object with the given handler.
Parameters:
handler - the token handler.
-
TokenProducer
public TokenProducer(TokenHandler2 handler, int characterCountLimit)
Instantiate a TokenProducer object with the given handler and processing limit.
Parameters:
handler - the token handler.
characterCountLimit - the character count limit.
-
TokenProducer
public TokenProducer(TokenHandler2 handler, TokenProducer3.CharacterCheck charCheck)
Instantiate a TokenProducer object with the given handler and CharacterCheck.
Parameters:
handler - the token handler.
charCheck - the character checker object.
-
TokenProducer
public TokenProducer(TokenHandler2 handler, TokenProducer3.CharacterCheck charCheck, int characterCountLimit)
Instantiate a TokenProducer object with the given handler, CharacterCheck, and processing limit.
Parameters:
handler - the token handler.
charCheck - the character checker object.
characterCountLimit - the character count limit.
-
TokenProducer
public TokenProducer(TokenHandler2 handler, int[] allowInWords)
Construct a TokenProducer object with the given handler and an array of codepoints allowed in words.
Parameters:
handler - the token handler.
allowInWords - the array of codepoints allowed in words.
-
TokenProducer
public TokenProducer(TokenHandler2 handler, int[] allowInWords, int characterCountLimit)
Construct a TokenProducer object with the given handler, an array of codepoints allowed in words, and processing limit.
Parameters:
handler - the token handler.
allowInWords - the array of codepoints allowed in words.
characterCountLimit - the character count limit.
-