java.lang.Object
io.sf.carte.uparser.TokenProducer
A simple parser that produces tokens from a String or Reader.
The tokens produced are:
- Words. They contain letters, digits, format characters (Unicode Cf), connector punctuations, certain symbols and any codepoint in allowInWords or accepted by the supplied CharacterCheck object.
- Quoted text (within single and double quotes).
- Grouping characters: {}[]().
- Separators.
- Escaped characters: anything after a backslash, unless it is within a quoted text.
- Control characters.
- Other characters.
- Comments (only if a comment-supporting method is used).
A moderate level of control over the parsing can be achieved with a TokenControl object.
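The word rule in the list above can be approximated with `java.lang.Character`. The following is only a sketch: `WordCodePoints` and `isWordCodePoint` are hypothetical names that are not part of this class, and the "certain symbols" category is left out because the description does not say which symbols it covers.

```java
final class WordCodePoints {
    // Hypothetical helper, not part of TokenProducer: approximates the
    // "word" categories described above (the "certain symbols" are omitted).
    static boolean isWordCodePoint(int cp, int[] allowInWords) {
        if (Character.isLetterOrDigit(cp)) {
            return true;
        }
        int type = Character.getType(cp);
        // Unicode Cf (format) and Pc (connector punctuation, e.g. '_').
        if (type == Character.FORMAT || type == Character.CONNECTOR_PUNCTUATION) {
            return true;
        }
        // Extra codepoints explicitly allowed by the caller.
        for (int allowed : allowInWords) {
            if (allowed == cp) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] allow = {'-'};
        System.out.println(isWordCodePoint('_', allow));
        System.out.println(isWordCodePoint('-', allow));
        System.out.println(isWordCodePoint(' ', allow));
    }
}
```

Passing an empty array corresponds to allowing no additional codepoints in words.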
-
Nested Class Summary
Modifier and Type    Class    Description
static interface
    Check whether a character codepoint would be a valid addition to a word.
static interface
    Basic access to the current sequence in this tokenizer.
-
Field Summary
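Modifier and Type    Field                           Description
static final int     CHAR_ASTERISK                   *
static final int     CHAR_CIRCUMFLEX_ACCENT          ^
static final int     CHAR_COLON                      :
static final int     CHAR_COMMA                      ,
static final int     CHAR_COMMERCIAL_AT              @
static final int     CHAR_DOLLAR                     $
static final int     CHAR_EQUALS                     =
static final int     CHAR_EXCLAMATION                !
static final int     CHAR_FULL_STOP                  .
static final int     CHAR_GREATER_THAN               >
static final int     CHAR_HYPHEN_MINUS               -
static final int     CHAR_LEFT_CURLY_BRACKET         {
static final int     CHAR_LEFT_PAREN                 (
static final int     CHAR_LEFT_SQ_BRACKET            [
static final int     CHAR_LESS_THAN                  <
static final int     CHAR_NUMBER_SIGN                #
static final int     CHAR_PERCENT_SIGN               %
static final int     CHAR_PLUS                       +
static final int     CHAR_QUESTION_MARK              ?
static final int     CHAR_RIGHT_CURLY_BRACKET        }
static final int     CHAR_RIGHT_PAREN                )
static final int     CHAR_RIGHT_SQ_BRACKET           ]
static final int     CHAR_SEMICOLON                  ;
static final int     CHAR_SLASH                      /
static final int     CHAR_TILDE                      ~
static final int     CHAR_VERTICAL_LINE              |
static final byte    ERR_LASTCHAR_BACKSLASH
static final byte    ERR_UNEXPECTED_END_COMMENTED
static final byte    ERR_UNEXPECTED_END_QUOTED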
-
Constructor Summary
Constructor    Description
TokenProducer(TokenHandler handler)
    Instantiate a TokenProducer object with the given handler.
TokenProducer(TokenHandler handler, int characterCountLimit)
    Instantiate a TokenProducer object with the given handler and processing limit.
TokenProducer(TokenHandler handler, int[] allowInWords)
TokenProducer(TokenHandler handler, int[] allowInWords, int characterCountLimit)
    Construct a TokenProducer object with the given handler and an array of codepoints allowed in words.
TokenProducer(TokenHandler handler, TokenProducer.CharacterCheck charCheck)
    Instantiate a TokenProducer object with the given handler and CharacterCheck.
TokenProducer(TokenHandler handler, TokenProducer.CharacterCheck charCheck, int characterCountLimit)
    Instantiate a TokenProducer object with the given handler and CharacterCheck.
-
Method Summary
Modifier and Type    Method    Description
void    parse
void    parse
void    parse
    Parse the given reader, with a single comment layout.
void    parse
void    parse
void    parseMultiComment(Reader reader, String[] opening, String[] closing)
    Parse the given reader, with multiple comment layouts.
void    setAcceptEofEndingQuoted(boolean accept)
    If set, quoted strings ending with an EOF (End Of File) are processed through the relevant quoted method, although an error is reported in any case.
void    setAcceptNewlineEndingQuote(boolean accept)
    If set, quoted strings ending with an unescaped newline (instead of the closing quote) are processed through the relevant quoted method, although an error is reported in any case.
void    setHandleAllSeparators(boolean handleAllSeparators)
    Set the handling of consecutive separators like whitespace or tabs.
-
Field Details
-
ERR_UNEXPECTED_END_QUOTED
public static final byte ERR_UNEXPECTED_END_QUOTED
-
ERR_LASTCHAR_BACKSLASH
public static final byte ERR_LASTCHAR_BACKSLASH
-
ERR_UNEXPECTED_END_COMMENTED
public static final byte ERR_UNEXPECTED_END_COMMENTED
-
CHAR_EXCLAMATION
public static final int CHAR_EXCLAMATION
!
-
CHAR_NUMBER_SIGN
public static final int CHAR_NUMBER_SIGN
#
-
CHAR_DOLLAR
public static final int CHAR_DOLLAR
$
-
CHAR_PERCENT_SIGN
public static final int CHAR_PERCENT_SIGN
%
-
CHAR_LEFT_PAREN
public static final int CHAR_LEFT_PAREN
(
-
CHAR_RIGHT_PAREN
public static final int CHAR_RIGHT_PAREN
)
-
CHAR_ASTERISK
public static final int CHAR_ASTERISK
*
-
CHAR_PLUS
public static final int CHAR_PLUS
+
-
CHAR_COMMA
public static final int CHAR_COMMA
,
-
CHAR_HYPHEN_MINUS
public static final int CHAR_HYPHEN_MINUS
-
-
CHAR_FULL_STOP
public static final int CHAR_FULL_STOP
.
-
CHAR_SLASH
public static final int CHAR_SLASH
/
-
CHAR_COLON
public static final int CHAR_COLON
:
-
CHAR_SEMICOLON
public static final int CHAR_SEMICOLON
;
-
CHAR_LESS_THAN
public static final int CHAR_LESS_THAN
<
-
CHAR_EQUALS
public static final int CHAR_EQUALS
=
-
CHAR_GREATER_THAN
public static final int CHAR_GREATER_THAN
>
-
CHAR_QUESTION_MARK
public static final int CHAR_QUESTION_MARK
?
-
CHAR_COMMERCIAL_AT
public static final int CHAR_COMMERCIAL_AT
@
-
CHAR_LEFT_SQ_BRACKET
public static final int CHAR_LEFT_SQ_BRACKET
[
-
CHAR_RIGHT_SQ_BRACKET
public static final int CHAR_RIGHT_SQ_BRACKET
]
-
CHAR_CIRCUMFLEX_ACCENT
public static final int CHAR_CIRCUMFLEX_ACCENT
^
-
CHAR_LEFT_CURLY_BRACKET
public static final int CHAR_LEFT_CURLY_BRACKET
{
-
CHAR_VERTICAL_LINE
public static final int CHAR_VERTICAL_LINE
|
-
CHAR_RIGHT_CURLY_BRACKET
public static final int CHAR_RIGHT_CURLY_BRACKET
}
-
CHAR_TILDE
public static final int CHAR_TILDE
~
-
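Since the CHAR_ fields are `int` constants, they can be used as `case` labels when dispatching on a codepoint. The sketch below declares local stand-ins named after six of the fields; their values are an assumption (each character's Unicode codepoint), and `GroupingChars` and `closingFor` are hypothetical names used only for illustration.

```java
final class GroupingChars {
    // Local stand-ins named after the fields above. Assumption: each field
    // holds the Unicode codepoint of the character shown in its description.
    static final int CHAR_LEFT_PAREN = '(';
    static final int CHAR_RIGHT_PAREN = ')';
    static final int CHAR_LEFT_SQ_BRACKET = '[';
    static final int CHAR_RIGHT_SQ_BRACKET = ']';
    static final int CHAR_LEFT_CURLY_BRACKET = '{';
    static final int CHAR_RIGHT_CURLY_BRACKET = '}';

    // Returns the closing codepoint that matches an opening grouping
    // character, or -1 if the codepoint does not open a group.
    static int closingFor(int open) {
        switch (open) {
            case CHAR_LEFT_PAREN:
                return CHAR_RIGHT_PAREN;
            case CHAR_LEFT_SQ_BRACKET:
                return CHAR_RIGHT_SQ_BRACKET;
            case CHAR_LEFT_CURLY_BRACKET:
                return CHAR_RIGHT_CURLY_BRACKET;
            default:
                return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println((char) closingFor('['));
    }
}
```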
-
Constructor Details
-
TokenProducer
public TokenProducer(TokenHandler handler)
Instantiate a TokenProducer object with the given handler.
Parameters:
handler - the token handler.
-
TokenProducer
public TokenProducer(TokenHandler handler, int characterCountLimit)
Instantiate a TokenProducer object with the given handler and processing limit.
Parameters:
handler - the token handler.
characterCountLimit - the character count limit.
-
TokenProducer
public TokenProducer(TokenHandler handler, TokenProducer.CharacterCheck charCheck)
Instantiate a TokenProducer object with the given handler and CharacterCheck.
Parameters:
handler - the token handler.
charCheck - the character checker object.
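A character checker lets the caller decide, codepoint by codepoint, what else may belong to a word. The methods of CharacterCheck are not listed on this page, so the sketch below uses `java.util.function.IntPredicate` as a stand-in; `CheckedWords` and `words` are hypothetical names, and the word rule is reduced to letters and digits for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

final class CheckedWords {
    // Splits text into words. A codepoint belongs to a word if it is a letter
    // or digit, or if the supplied check accepts it. IntPredicate is a
    // stand-in for the checker object; it is not the library's interface.
    static List<String> words(String text, IntPredicate check) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isLetterOrDigit(cp) || check.test(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                // Any other codepoint ends the current word.
                out.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            out.add(current.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("font-size: 12px", cp -> cp == '-'));
        System.out.println(words("font-size: 12px", cp -> false));
    }
}
```

A predicate is more flexible than a fixed array of allowed codepoints because it can express ranges or context-free rules without enumerating every value.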
-
TokenProducer
public TokenProducer(TokenHandler handler, TokenProducer.CharacterCheck charCheck, int characterCountLimit)
Instantiate a TokenProducer object with the given handler and CharacterCheck.
Parameters:
handler - the token handler.
charCheck - the character checker object.
characterCountLimit - the character count limit.
-
TokenProducer
public TokenProducer(TokenHandler handler, int[] allowInWords)
-
TokenProducer
public TokenProducer(TokenHandler handler, int[] allowInWords, int characterCountLimit)
Construct a TokenProducer object with the given handler and an array of codepoints allowed in words.
Parameters:
handler - the token handler.
allowInWords - the array of codepoints allowed in words.
characterCountLimit - the character count limit.
-
-
Method Details
-
setHandleAllSeparators
public void setHandleAllSeparators(boolean handleAllSeparators)
Set the handling of consecutive separators like whitespace or tabs.
Default is true.
Parameters:
handleAllSeparators - if set to true, all separator characters (including consecutive ones) will trigger a TokenHandler.separator(int, int) method call. Otherwise only single separations between the other types of tokens will be taken into account.
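The difference between the two modes can be illustrated by counting how many separator events a tokenizer-like loop would report. This is a sketch under assumptions: `SeparatorModes` and `countSeparatorEvents` are hypothetical names, whitespace stands in for "separator", and the `false` mode is read as "one event per run of consecutive separators".

```java
final class SeparatorModes {
    // Counts the separator events reported for the given text. With
    // handleAllSeparators set, every separator counts; otherwise only the
    // first separator of each consecutive run does (an assumed reading).
    static int countSeparatorEvents(String text, boolean handleAllSeparators) {
        int events = 0;
        boolean previousWasSeparator = false;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            boolean separator = Character.isWhitespace(cp);
            if (separator && (handleAllSeparators || !previousWasSeparator)) {
                events++;
            }
            previousWasSeparator = separator;
        }
        return events;
    }

    public static void main(String[] args) {
        String text = "a \t b  c";
        System.out.println(countSeparatorEvents(text, true));
        System.out.println(countSeparatorEvents(text, false));
    }
}
```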
-
setAcceptNewlineEndingQuote
public void setAcceptNewlineEndingQuote(boolean accept)
If set, quoted strings ending with an unescaped newline (instead of the closing quote) are processed through the relevant quoted method, although an error is reported in any case. Otherwise, only the error is reported.
It is set to false by default.
Parameters:
accept - true to process quoted strings that end with an unescaped newline, false otherwise.
-
setAcceptEofEndingQuoted
public void setAcceptEofEndingQuoted(boolean accept)
If set, quoted strings ending with an EOF (End Of File) are processed through the relevant quoted method, although an error is reported in any case. Otherwise, only the error is reported.
It is set to false by default.
Parameters:
accept - true to process quoted strings that end with an EOF, false otherwise.
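The effect of the two accept flags on an unterminated quoted string can be sketched with a small scanner. Everything here is illustrative: `QuotedScan`, `scanQuoted` and the event strings are hypothetical, and the order in which the error and the quoted text are reported is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class QuotedScan {
    // Scans text that starts right after an opening quote and returns the
    // events a handler would receive, encoded as strings for illustration.
    static List<String> scanQuoted(String afterOpenQuote, char quote,
            boolean acceptNewlineEndingQuote, boolean acceptEofEndingQuoted) {
        List<String> events = new ArrayList<>();
        StringBuilder buf = new StringBuilder();
        for (int i = 0; i < afterOpenQuote.length(); i++) {
            char c = afterOpenQuote.charAt(i);
            if (c == '\\' && i + 1 < afterOpenQuote.length()) {
                // Escaped character (including an escaped newline): keep both.
                buf.append(c).append(afterOpenQuote.charAt(++i));
            } else if (c == quote) {
                events.add("quoted:" + buf);
                return events;
            } else if (c == '\n') {
                // Unescaped newline: the error is reported in any case.
                events.add("error:unexpected end of quoted text");
                if (acceptNewlineEndingQuote) {
                    events.add("quoted:" + buf);
                }
                return events;
            } else {
                buf.append(c);
            }
        }
        // End of input before the closing quote.
        events.add("error:unexpected end of quoted text");
        if (acceptEofEndingQuoted) {
            events.add("quoted:" + buf);
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(scanQuoted("abc\ndef", '"', false, false));
        System.out.println(scanQuoted("abc\ndef", '"', true, false));
        System.out.println(scanQuoted("abc", '"', false, true));
    }
}
```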
-
parse
-
parse
-
parse
- Throws:
IOException
-
parse
- Throws:
IOException
-
parse
Parse the given reader, with a single comment layout.
Parameters:
reader - the reader to parse.
commentOpen - the token that opens a comment. The token must not repeat the same character at its beginning. For example, <!-- is a valid token but <<-- is not.
commentClose - the token that closes a comment.
Throws:
IOException - if an I/O problem was found parsing the reader.
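The restriction on the opening token can be checked before calling the parser. The sketch below reflects one reading of the rule (the first codepoint must not be immediately followed by the same codepoint); `CommentTokens` and `isValidOpening` are hypothetical names, and the check is not claimed to match the library's own validation.

```java
final class CommentTokens {
    // Hypothetical pre-check: rejects null or empty tokens and tokens whose
    // first codepoint is immediately repeated, such as "<<--".
    static boolean isValidOpening(String token) {
        if (token == null || token.isEmpty()) {
            return false;
        }
        int first = token.codePointAt(0);
        int next = Character.charCount(first);
        // A single-codepoint token has nothing to repeat.
        return next >= token.length() || token.codePointAt(next) != first;
    }

    public static void main(String[] args) {
        System.out.println(isValidOpening("<!--"));
        System.out.println(isValidOpening("<<--"));
    }
}
```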
-
parseMultiComment
Parse the given reader, with multiple comment layouts.
Parameters:
reader - the reader to parse.
opening - the array of tokens that open a comment. A token must not repeat the same character at its beginning. For example, <!-- is a valid token but <<-- is not.
closing - the array of tokens that close the comment opened with the opening token at the same index.
Throws:
IOException - if an I/O problem was found parsing the reader.
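The index pairing between the two arrays can be illustrated with a small comment stripper: the layout found in `opening` at index k is closed only by `closing[k]`. This is a sketch, not the parser's algorithm; `MultiCommentSketch` and `stripComments` are hypothetical names, tokens are assumed to be non-empty, and quoted text is not treated specially.

```java
final class MultiCommentSketch {
    // Removes comments from text. A comment opened by opening[k] ends at the
    // next occurrence of closing[k]; an unterminated comment runs to the end.
    static String stripComments(String text, String[] opening, String[] closing) {
        if (opening.length != closing.length) {
            throw new IllegalArgumentException("opening and closing must have the same length");
        }
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            // Find the comment layout, if any, that opens at position i.
            int layout = -1;
            for (int k = 0; k < opening.length; k++) {
                if (text.startsWith(opening[k], i)) {
                    layout = k;
                    break;
                }
            }
            if (layout < 0) {
                out.append(text.charAt(i++));
                continue;
            }
            // Skip to the closing token stored at the same index.
            int end = text.indexOf(closing[layout], i + opening[layout].length());
            if (end < 0) {
                break;
            }
            i = end + closing[layout].length();
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] open = {"/*", "<!--"};
        String[] close = {"*/", "-->"};
        System.out.println(stripComments("a/*x*/b<!--y-->c", open, close));
    }
}
```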
-