java.lang.Object
io.sf.carte.uparser.CommentRemovalHandler
- All Implemented Interfaces:
TokenHandler2, TokenHandler3<RuntimeException>
A handler that removes comments.
Example:
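import java.io.IOException;
import java.io.StringReader;
import io.sf.carte.uparser.CommentRemovalHandler;
import io.sf.carte.uparser.TokenProducer;

String removeComments(String text) {
    String[] opening = { "/*", "<!--" };
    String[] closing = { "*/", "-->" };
    // Use the input length as the initial buffer size.
    CommentRemovalHandler handler = new CommentRemovalHandler(text.length());
    TokenProducer tp = new TokenProducer(handler);
    try {
        tp.parseMultiComment(new StringReader(text), opening, closing);
    } catch (IOException e) {
        // Not expected here: an open StringReader reads from memory.
    }
    return handler.getBuffer().toString();
}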
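To get a feel for the intended effect without the library on the classpath, the following is a rough, standard-library-only approximation. It is not how CommentRemovalHandler is implemented: the class name CommentStripSketch and its strip method are hypothetical, it assumes comments are simply dropped from the output, and it ignores quoted strings and escapes, which the real parser tracks.

```java
/** Hypothetical stand-in that only illustrates delimiter-based comment removal. */
final class CommentStripSketch {

    /** Drops every span that starts with opening[i] and ends with closing[i]. */
    static String strip(String text, String[] opening, String[] closing) {
        StringBuilder buf = new StringBuilder(text.length());
        int pos = 0;
        while (pos < text.length()) {
            // Check whether any opening delimiter starts at the current position.
            int matched = -1;
            for (int i = 0; i < opening.length; i++) {
                if (text.startsWith(opening[i], pos)) {
                    matched = i;
                    break;
                }
            }
            if (matched < 0) {
                // Not a comment: copy the character to the output.
                buf.append(text.charAt(pos));
                pos++;
                continue;
            }
            int end = text.indexOf(closing[matched], pos + opening[matched].length());
            if (end < 0) {
                // Assumption of this sketch: an unterminated comment swallows the rest.
                break;
            }
            // Resume right after the closing delimiter.
            pos = end + closing[matched].length();
        }
        return buf.toString();
    }
}
```

With the opening and closing arrays from the example above, strip("a /* b */ c", opening, closing) returns "a  c" (the spaces on both sides of the comment are kept).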
-
Constructor Summary
Constructor - Description
CommentRemovalHandler(int bufSize) - Construct the handler with the given initial buffer size.
-
Method Summary
Modifier and Type, Method - Description
void character(int index, int codePoint) - Another character, including punctuation (excluding connector punctuation) and symbols (Sc, Sm and Sk Unicode categories), was found that is not one of the non-alphanumeric characters allowed in words.
void commented - A commented string was found by the parser.
void control(int index, int codePoint) - A control character codepoint was found.
void endOfStream(int len) - The stream that was being parsed reached its end.
void error(int index, byte errCode, CharSequence context) - An error was found while parsing.
void escaped(int index, int codePoint) - A codepoint preceded with a backslash was found outside of quoted text.
getBuffer - Get the buffer.
void leftCurlyBracket(int index) - Called when the { codepoint is found.
void leftParenthesis(int index) - Called when the ( codepoint is found.
void leftSquareBracket(int index) - Called when the [ codepoint is found.
void quoted(int index, CharSequence quoted, int quoteCp) - A quoted string was found by the parser.
void quotedNewlineChar(int index, int codePoint) - An unescaped FF/LF/CR control was found while assembling a quoted string.
void quotedWithControl(int index, CharSequence quoted, int quoteCp) - A quoted string was found by the parser, and contains control characters.
void rightCurlyBracket(int index) - Called when the } codepoint is found.
void rightParenthesis(int index) - Called when the ) codepoint is found.
void rightSquareBracket(int index) - Called when the ] codepoint is found.
void separator(int index, int codePoint) - A separator (Zs, Zl and Zp Unicode categories) was found.
void tokenStart(TokenControl control) - At the beginning of parsing, this method is called, passing the TokenControl object that can be used to fine-control the parsing.
void word(int index, CharSequence word) - A word was found by the parser (includes connector punctuation).
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface io.sf.carte.uparser.TokenHandler2
endPunctuation, startPunctuation
-
Constructor Details
-
CommentRemovalHandler
public CommentRemovalHandler(int bufSize)
Construct the handler with the given initial buffer size.
- Parameters:
bufSize - the initial buffer size.
-
-
Method Details
-
getBuffer
Get the buffer.
- Returns:
the buffer.
-
tokenStart
public void tokenStart(TokenControl control)
Description copied from interface: TokenHandler2
At the beginning of parsing, this method is called, passing the TokenControl object that can be used to fine-control the parsing.
- Specified by:
tokenStart in interface TokenHandler2
- Specified by:
tokenStart in interface TokenHandler3<RuntimeException>
- Parameters:
control - the TokenControl object in charge of parsing.
-
word
public void word(int index, CharSequence word)
Description copied from interface: TokenHandler2
A word was found by the parser (includes connector punctuation).
- Specified by:
word in interface TokenHandler2
- Specified by:
word in interface TokenHandler3<RuntimeException>
- Parameters:
index - the index at which the word was found.
word - the word.
-
separator
public void separator(int index, int codePoint)
Description copied from interface: TokenHandler2
A separator (Zs, Zl and Zp Unicode categories) was found.
- Specified by:
separator in interface TokenHandler2
- Specified by:
separator in interface TokenHandler3<RuntimeException>
- Parameters:
index - the index at which the separator was found.
codePoint - the codepoint of the found separator.
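The Zs, Zl and Zp categories map directly onto constants in java.lang.Character, so a codepoint can be tested for membership as sketched below. The class name SeparatorCheck is hypothetical, and whether the parser performs exactly this test internally is an assumption; the sketch only shows which codepoints fall into the categories named above.

```java
/** Hypothetical helper: tests whether a codepoint is in Zs, Zl or Zp. */
final class SeparatorCheck {

    static boolean isSeparator(int codePoint) {
        int type = Character.getType(codePoint);
        return type == Character.SPACE_SEPARATOR      // Zs, e.g. U+0020 SPACE
            || type == Character.LINE_SEPARATOR       // Zl, U+2028
            || type == Character.PARAGRAPH_SEPARATOR; // Zp, U+2029
    }
}
```

Note that tab, line feed and carriage return belong to the Cc (control) category, so this check returns false for them.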
-
quoted
public void quoted(int index, CharSequence quoted, int quoteCp)
Description copied from interface: TokenHandler2
A quoted string was found by the parser.
- Specified by:
quoted in interface TokenHandler2
- Specified by:
quoted in interface TokenHandler3<RuntimeException>
- Parameters:
index - the index at which the quoted string was found.
quoted - the quoted sequence of characters, without the quotes.
quoteCp - the quote character codepoint.
-
quotedWithControl
public void quotedWithControl(int index, CharSequence quoted, int quoteCp)
Description copied from interface: TokenHandler2
A quoted string was found by the parser, and contains control characters.
- Specified by:
quotedWithControl in interface TokenHandler2
- Specified by:
quotedWithControl in interface TokenHandler3<RuntimeException>
- Parameters:
index - the index at which the quoted string was found.
quoted - the quoted sequence of characters, without the quotes.
quoteCp - the quote character codepoint.
-
quotedNewlineChar
public void quotedNewlineChar(int index, int codePoint)
Description copied from interface: TokenHandler2
An unescaped FF/LF/CR control was found while assembling a quoted string.
- Specified by:
quotedNewlineChar in interface TokenHandler2
- Specified by:
quotedNewlineChar in interface TokenHandler3<RuntimeException>
- Parameters:
index - the index at which the control was found.
codePoint - the FF/LF/CR codepoint.
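As a small illustration of which codepoints are meant by FF/LF/CR, the sketch below tests for them and locates the first one that is not preceded by a backslash. The class QuotedNewlineSketch and both method names are hypothetical, and treating a backslash as escaping exactly the next character is an assumption of this sketch rather than documented parser behavior.

```java
/** Hypothetical helper illustrating the FF/LF/CR check. */
final class QuotedNewlineSketch {

    static boolean isNewlineControl(int codePoint) {
        return codePoint == 0x0C   // FF, form feed
            || codePoint == 0x0A   // LF, line feed
            || codePoint == 0x0D;  // CR, carriage return
    }

    /** Returns the index of the first unescaped FF/LF/CR, or -1 if there is none. */
    static int firstUnescapedNewline(CharSequence quoted) {
        for (int i = 0; i < quoted.length(); i++) {
            char c = quoted.charAt(i);
            if (c == '\\') {
                i++; // assumption: the backslash escapes the next character
            } else if (isNewlineControl(c)) {
                return i;
            }
        }
        return -1;
    }
}
```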
-
leftParenthesis
public void leftParenthesis(int index)
Description copied from interface: TokenHandler2
Called when the ( codepoint is found.
- Specified by:
leftParenthesis in interface TokenHandler2
- Specified by:
leftParenthesis in interface TokenHandler3<RuntimeException>
- Parameters:
index - the index at which the codepoint was found.
-
leftSquareBracket
public void leftSquareBracket(int index)
Description copied from interface: TokenHandler2
Called when the [ codepoint is found.
- Specified by:
leftSquareBracket in interface TokenHandler2
- Specified by:
leftSquareBracket in interface TokenHandler3<RuntimeException>
- Parameters:
index - the index at which the codepoint was found.
-
leftCurlyBracket
public void leftCurlyBracket(int index)
Description copied from interface: TokenHandler2
Called when the { codepoint is found.
- Specified by:
leftCurlyBracket in interface TokenHandler2
- Specified by:
leftCurlyBracket in interface TokenHandler3<RuntimeException>
- Parameters:
index - the index at which the codepoint was found.
-
rightParenthesis
public void rightParenthesis(int index)
Description copied from interface: TokenHandler2
Called when the ) codepoint is found.
- Specified by:
rightParenthesis in interface TokenHandler2
- Specified by:
rightParenthesis in interface TokenHandler3<RuntimeException>
- Parameters:
index - the index at which the codepoint was found.
-
rightSquareBracket
public void rightSquareBracket(int index)
Description copied from interface: TokenHandler2
Called when the ] codepoint is found.
- Specified by:
rightSquareBracket in interface TokenHandler2
- Specified by:
rightSquareBracket in interface TokenHandler3<RuntimeException>
- Parameters:
index - the index at which the codepoint was found.
-
rightCurlyBracket
public void rightCurlyBracket(int index)
Description copied from interface: TokenHandler2
Called when the } codepoint is found.
- Specified by:
rightCurlyBracket in interface TokenHandler2
- Specified by:
rightCurlyBracket in interface TokenHandler3<RuntimeException>
- Parameters:
index - the index at which the codepoint was found.
-
character
public void character(int index, int codePoint)
Description copied from interface: TokenHandler2
Another character, including punctuation (excluding connector punctuation) and symbols (Sc, Sm and Sk Unicode categories), was found that is not one of the non-alphanumeric characters allowed in words.
Symbols in the So category are considered part of words and won't be handled by this method.
- Specified by:
character in interface TokenHandler2
- Specified by:
character in interface TokenHandler3<RuntimeException>
- Parameters:
index - the index at which the punctuation was found.
codePoint - the codepoint of the found punctuation.
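The category rules above can be approximated with Character.getType, as in the following sketch. The class name CharacterCategorySketch is hypothetical and the check is only an approximation of the description: brackets and quote characters also fall into these Unicode categories but are reported through their own handler methods, and the exact rules the parser applies are not reproduced here.

```java
/** Hypothetical helper approximating the categories described for character(). */
final class CharacterCategorySketch {

    static boolean isPunctuationOrSymbol(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.DASH_PUNCTUATION:          // Pd
            case Character.START_PUNCTUATION:         // Ps
            case Character.END_PUNCTUATION:           // Pe
            case Character.INITIAL_QUOTE_PUNCTUATION: // Pi
            case Character.FINAL_QUOTE_PUNCTUATION:   // Pf
            case Character.OTHER_PUNCTUATION:         // Po
            case Character.CURRENCY_SYMBOL:           // Sc
            case Character.MATH_SYMBOL:               // Sm
            case Character.MODIFIER_SYMBOL:           // Sk
                return true;
            default:
                // Includes CONNECTOR_PUNCTUATION (Pc) and OTHER_SYMBOL (So),
                // which the description above treats as part of words.
                return false;
        }
    }
}
```

Under this sketch, ';' (Po), '$' (Sc) and '+' (Sm) qualify, while '_' (Pc) and '©' (So) do not.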
-
escaped
public void escaped(int index, int codePoint)
Description copied from interface: TokenHandler2
A codepoint preceded with a backslash was found outside of quoted text.
- Specified by:
escaped in interface TokenHandler2
- Specified by:
escaped in interface TokenHandler3<RuntimeException>
- Parameters:
index - the index at which the escaped codepoint was found.
codePoint - the escaped codepoint.
-
control
public void control(int index, int codePoint)
Description copied from interface: TokenHandler2
A control character codepoint was found.
- Specified by:
control in interface TokenHandler2
- Specified by:
control in interface TokenHandler3<RuntimeException>
- Parameters:
index - the index at which the control codepoint was found.
codePoint - the control codepoint.
-
commented
Description copied from interface: TokenHandler2
A commented string was found by the parser.
- Specified by:
commented in interface TokenHandler2
- Specified by:
commented in interface TokenHandler3<RuntimeException>
- Parameters:
index - the index at which the commented string was found.
commentType - the type of comment.
comment - the commented string.
-
endOfStream
public void endOfStream(int len)
Description copied from interface: TokenHandler2
The stream that was being parsed reached its end.
- Specified by:
endOfStream in interface TokenHandler2
- Specified by:
endOfStream in interface TokenHandler3<RuntimeException>
- Parameters:
len - the length of the processed stream.
-
error
public void error(int index, byte errCode, CharSequence context)
Description copied from interface: TokenHandler2
An error was found while parsing.
Something was found that broke the assumptions made by the parser, like an escape character at the end of the stream or an unmatched quote.
- Specified by:
error in interface TokenHandler2
- Specified by:
error in interface TokenHandler3<RuntimeException>
- Parameters:
index - the index at which the error was found.
errCode - the error code.
context - a context sequence. If a string was parsed, it will contain up to 16 characters before and after the error.
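To make the "up to 16 characters before and after" wording concrete, here is a minimal sketch of how such a context window could be cut from a string. The class ErrorContextSketch and its contextAround method are hypothetical; in particular, including the character at the error index itself is an assumption, and the library may compute its context differently.

```java
/** Hypothetical helper: extracts a window of text around an error index. */
final class ErrorContextSketch {

    static final int WINDOW = 16;

    /**
     * Returns up to WINDOW characters before the index, the character at the
     * index, and up to WINDOW characters after it.
     * Assumes 0 <= index < text.length().
     */
    static CharSequence contextAround(CharSequence text, int index) {
        int start = Math.max(0, index - WINDOW);
        int end = Math.min(text.length(), index + WINDOW + 1);
        return text.subSequence(start, end);
    }
}
```

Near the start or end of the input the window is simply clipped, which is why the description says "up to" 16 characters.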
-