Class CommentRemovalHandler

java.lang.Object
io.sf.carte.uparser.CommentRemovalHandler
All Implemented Interfaces:
TokenHandler2, TokenHandler3<RuntimeException>

public class CommentRemovalHandler extends Object implements TokenHandler2
A handler that removes comments.

Example:


 String removeComments(String text) {
     String[] opening = { "/*", "<!--" };
     String[] closing = { "*/", "-->" };
     CommentRemovalHandler handler = new CommentRemovalHandler(text.length());
     TokenProducer tp = new TokenProducer(handler);
     try {
         tp.parseMultiComment(new StringReader(text), opening, closing);
     } catch (IOException e) {
         // Not expected when reading from an in-memory StringReader.
     }
     return handler.getBuffer().toString();
 }
 
  • Constructor Summary

    Constructors
    Constructor
    Description
    CommentRemovalHandler(int bufSize)
    Construct the handler with the given initial buffer size.
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    character(int index, int codePoint)
    Another character was found, such as punctuation (excluding connector punctuation) or a symbol (Sc, Sm and Sk Unicode categories), that is not one of the non-alphanumeric characters allowed in words.
    void
    commented(int index, int commentType, String comment)
    A commented string was found by the parser.
    void
    control(int index, int codePoint)
    A control character codepoint was found.
    void
    endOfStream(int len)
    The stream that was being parsed reached its end.
    void
    error(int index, byte errCode, CharSequence context)
    An error was found while parsing.
    void
    escaped(int index, int codePoint)
    A codepoint preceded with a backslash was found outside of quoted text.
    StringBuilder
    getBuffer()
    Get the buffer.
    void
    leftCurlyBracket(int index)
    Called when the { codepoint is found.
    void
    leftParenthesis(int index)
    Called when the ( codepoint is found.
    void
    leftSquareBracket(int index)
    Called when the [ codepoint is found.
    void
    quoted(int index, CharSequence quoted, int quoteCp)
    A quoted string was found by the parser.
    void
    quotedNewlineChar(int index, int codePoint)
    An unescaped FF/LF/CR control was found while assembling a quoted string.
    void
    quotedWithControl(int index, CharSequence quoted, int quoteCp)
    A quoted string was found by the parser, and contains control characters.
    void
    rightCurlyBracket(int index)
    Called when the } codepoint is found.
    void
    rightParenthesis(int index)
    Called when the ) codepoint is found.
    void
    rightSquareBracket(int index)
    Called when the ] codepoint is found.
    void
    separator(int index, int codePoint)
    A separator (Zs, Zl and Zp Unicode categories) was found.
    void
    tokenStart(TokenControl control)
    At the beginning of parsing, this method is called, passing the TokenControl object that can be used to fine-control the parsing.
    void
    word(int index, CharSequence word)
    A word was found by the parser (includes connector punctuation).

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

    Methods inherited from interface io.sf.carte.uparser.TokenHandler2

    endPunctuation, startPunctuation
  • Constructor Details

    • CommentRemovalHandler

      public CommentRemovalHandler(int bufSize)
      Construct the handler with the given initial buffer size.
      Parameters:
      bufSize - the initial buffer size.
  • Method Details

    • getBuffer

      public StringBuilder getBuffer()
      Get the buffer.
      Returns:
      the buffer.
    • tokenStart

      public void tokenStart(TokenControl control)
      Description copied from interface: TokenHandler2
      At the beginning of parsing, this method is called, passing the TokenControl object that can be used to fine-control the parsing.
      Specified by:
      tokenStart in interface TokenHandler2
      Specified by:
      tokenStart in interface TokenHandler3<RuntimeException>
      Parameters:
      control - the TokenControl object in charge of parsing.
    • word

      public void word(int index, CharSequence word)
      Description copied from interface: TokenHandler2
      A word was found by the parser (includes connector punctuation).
      Specified by:
      word in interface TokenHandler2
      Specified by:
      word in interface TokenHandler3<RuntimeException>
      Parameters:
      index - the index at which the word was found.
      word - the word.
    • separator

      public void separator(int index, int codePoint)
      Description copied from interface: TokenHandler2
      A separator (Zs, Zl and Zp Unicode categories) was found.
      Specified by:
      separator in interface TokenHandler2
      Specified by:
      separator in interface TokenHandler3<RuntimeException>
      Parameters:
      index - the index at which the separator was found.
      codePoint - the codepoint of the found separator.
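
      The Zs, Zl and Zp categories named above map to constants in java.lang.Character. The following is a minimal sketch of that mapping, not the parser's actual implementation; the class SeparatorCategories and the method isSeparator are hypothetical names used only for illustration.

```java
public class SeparatorCategories {

    // Hypothetical helper: true for codepoints in the Zs, Zl or Zp
    // general categories, as reported by java.lang.Character.
    public static boolean isSeparator(int codePoint) {
        int type = Character.getType(codePoint);
        return type == Character.SPACE_SEPARATOR          // Zs
                || type == Character.LINE_SEPARATOR       // Zl
                || type == Character.PARAGRAPH_SEPARATOR; // Zp
    }

    public static void main(String[] args) {
        // Space, no-break space, line separator, paragraph separator, tab, letter.
        int[] samples = { ' ', 0x00A0, 0x2028, 0x2029, '\t', 'a' };
        for (int cp : samples) {
            System.out.printf("U+%04X separator=%b%n", cp, isSeparator(cp));
        }
    }
}
```

      Note that a tab is a control character (Cc), not a separator, so under this classification it would be expected to reach control rather than separator.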
    • quoted

      public void quoted(int index, CharSequence quoted, int quoteCp)
      Description copied from interface: TokenHandler2
      A quoted string was found by the parser.
      Specified by:
      quoted in interface TokenHandler2
      Specified by:
      quoted in interface TokenHandler3<RuntimeException>
      Parameters:
      index - the index at which the quoted string was found.
      quoted - the quoted sequence of characters, without the quotes.
      quoteCp - the quote character codepoint.
    • quotedWithControl

      public void quotedWithControl(int index, CharSequence quoted, int quoteCp)
      Description copied from interface: TokenHandler2
      A quoted string was found by the parser, and contains control characters.
      Specified by:
      quotedWithControl in interface TokenHandler2
      Specified by:
      quotedWithControl in interface TokenHandler3<RuntimeException>
      Parameters:
      index - the index at which the quoted string was found.
      quoted - the quoted sequence of characters, without the quotes.
      quoteCp - the quote character codepoint.
    • quotedNewlineChar

      public void quotedNewlineChar(int index, int codePoint)
      Description copied from interface: TokenHandler2
      An unescaped FF/LF/CR control was found while assembling a quoted string.
      Specified by:
      quotedNewlineChar in interface TokenHandler2
      Specified by:
      quotedNewlineChar in interface TokenHandler3<RuntimeException>
      Parameters:
      index - the index at which the control was found.
      codePoint - the FF/LF/CR codepoint.
    • leftParenthesis

      public void leftParenthesis(int index)
      Description copied from interface: TokenHandler2
      Called when the ( codepoint is found.
      Specified by:
      leftParenthesis in interface TokenHandler2
      Specified by:
      leftParenthesis in interface TokenHandler3<RuntimeException>
      Parameters:
      index - the index at which the codepoint was found.
    • leftSquareBracket

      public void leftSquareBracket(int index)
      Description copied from interface: TokenHandler2
      Called when the [ codepoint is found.
      Specified by:
      leftSquareBracket in interface TokenHandler2
      Specified by:
      leftSquareBracket in interface TokenHandler3<RuntimeException>
      Parameters:
      index - the index at which the codepoint was found.
    • leftCurlyBracket

      public void leftCurlyBracket(int index)
      Description copied from interface: TokenHandler2
      Called when the { codepoint is found.
      Specified by:
      leftCurlyBracket in interface TokenHandler2
      Specified by:
      leftCurlyBracket in interface TokenHandler3<RuntimeException>
      Parameters:
      index - the index at which the codepoint was found.
    • rightParenthesis

      public void rightParenthesis(int index)
      Description copied from interface: TokenHandler2
      Called when the ) codepoint is found.
      Specified by:
      rightParenthesis in interface TokenHandler2
      Specified by:
      rightParenthesis in interface TokenHandler3<RuntimeException>
      Parameters:
      index - the index at which the codepoint was found.
    • rightSquareBracket

      public void rightSquareBracket(int index)
      Description copied from interface: TokenHandler2
      Called when the ] codepoint is found.
      Specified by:
      rightSquareBracket in interface TokenHandler2
      Specified by:
      rightSquareBracket in interface TokenHandler3<RuntimeException>
      Parameters:
      index - the index at which the codepoint was found.
    • rightCurlyBracket

      public void rightCurlyBracket(int index)
      Description copied from interface: TokenHandler2
      Called when the } codepoint is found.
      Specified by:
      rightCurlyBracket in interface TokenHandler2
      Specified by:
      rightCurlyBracket in interface TokenHandler3<RuntimeException>
      Parameters:
      index - the index at which the codepoint was found.
    • character

      public void character(int index, int codePoint)
      Description copied from interface: TokenHandler2
      Another character was found, such as punctuation (excluding connector punctuation) or a symbol (Sc, Sm and Sk Unicode categories), that is not one of the non-alphanumeric characters allowed in words.

      Symbols in the So category are considered part of words and won't be handled by this method.

      Specified by:
      character in interface TokenHandler2
      Specified by:
      character in interface TokenHandler3<RuntimeException>
      Parameters:
      index - the index at which the punctuation was found.
      codePoint - the codepoint of the found punctuation.
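
      To see which general category a given codepoint belongs to, java.lang.Character.getType can be consulted. The sketch below only labels the categories mentioned in this description; it does not reproduce the parser's dispatch logic, and the class SymbolCategories and the method describe are hypothetical names.

```java
public class SymbolCategories {

    // Hypothetical helper: returns the two-letter general category for the
    // categories mentioned in the description, or "other" for anything else.
    public static String describe(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.CURRENCY_SYMBOL:       return "Sc";
            case Character.MATH_SYMBOL:           return "Sm";
            case Character.MODIFIER_SYMBOL:       return "Sk";
            case Character.OTHER_SYMBOL:          return "So";
            case Character.CONNECTOR_PUNCTUATION: return "Pc";
            default:                              return "other";
        }
    }

    public static void main(String[] args) {
        // Dollar sign, plus sign, circumflex, copyright sign, underscore.
        int[] samples = { '$', '+', '^', 0x00A9, '_' };
        for (int cp : samples) {
            System.out.printf("U+%04X -> %s%n", cp, describe(cp));
        }
    }
}
```

      Going by the description, codepoints labelled Sc, Sm or Sk would be candidates for this method, while those labelled So or Pc would be expected to arrive as part of a word.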
    • escaped

      public void escaped(int index, int codePoint)
      Description copied from interface: TokenHandler2
      A codepoint preceded with a backslash was found outside of quoted text.
      Specified by:
      escaped in interface TokenHandler2
      Specified by:
      escaped in interface TokenHandler3<RuntimeException>
      Parameters:
      index - the index at which the escaped codepoint was found.
      codePoint - the escaped codepoint.
    • control

      public void control(int index, int codePoint)
      Description copied from interface: TokenHandler2
      A control character codepoint was found.
      Specified by:
      control in interface TokenHandler2
      Specified by:
      control in interface TokenHandler3<RuntimeException>
      Parameters:
      index - the index at which the control codepoint was found.
      codePoint - the control codepoint.
    • commented

      public void commented(int index, int commentType, String comment)
      Description copied from interface: TokenHandler2
      A commented string was found by the parser.
      Specified by:
      commented in interface TokenHandler2
      Specified by:
      commented in interface TokenHandler3<RuntimeException>
      Parameters:
      index - the index at which the commented string was found.
      commentType - the type of comment.
      comment - the commented string.
    • endOfStream

      public void endOfStream(int len)
      Description copied from interface: TokenHandler2
      The stream that was being parsed reached its end.
      Specified by:
      endOfStream in interface TokenHandler2
      Specified by:
      endOfStream in interface TokenHandler3<RuntimeException>
      Parameters:
      len - the length of the processed stream.
    • error

      public void error(int index, byte errCode, CharSequence context)
      Description copied from interface: TokenHandler2
      An error was found while parsing.

      Something was found that broke the assumptions made by the parser, like an escape character at the end of the stream or an unmatched quote.

      Specified by:
      error in interface TokenHandler2
      Specified by:
      error in interface TokenHandler3<RuntimeException>
      Parameters:
      index - the index at which the error was found.
      errCode - the error code.
      context - a context sequence. If a string was parsed, it will contain up to 16 characters before and after the error.
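
      The shape of such a context sequence can be sketched as a window clamped to the bounds of the parsed text. This is an illustration only, not the library's implementation: the class ErrorContext, the method contextAround, and the choice to include the character at the error index itself are all assumptions.

```java
public class ErrorContext {

    // Number of characters kept on each side of the error index.
    static final int WINDOW = 16;

    // Hypothetical helper: returns up to WINDOW characters before and after
    // the given index, plus the character at the index (an assumption).
    public static CharSequence contextAround(CharSequence text, int index) {
        // Clamp the index so that out-of-range values cannot break subSequence.
        int i = Math.max(0, Math.min(index, text.length()));
        int start = Math.max(0, i - WINDOW);
        int end = Math.min(text.length(), i + WINDOW + 1);
        return text.subSequence(start, end);
    }

    public static void main(String[] args) {
        String text = "body { color: red; content: 'unterminated }";
        System.out.println(contextAround(text, 28));
    }
}
```

      For short inputs the window simply covers the whole text, which is consistent with the "up to 16 characters" wording above.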