Interface TokenHandler2

All Superinterfaces:
TokenHandler3<RuntimeException>
All Known Subinterfaces:
TokenHandler
All Known Implementing Classes:
CommentRemovalHandler

public interface TokenHandler2 extends TokenHandler3<RuntimeException>
A TokenHandler3 that throws no checked exceptions and is backwards-compatible with TokenProducer 2.x.

Most token handlers report problems through error handlers and throw no checked exceptions; in that case, use this handler together with TokenProducer. If your handler needs to throw checked exceptions, you must use TokenProducer3 together with TokenHandler3 instead.
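
The relationship between the two handler types follows a common Java pattern: an interface that is generic over its exception type, plus a subinterface that fixes that type to RuntimeException. The sketch below illustrates the pattern with local stand-in types; Handler3Sketch, Handler2Sketch and HandlerExceptionSketch are hypothetical names that only mirror the word(int, CharSequence) signature, and they are not the library's interfaces.

```java
// Hypothetical stand-in for a handler that is generic over its exception type
// (an assumption made for illustration, not the library's TokenHandler3).
interface Handler3Sketch<E extends Exception> {
    void word(int index, CharSequence word) throws E;
}

// Hypothetical stand-in for the unchecked variant: E is fixed to
// RuntimeException, so callers never need a try/catch block.
interface Handler2Sketch extends Handler3Sketch<RuntimeException> {
    @Override
    void word(int index, CharSequence word);
}

final class HandlerExceptionSketch {

    // Feeds words to an unchecked handler; no exception handling is required.
    static int countWords(String[] words) {
        int[] count = {0};
        Handler2Sketch handler = (index, word) -> count[0]++;
        int index = 0;
        for (String w : words) {
            handler.word(index, w);
            index += w.length() + 1; // pretend one separator follows each word
        }
        return count[0];
    }

    // A handler typed with a checked exception forces the caller to deal with it.
    static boolean checkedHandlerPropagates() {
        Handler3Sketch<java.io.IOException> handler = (index, word) -> {
            throw new java.io.IOException("sink closed");
        };
        try {
            handler.word(0, "abc");
            return false;
        } catch (java.io.IOException e) {
            return true;
        }
    }
}
```

With the unchecked variant the call site stays free of exception handling, which is the convenience this interface offers over the generic one.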

  • Method Summary

    Modifier and Type
    Method
    Description
    void
    character(int index, int codePoint)
    Other characters, including punctuation (excluding connector punctuation) and symbols (Sc, Sm and Sk Unicode categories), were found that are not among the non-alphanumeric characters allowed in words.
    void
    commented(int index, int commentType, String comment)
    A commented string was found by the parser.
    void
    control(int index, int codePoint)
    A control character codepoint was found.
    void
    endOfStream(int len)
    The stream that was being parsed reached its end.
    default void
    endPunctuation(int index, int codePoint)
    Called when end punctuation (Pe) codepoints are found (except characters handled by rightCurlyBracket(int), rightParenthesis(int) and rightSquareBracket(int)).
    void
    error(int index, byte errCode, CharSequence context)
    An error was found while parsing.
    void
    escaped(int index, int codePoint)
    A codepoint preceded by a backslash was found outside of quoted text.
    void
    leftCurlyBracket(int index)
    Called when the { codepoint is found.
    void
    leftParenthesis(int index)
    Called when the ( codepoint is found.
    void
    leftSquareBracket(int index)
    Called when the [ codepoint is found.
    void
    quoted(int index, CharSequence quoted, int quote)
    A quoted string was found by the parser.
    void
    quotedNewlineChar(int index, int codePoint)
    An unescaped FF/LF/CR control was found while assembling a quoted string.
    void
    quotedWithControl(int index, CharSequence quoted, int quoteCp)
    A quoted string was found by the parser, and contains control characters.
    void
    rightCurlyBracket(int index)
    Called when the } codepoint is found.
    void
    rightParenthesis(int index)
    Called when the ) codepoint is found.
    void
    rightSquareBracket(int index)
    Called when the ] codepoint is found.
    void
    separator(int index, int codePoint)
    A separator (Zs, Zl and Zp Unicode categories) was found.
    default void
    startPunctuation(int index, int codePoint)
    Called when start punctuation (Ps) codepoints are found (except characters handled by leftCurlyBracket(int), leftParenthesis(int) and leftSquareBracket(int)).
    void
    tokenStart(TokenControl control)
    Called at the beginning of parsing, passing the TokenControl object that can be used for fine-grained control of the parsing.
    void
    word(int index, CharSequence word)
    A word was found by the parser (includes connector punctuation).
  • Method Details

    • tokenStart

      void tokenStart(TokenControl control)
      Called at the beginning of parsing, passing the TokenControl object that can be used for fine-grained control of the parsing.
      Specified by:
      tokenStart in interface TokenHandler3<RuntimeException>
      Parameters:
      control - the TokenControl object in charge of parsing.
    • word

      void word(int index, CharSequence word)
      A word was found by the parser (includes connector punctuation).
      Specified by:
      word in interface TokenHandler3<RuntimeException>
      Parameters:
      index - the index at which the word was found.
      word - the word.
    • separator

      void separator(int index, int codePoint)
      A separator (Zs, Zl and Zp Unicode categories) was found.
      Specified by:
      separator in interface TokenHandler3<RuntimeException>
      Parameters:
      index - the index at which the separator was found.
      codePoint - the codepoint of the found separator.
    • quoted

      void quoted(int index, CharSequence quoted, int quote)
      A quoted string was found by the parser.
      Specified by:
      quoted in interface TokenHandler3<RuntimeException>
      Parameters:
      index - the index at which the quoted string was found.
      quoted - the quoted sequence of characters, without the quotes.
      quote - the quote character.
    • quotedWithControl

      void quotedWithControl(int index, CharSequence quoted, int quoteCp)
      A quoted string was found by the parser, and contains control characters.
      Specified by:
      quotedWithControl in interface TokenHandler3<RuntimeException>
      Parameters:
      index - the index at which the quoted string was found.
      quoted - the quoted sequence of characters, without the quotes.
      quoteCp - the quote character codepoint.
    • quotedNewlineChar

      void quotedNewlineChar(int index, int codePoint)
      An unescaped FF/LF/CR control was found while assembling a quoted string.
      Specified by:
      quotedNewlineChar in interface TokenHandler3<RuntimeException>
      Parameters:
      index - the index at which the control was found.
      codePoint - the FF/LF/CR codepoint.
    • leftParenthesis

      void leftParenthesis(int index)
      Called when the ( codepoint is found.
      Specified by:
      leftParenthesis in interface TokenHandler3<RuntimeException>
      Parameters:
      index - the index at which the codepoint was found.
    • leftSquareBracket

      void leftSquareBracket(int index)
      Called when the [ codepoint is found.
      Specified by:
      leftSquareBracket in interface TokenHandler3<RuntimeException>
      Parameters:
      index - the index at which the codepoint was found.
    • leftCurlyBracket

      void leftCurlyBracket(int index)
      Called when the { codepoint is found.
      Specified by:
      leftCurlyBracket in interface TokenHandler3<RuntimeException>
      Parameters:
      index - the index at which the codepoint was found.
    • rightParenthesis

      void rightParenthesis(int index)
      Called when the ) codepoint is found.
      Specified by:
      rightParenthesis in interface TokenHandler3<RuntimeException>
      Parameters:
      index - the index at which the codepoint was found.
    • rightSquareBracket

      void rightSquareBracket(int index)
      Called when the ] codepoint is found.
      Specified by:
      rightSquareBracket in interface TokenHandler3<RuntimeException>
      Parameters:
      index - the index at which the codepoint was found.
    • rightCurlyBracket

      void rightCurlyBracket(int index)
      Called when the } codepoint is found.
      Specified by:
      rightCurlyBracket in interface TokenHandler3<RuntimeException>
      Parameters:
      index - the index at which the codepoint was found.
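
The six bracket callbacks above lend themselves to balance checking. The following is a minimal sketch, assuming a handler only wants the index of the first unbalanced closing bracket. BracketBalanceSketch and its scan driver are hypothetical; in real use the callback methods would belong to a TokenHandler2 implementation and the parser, not a loop over the string, would invoke them.

```java
final class BracketBalanceSketch {
    // Closing codepoints still expected, innermost on top of the stack.
    private final java.util.ArrayDeque<Integer> expected = new java.util.ArrayDeque<>();
    // Index of the first unbalanced closing bracket, or -1 if none was seen.
    private int firstMismatch = -1;

    // Callback signatures mirror the ones documented above.
    public void leftParenthesis(int index) { expected.push((int) ')'); }
    public void leftSquareBracket(int index) { expected.push((int) ']'); }
    public void leftCurlyBracket(int index) { expected.push((int) '}'); }
    public void rightParenthesis(int index) { close(index, ')'); }
    public void rightSquareBracket(int index) { close(index, ']'); }
    public void rightCurlyBracket(int index) { close(index, '}'); }

    private void close(int index, int codePoint) {
        Integer top = expected.poll(); // null when nothing is open
        if ((top == null || top != codePoint) && firstMismatch == -1) {
            firstMismatch = index;
        }
    }

    public int getFirstMismatch() { return firstMismatch; }
    public boolean hasUnclosed() { return !expected.isEmpty(); }

    // Stand-in driver for illustration only: it plays the role of the parser
    // by calling the callbacks for each bracket found in the text.
    static BracketBalanceSketch scan(String text) {
        BracketBalanceSketch h = new BracketBalanceSketch();
        for (int i = 0; i < text.length(); i++) {
            switch (text.charAt(i)) {
                case '(': h.leftParenthesis(i); break;
                case '[': h.leftSquareBracket(i); break;
                case '{': h.leftCurlyBracket(i); break;
                case ')': h.rightParenthesis(i); break;
                case ']': h.rightSquareBracket(i); break;
                case '}': h.rightCurlyBracket(i); break;
                default: break;
            }
        }
        return h;
    }
}
```

Brackets left open at the end of input are reported separately through hasUnclosed(), which a real handler would typically check from endOfStream(int).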
    • startPunctuation

      default void startPunctuation(int index, int codePoint)
      Called when start punctuation (Ps) codepoints are found (except characters handled by leftCurlyBracket(int), leftParenthesis(int) and leftSquareBracket(int)).
      Specified by:
      startPunctuation in interface TokenHandler3<RuntimeException>
      Parameters:
      index - the index at which the codepoint was found.
      codePoint - the found codepoint.
    • endPunctuation

      default void endPunctuation(int index, int codePoint)
      Called when end punctuation (Pe) codepoints are found (except characters handled by rightCurlyBracket(int), rightParenthesis(int) and rightSquareBracket(int)).
      Specified by:
      endPunctuation in interface TokenHandler3<RuntimeException>
      Parameters:
      index - the index at which the codepoint was found.
      codePoint - the found codepoint.
    • character

      void character(int index, int codePoint)
      Other characters, including punctuation (excluding connector punctuation) and symbols (Sc, Sm and Sk Unicode categories), were found that are not among the non-alphanumeric characters allowed in words.

      Symbols in the So category are considered part of words and are not handled by this method.

      Specified by:
      character in interface TokenHandler3<RuntimeException>
      Parameters:
      index - the index at which the character was found.
      codePoint - the codepoint of the found character.
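
The Unicode categories named in these descriptions correspond to the constants returned by java.lang.Character.getType(int). The sketch below maps a codepoint to the callback that the category rules on this page suggest. CallbackClassifierSketch is a hypothetical helper, not the library's algorithm: it ignores quoting, escapes, comments and any parser configuration of which characters are allowed in words, and it treats every remaining category as part of a word.

```java
final class CallbackClassifierSketch {

    // Returns the name of the callback suggested by the documented category rules.
    static String callbackFor(int codePoint) {
        // The six ASCII brackets have dedicated callbacks.
        switch (codePoint) {
            case '(': return "leftParenthesis";
            case '[': return "leftSquareBracket";
            case '{': return "leftCurlyBracket";
            case ')': return "rightParenthesis";
            case ']': return "rightSquareBracket";
            case '}': return "rightCurlyBracket";
            default: break;
        }
        switch (Character.getType(codePoint)) {
            case Character.SPACE_SEPARATOR:      // Zs
            case Character.LINE_SEPARATOR:       // Zl
            case Character.PARAGRAPH_SEPARATOR:  // Zp
                return "separator";
            case Character.CONTROL:              // Cc
                return "control";
            case Character.START_PUNCTUATION:    // Ps
                return "startPunctuation";
            case Character.END_PUNCTUATION:      // Pe
                return "endPunctuation";
            case Character.CURRENCY_SYMBOL:      // Sc
            case Character.MATH_SYMBOL:          // Sm
            case Character.MODIFIER_SYMBOL:      // Sk
            case Character.DASH_PUNCTUATION:     // Pd
            case Character.INITIAL_QUOTE_PUNCTUATION: // Pi
            case Character.FINAL_QUOTE_PUNCTUATION:   // Pf
            case Character.OTHER_PUNCTUATION:    // Po
                return "character";
            default:
                // Simplification: letters, digits, connector punctuation (Pc),
                // So symbols and everything else count as word content here.
                return "word";
        }
    }
}
```

For instance, under these assumptions the underscore (Pc) and the copyright sign (So) both end up in words, while the plus sign (Sm) is reported through character(int, int).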
    • escaped

      void escaped(int index, int codePoint)
      A codepoint preceded by a backslash was found outside of quoted text.
      Specified by:
      escaped in interface TokenHandler3<RuntimeException>
      Parameters:
      index - the index at which the escaped codepoint was found.
      codePoint - the escaped codepoint.
    • control

      void control(int index, int codePoint)
      A control character codepoint was found.
      Specified by:
      control in interface TokenHandler3<RuntimeException>
      Parameters:
      index - the index at which the control codepoint was found.
      codePoint - the control codepoint.
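
Because callbacks receive only a stream index, a handler that wants line numbers has to count line breaks itself. The sketch below does so from the control and quotedNewlineChar callbacks. It assumes that LF and CR outside quoted strings arrive through control(int, int), which this page does not state explicitly. LineCounterSketch is a hypothetical class, and it deliberately does not count FF as a line break.

```java
final class LineCounterSketch {
    private int line = 1;
    // Index of the most recent CR, used to count a CR LF pair only once.
    private int lastCrIndex = -2;

    // Signature mirrors control(int, int) documented above.
    public void control(int index, int codePoint) {
        newlineCandidate(index, codePoint);
    }

    // Signature mirrors quotedNewlineChar(int, int) documented above.
    public void quotedNewlineChar(int index, int codePoint) {
        newlineCandidate(index, codePoint);
    }

    private void newlineCandidate(int index, int codePoint) {
        if (codePoint == '\r') {
            line++;
            lastCrIndex = index;
        } else if (codePoint == '\n' && lastCrIndex != index - 1) {
            // An LF that does not directly follow a CR starts a new line.
            line++;
        }
        // Other control characters (tab, FF, ...) do not change the line here.
    }

    // Current 1-based line number.
    public int getLine() {
        return line;
    }
}
```

A handler could combine getLine() with the index passed to error(int, byte, CharSequence) to produce messages that point at a line rather than a raw offset.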
    • commented

      void commented(int index, int commentType, String comment)
      A commented string was found by the parser.
      Specified by:
      commented in interface TokenHandler3<RuntimeException>
      Parameters:
      index - the index at which the commented string was found.
      commentType - the type of comment.
      comment - the commented string.
    • endOfStream

      void endOfStream(int len)
      The stream that was being parsed reached its end.
      Specified by:
      endOfStream in interface TokenHandler3<RuntimeException>
      Parameters:
      len - the length of the processed stream.
    • error

      void error(int index, byte errCode, CharSequence context)
      An error was found while parsing.

      Something was found that broke the assumptions made by the parser, like an escape character at the end of the stream or an unmatched quote.

      Specified by:
      error in interface TokenHandler3<RuntimeException>
      Parameters:
      index - the index at which the error was found.
      errCode - the error code.
      context - a context sequence. If a string was parsed, it will contain up to 16 characters before and after the error.
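
To make the "up to 16 characters before and after" wording concrete, the sketch below computes such a window over a string. It is only an illustration of the idea: ErrorContextSketch is a hypothetical helper, the page does not say how the library builds the context, and the choice to include the character at the error index itself is an assumption.

```java
final class ErrorContextSketch {
    // Number of characters kept on each side of the error position.
    static final int RADIUS = 16;

    // Returns the slice of text surrounding index, clamped to the text bounds.
    static CharSequence contextAround(CharSequence text, int index) {
        int len = text.length();
        int start = Math.max(0, Math.min(len, index - RADIUS));
        int end = Math.max(start, Math.min(len, index + RADIUS + 1));
        return text.subSequence(start, end);
    }
}
```

Near the start or end of the input the window simply shrinks, which matches the "up to" in the parameter description.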