Class TokenProducer3<E extends Exception>

java.lang.Object
io.sf.carte.uparser.TokenProducer3<E>
Direct Known Subclasses:
TokenProducer

public class TokenProducer3<E extends Exception> extends Object
A simple parser that produces tokens from a String or Reader, and processes them through a user-provided handler that may throw checked exceptions of type E.
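The class is generic in E so that a handler's checked exceptions reach the caller of parse unchanged. The sketch below shows only that general Java pattern. WordHandler, WordProducer and GenericExceptionDemo are hypothetical names, not part of uparser, and the real TokenHandler3 callbacks are not shown.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical handler: NOT uparser's TokenHandler3. The type parameter E
// lets an implementation declare its own checked exception.
interface WordHandler<E extends Exception> {
    void word(String word) throws E;
}

// Hypothetical producer: splits on whitespace and forwards each word.
// Because parse() declares "throws E", a handler's checked exception
// propagates to the caller without being wrapped in a RuntimeException.
class WordProducer<E extends Exception> {
    private final WordHandler<E> handler;

    WordProducer(WordHandler<E> handler) {
        this.handler = handler;
    }

    void parse(String text) throws E {
        for (String w : text.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                handler.word(w);
            }
        }
    }
}

class GenericExceptionDemo {
    // A handler that never throws checked exceptions: E = RuntimeException,
    // so callers need no try/catch.
    static List<String> collect(String text) {
        List<String> out = new ArrayList<>();
        WordProducer<RuntimeException> p = new WordProducer<RuntimeException>(out::add);
        p.parse(text);
        return out;
    }

    // A handler that throws a checked IOException: E = IOException,
    // so the compiler forces callers to handle it.
    static void rejectBadWords(String text) throws IOException {
        WordProducer<IOException> p = new WordProducer<IOException>(w -> {
            if (w.equals("bad")) {
                throw new IOException("rejected word: " + w);
            }
        });
        p.parse(text);
    }
}
```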

The tokens produced are:

  • Words, made of letters, digits, format characters (Unicode Cf), connector punctuation (Unicode Pc), certain symbols, and any codepoint listed in allowInWords or accepted by the supplied CharacterCheck object.
  • Quoted text (within single and double quotes).
  • Grouping characters: {}[]().
  • Separators.
  • Escaped characters: anything after a backslash, unless it is within a quoted text.
  • Control characters.
  • Other characters.
  • Comments (if a comment-supporting method is used).
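As a rough picture of the categories listed above, here is a per-codepoint classifier. It is a hypothetical sketch, not uparser's logic: TokenKind and CodePointClassifier are invented names, the unspecified "certain symbols" and the CharacterCheck hook are not modeled, context (such as being inside a quoted string) is ignored, and treating spaces and tabs as separators is an assumption.

```java
// Hypothetical token categories for this sketch; not the library's types.
enum TokenKind { WORD, QUOTE, GROUPING, SEPARATOR, ESCAPE, CONTROL, OTHER }

class CodePointClassifier {
    // Classifies one code point in isolation (no quoting or comment context).
    static TokenKind classify(int cp, int[] allowInWords) {
        // Codepoints explicitly allowed in words take precedence.
        for (int allowed : allowInWords) {
            if (cp == allowed) {
                return TokenKind.WORD;
            }
        }
        int type = Character.getType(cp);
        if (Character.isLetterOrDigit(cp)
                || type == Character.FORMAT                 // Unicode Cf
                || type == Character.CONNECTOR_PUNCTUATION) { // Unicode Pc, e.g. '_'
            return TokenKind.WORD;
        }
        if (cp == '"' || cp == '\'') {
            return TokenKind.QUOTE;
        }
        if ("{}[]()".indexOf(cp) >= 0) {
            return TokenKind.GROUPING;
        }
        if (cp == '\\') {
            return TokenKind.ESCAPE;
        }
        // Assumption: tabs plus Unicode space/line/paragraph separators.
        if (cp == '\t' || Character.isSpaceChar(cp)) {
            return TokenKind.SEPARATOR;
        }
        if (type == Character.CONTROL) {
            return TokenKind.CONTROL;
        }
        return TokenKind.OTHER;
    }
}
```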

A moderate level of control of the parsing can be achieved with a TokenControl object.

The constructors that do not take a characterCountLimit argument process Reader streams of up to 0x40000000 (2^30, about one billion) characters, throwing a SecurityException if a stream exceeds that limit. Use the other constructors if you need to process larger inputs or want to enforce a smaller limit.
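One way to picture such a limit is a Reader wrapper that counts characters and throws SecurityException once the count goes past the limit. This is a minimal sketch, not TokenProducer3's implementation: LimitedReader, DEFAULT_LIMIT and drain are hypothetical names, and it assumes the limit counts Java char units.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical wrapper illustrating a character-count limit; not uparser code.
class LimitedReader extends FilterReader {
    static final long DEFAULT_LIMIT = 0x40000000L; // 2^30 characters

    private final long limit;
    private long count = 0;

    LimitedReader(Reader in, long limit) {
        super(in);
        this.limit = limit;
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        if (c != -1) {
            tally(1);
        }
        return c;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        int n = super.read(cbuf, off, len);
        if (n > 0) {
            tally(n);
        }
        return n;
    }

    // Note: skip() is not overridden, so skipped characters are not counted.
    private void tally(int n) {
        count += n;
        if (count > limit) {
            throw new SecurityException("Character count limit exceeded: " + limit);
        }
    }

    // Reads the whole stream through the limit; returns the characters read.
    static long drain(Reader r, long limit) {
        char[] buf = new char[256];
        long total = 0;
        try (Reader lr = new LimitedReader(r, limit)) {
            int n;
            while ((n = lr.read(buf, 0, buf.length)) != -1) {
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```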


  • Constructor Details

    • TokenProducer3

      public TokenProducer3(TokenHandler3<E> handler)
      Instantiate a TokenProducer3 object with the given handler.
      Parameters:
      handler - the token handler.
    • TokenProducer3

      public TokenProducer3(TokenHandler3<E> handler, int characterCountLimit)
      Instantiate a TokenProducer3 object with the given handler and processing limit.
      Parameters:
      handler - the token handler.
      characterCountLimit - the maximum number of characters to process from a Reader; exceeding it throws a SecurityException.
    • TokenProducer3

      public TokenProducer3(TokenHandler3<E> handler, TokenProducer3.CharacterCheck charCheck)
      Instantiate a TokenProducer3 object with the given handler and CharacterCheck.
      Parameters:
      handler - the token handler.
      charCheck - the character checker object.
    • TokenProducer3

      public TokenProducer3(TokenHandler3<E> handler, TokenProducer3.CharacterCheck charCheck, int characterCountLimit)
      Instantiate a TokenProducer3 object with the given handler, CharacterCheck and processing limit.
      Parameters:
      handler - the token handler.
      charCheck - the character checker object.
      characterCountLimit - the maximum number of characters to process from a Reader; exceeding it throws a SecurityException.
    • TokenProducer3

      public TokenProducer3(TokenHandler3<E> handler, int[] allowInWords)
      Construct a TokenProducer3 object with the given handler and an array of codepoints allowed in words.
      Parameters:
      handler - the token handler.
      allowInWords - the array of codepoints allowed in words.
    • TokenProducer3

      public TokenProducer3(TokenHandler3<E> handler, int[] allowInWords, int characterCountLimit)
      Construct a TokenProducer3 object with the given handler, an array of codepoints allowed in words, and a processing limit.
      Parameters:
      handler - the token handler.
      allowInWords - the array of codepoints allowed in words.
      characterCountLimit - the maximum number of characters to process from a Reader; exceeding it throws a SecurityException.
  • Method Details

    • setHandleAllSeparators

      public void setHandleAllSeparators(boolean handleAllSeparators)
      Set the handling of consecutive separators such as spaces or tabs.

      Default is true.

      Parameters:
      handleAllSeparators - if set to true, all separator characters (including consecutive ones) will trigger a TokenHandler3.separator(int, int) method call. Otherwise only single separations between the other types of tokens will be taken into account.
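The difference between the two modes can be sketched by recording the positions at which a separator callback would fire. This is a hypothetical illustration, not the parser's code: SeparatorDemo and separatorEvents are invented names, only spaces and tabs count as separators here, and indices are char positions (BMP text assumed).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the two separator-handling modes; not uparser code.
class SeparatorDemo {
    // Returns the indices at which a separator callback would be triggered.
    static List<Integer> separatorEvents(String text, boolean handleAll) {
        List<Integer> events = new ArrayList<>();
        boolean previousWasSeparator = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean sep = c == ' ' || c == '\t';
            // handleAll: every separator fires; otherwise only the first of a run.
            if (sep && (handleAll || !previousWasSeparator)) {
                events.add(i);
            }
            previousWasSeparator = sep;
        }
        return events;
    }
}
```

With handleAll set to false, a run such as " \t " between two words yields a single event, which matches the "single separations" behavior described above.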
    • setAcceptNewlineEndingQuote

      public void setAcceptNewlineEndingQuote(boolean accept)
      If set, quoted strings that end with an unescaped newline (instead of the closing quote) are processed through the relevant quoted method, although an error is reported in either case. Otherwise, only the error is reported.

      It is set to false by default.

      Parameters:
      accept - true to process quoted strings that end with an unescaped newline, false otherwise.
    • setAcceptEofEndingQuoted

      public void setAcceptEofEndingQuoted(boolean accept)
      If set, quoted strings that end at EOF (end of file) instead of the closing quote are processed through the relevant quoted method, although an error is reported in either case. Otherwise, only the error is reported.

      It is set to false by default.

      Parameters:
      accept - true to process quoted strings that end at EOF, false otherwise.
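The two "accept" flags above share one rule: an unterminated quoted string always produces an error, and its content is delivered only when the matching flag is set. The scanner below is a hypothetical sketch of that rule, not uparser's code. QuoteScanner and its lists stand in for the handler callbacks, and keeping a backslash pair verbatim inside quotes is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical scanner illustrating the two "accept" flags; not uparser code.
class QuoteScanner {
    final List<String> quoted = new ArrayList<>(); // stands in for the quoted-text callback
    final List<String> errors = new ArrayList<>(); // stands in for error reporting
    private final boolean acceptNewlineEnding;
    private final boolean acceptEofEnding;

    QuoteScanner(boolean acceptNewlineEnding, boolean acceptEofEnding) {
        this.acceptNewlineEnding = acceptNewlineEnding;
        this.acceptEofEnding = acceptEofEnding;
    }

    // Scans a quoted string whose opening quote is at s.charAt(start).
    // Returns the index just past the consumed text.
    int scan(String s, int start) {
        char quote = s.charAt(start);
        StringBuilder content = new StringBuilder();
        int i = start + 1;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                // Assumption: an escape pair is kept verbatim and never ends the string.
                content.append(c).append(s.charAt(i + 1));
                i += 2;
            } else if (c == quote) {
                quoted.add(content.toString()); // properly closed
                return i + 1;
            } else if (c == '\n') {
                // Unescaped newline: always an error; content delivered only if accepted.
                errors.add("Unescaped newline in quoted string at index " + i);
                if (acceptNewlineEnding) {
                    quoted.add(content.toString());
                }
                return i; // the newline itself is left for the caller
            } else {
                content.append(c);
                i++;
            }
        }
        // Reached EOF without a closing quote.
        errors.add("EOF inside quoted string");
        if (acceptEofEnding) {
            quoted.add(content.toString());
        }
        return i;
    }
}
```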
    • parse

      public void parse(String string) throws E
      Tokenize a string, without any comment handling.
      Parameters:
      string - the string to parse.
      Throws:
      E - if the token handler throws an exception of type E.
      NullPointerException - if the string is null.
    • parse

      public void parse(String string, String commentOpen, String commentClose) throws E
      Tokenize a string, accounting for the given comment syntax.
      Parameters:
      string - the string to parse.
      commentOpen - the token that opens a comment, for example /*. The same character may not be repeated at the beginning of the token: for example, <!-- is a valid token but <<-- is not.
      commentClose - the token that closes a comment, for example */.
      Throws:
      E - if the token handler throws an exception of type E.
      NullPointerException - if any of the string arguments are null.
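To show how an open/close pair delimits comments, and what the rule on commentOpen forbids, here is a hypothetical sketch that simply removes comments from a string. CommentStripper, validateOpen and strip are invented names. Rejecting an invalid opening token with IllegalArgumentException and dropping the rest of an unterminated comment are assumptions, since the documentation does not say what happens in those cases. Unlike a real tokenizer, this ignores quoted text.

```java
import java.util.Objects;

// Hypothetical comment remover; not uparser code.
class CommentStripper {
    // Enforces the documented rule: the first two codepoints of the
    // opening token must differ ("<!--" is fine, "<<--" is not).
    static void validateOpen(String open) {
        if (open.isEmpty()) {
            throw new IllegalArgumentException("Empty comment-opening token");
        }
        if (open.codePointCount(0, open.length()) > 1) {
            int first = open.codePointAt(0);
            int second = open.codePointAt(Character.charCount(first));
            if (first == second) {
                throw new IllegalArgumentException("Repeated leading character: " + open);
            }
        }
    }

    static String strip(String text, String open, String close) {
        Objects.requireNonNull(text);
        Objects.requireNonNull(open);
        Objects.requireNonNull(close);
        validateOpen(open);
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int start = text.indexOf(open, i);
            if (start == -1) {
                out.append(text, i, text.length()); // no more comments
                break;
            }
            out.append(text, i, start); // text before the comment
            int end = text.indexOf(close, start + open.length());
            if (end == -1) {
                break; // unterminated comment: drop the rest (assumption)
            }
            i = end + close.length();
        }
        return out.toString();
    }
}
```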
    • parse

      public void parse(Reader reader) throws E, IOException
      Tokenize the contents of the given Reader without any comment handling.

      A buffer with an initial default capacity of 256 will be used.

      Parameters:
      reader - the Reader whose contents are to be parsed.
      Throws:
      IOException - if an I/O problem occurs with the Reader.
      E - if the token handler throws an exception of type E.
    • parse

      public void parse(Reader reader, int bufferCapacity) throws E, IOException
      Tokenize the contents of a Reader without any comment handling, using a buffer with the given initial capacity.
      Parameters:
      reader - the Reader whose contents are to be parsed.
      bufferCapacity - the initial buffer capacity.
      Throws:
      IOException - if an I/O problem occurs with the Reader.
      E - if the token handler throws an exception of type E.
    • parse

      public void parse(Reader reader, String commentOpen, String commentClose) throws E, IOException
      Tokenize the given reader, with a single comment syntax.
      Parameters:
      reader - the reader to parse.
      commentOpen - the token that opens a comment. The same character may not be repeated at the beginning of the token: for example, <!-- is a valid token but <<-- is not.
      commentClose - the token that closes a comment.
      Throws:
      IOException - if an I/O problem occurs.
      NullPointerException - if any of the string arguments are null.
      E - if the token handler throws an exception of type E.
    • parseMultiComment

      public void parseMultiComment(Reader reader, String[] opening, String[] closing) throws E, IOException
      Tokenize the given reader, with multiple comment syntaxes.
      Parameters:
      reader - the reader to parse.
      opening - the array of tokens that open a comment. The same character may not be repeated at the beginning of a token: for example, <!-- is a valid token but <<-- is not.
      closing - the array of tokens that close a comment; each one closes the comment opened by the opening token at the same index.
      Throws:
      IOException - if an I/O problem occurs while reading.
      IllegalArgumentException - if the opening and closing arrays do not have the same length.
      E - if the token handler throws an exception of type E.
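The pairing of the opening and closing arrays by index can be sketched as follows. MultiCommentStripper is a hypothetical name, and this is not how parseMultiComment is implemented. It only mirrors the documented IllegalArgumentException for arrays of different lengths. Rejecting empty tokens and dropping the rest of an unterminated comment are assumptions, and quoted text is ignored.

```java
// Hypothetical multi-syntax comment remover; not uparser code.
class MultiCommentStripper {
    static String strip(String text, String[] opening, String[] closing) {
        if (opening.length != closing.length) {
            throw new IllegalArgumentException("opening and closing arrays differ in length");
        }
        for (int k = 0; k < opening.length; k++) {
            // Assumption: empty tokens are rejected (they would never advance).
            if (opening[k].isEmpty() || closing[k].isEmpty()) {
                throw new IllegalArgumentException("Empty comment token at index " + k);
            }
        }
        StringBuilder out = new StringBuilder();
        int i = 0;
        outer:
        while (i < text.length()) {
            for (int k = 0; k < opening.length; k++) {
                if (text.startsWith(opening[k], i)) {
                    // closing[k] ends the comment opened by opening[k].
                    int end = text.indexOf(closing[k], i + opening[k].length());
                    if (end == -1) {
                        return out.toString(); // unterminated: drop the rest (assumption)
                    }
                    i = end + closing[k].length();
                    continue outer;
                }
            }
            out.append(text.charAt(i)); // ordinary character
            i++;
        }
        return out.toString();
    }
}
```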