Class TokenProducer

java.lang.Object
io.sf.carte.uparser.TokenProducer

public class TokenProducer extends Object
A simple parser that produces tokens from a String or Reader.

The tokens produced are:

  • Words. These contain letters, digits, format characters (Unicode Cf), connector punctuation, certain symbols, and any codepoint listed in allowInWords or accepted by the supplied CharacterCheck object.
  • Quoted text (within single and double quotes).
  • Grouping characters: {}[]().
  • Separators.
  • Escaped characters: anything after a backslash, unless it is within a quoted text.
  • Control characters.
  • Other characters.
  • Comments (if a comment-supporting parse method is used).
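The following Java sketch is a hypothetical, heavily simplified tokenizer that only illustrates the categories listed above. It is not the uparser implementation and does not use its TokenHandler API. The class name TokenCategorySketch and the "KIND:text" output format are invented for illustration. The sketch assumes that only spaces and tabs are separators and that other ISO control characters (such as a newline) are control characters. Comments, "certain symbols", allowInWords, CharacterCheck and TokenControl are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified sketch of the token categories listed above.
// NOT the uparser implementation: no comments, TokenControl, CharacterCheck,
// allowInWords, "certain symbols" or error reporting.
final class TokenCategorySketch {

    // Letters, digits, format (Cf) and connector punctuation (Pc) code points.
    static boolean isWordChar(int cp) {
        int type = Character.getType(cp);
        return Character.isLetterOrDigit(cp)
                || type == Character.FORMAT
                || type == Character.CONNECTOR_PUNCTUATION;
    }

    // Returns one "KIND" or "KIND:text" entry per token, in input order.
    static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        int n = s.length();
        int i = 0;
        while (i < n) {
            int cp = s.codePointAt(i);
            int len = Character.charCount(cp);
            if (isWordChar(cp)) {
                int start = i;
                while (i < n && isWordChar(s.codePointAt(i))) {
                    i += Character.charCount(s.codePointAt(i));
                }
                out.add("WORD:" + s.substring(start, i));
            } else if (cp == '"' || cp == '\'') {
                // Quoted text. A backslash here is not an escape token; it only
                // keeps an escaped quote from closing the string (simplification).
                int j = i + 1;
                while (j < n && s.charAt(j) != cp) {
                    j += (s.charAt(j) == '\\' && j + 1 < n) ? 2 : 1;
                }
                out.add("QUOTED:" + s.substring(i + 1, j));
                i = Math.min(j + 1, n); // skip the closing quote, if any
            } else if ("{}[]()".indexOf(cp) >= 0) {
                out.add("GROUP:" + (char) cp);
                i += len;
            } else if (cp == '\\' && i + 1 < n) {
                // Escaped character outside quotes: whatever follows the backslash.
                int next = s.codePointAt(i + 1);
                out.add("ESCAPED:" + new String(Character.toChars(next)));
                i += 1 + Character.charCount(next);
            } else if (cp == ' ' || cp == '\t') {
                out.add("SEP"); // assumption: only space and tab are separators
                i += len;
            } else if (Character.isISOControl(cp)) {
                out.add("CONTROL");
                i += len;
            } else {
                out.add("OTHER:" + new String(Character.toChars(cp)));
                i += len;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("foo_1 (\"a b\") \\x;"));
    }
}
```

Running main prints [WORD:foo_1, SEP, GROUP:(, QUOTED:a b, GROUP:), SEP, ESCAPED:x, OTHER:;]. The underscore stays inside the word because it is connector punctuation (Unicode Pc).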

A moderate level of control over the parsing can be achieved with a TokenControl object.

  • Field Details

  • Constructor Details

    • TokenProducer

      public TokenProducer(TokenHandler handler)
      Instantiate a TokenProducer object with the given handler.
      Parameters:
      handler - the token handler.
    • TokenProducer

      public TokenProducer(TokenHandler handler, int characterCountLimit)
      Instantiate a TokenProducer object with the given handler and processing limit.
      Parameters:
      handler - the token handler.
      characterCountLimit - the character count limit, that is, the maximum number of characters to process.
    • TokenProducer

      public TokenProducer(TokenHandler handler, TokenProducer.CharacterCheck charCheck)
      Instantiate a TokenProducer object with the given handler and CharacterCheck.
      Parameters:
      handler - the token handler.
      charCheck - the character checker object.
    • TokenProducer

      public TokenProducer(TokenHandler handler, TokenProducer.CharacterCheck charCheck, int characterCountLimit)
      Instantiate a TokenProducer object with the given handler, CharacterCheck and processing limit.
      Parameters:
      handler - the token handler.
      charCheck - the character checker object.
      characterCountLimit - the character count limit, that is, the maximum number of characters to process.
    • TokenProducer

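      public TokenProducer(TokenHandler handler, int[] allowInWords)
      Instantiate a TokenProducer object with the given handler and an array of codepoints allowed in words.
      Parameters:
      handler - the token handler.
      allowInWords - the array of codepoints allowed in words.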
    • TokenProducer

      public TokenProducer(TokenHandler handler, int[] allowInWords, int characterCountLimit)
      Instantiate a TokenProducer object with the given handler, an array of codepoints allowed in words, and a processing limit.
      Parameters:
      handler - the token handler.
      allowInWords - the array of codepoints allowed in words.
      characterCountLimit - the character count limit, that is, the maximum number of characters to process.
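A word-character test combining these options might look like the hypothetical sketch below. java.util.function.IntPredicate stands in for TokenProducer.CharacterCheck, whose actual method is not reproduced here. Unlike any single constructor above, the sketch accepts both an allowInWords array and an extra check at once, and it does not model the "certain symbols" mentioned in the class description. The class and method names are invented for illustration.

```java
import java.util.Arrays;
import java.util.function.IntPredicate;

// Hypothetical sketch of a "word character" test combining the built-in
// categories with an allowInWords array and an optional extra check.
// IntPredicate stands in for TokenProducer.CharacterCheck; not library code.
final class WordCharSketch {
    private final int[] allowInWords;      // sorted private copy, for binary search
    private final IntPredicate extraCheck; // may be null

    WordCharSketch(int[] allowInWords, IntPredicate extraCheck) {
        this.allowInWords = allowInWords == null ? new int[0] : allowInWords.clone();
        Arrays.sort(this.allowInWords);
        this.extraCheck = extraCheck;
    }

    boolean isWordChar(int cp) {
        int type = Character.getType(cp);
        if (Character.isLetterOrDigit(cp) || type == Character.FORMAT
                || type == Character.CONNECTOR_PUNCTUATION) {
            return true; // built-in word categories
        }
        if (Arrays.binarySearch(allowInWords, cp) >= 0) {
            return true; // explicitly allowed codepoint
        }
        return extraCheck != null && extraCheck.test(cp);
    }

    // Returns the words found in s, joined with '|'.
    String words(String s) {
        StringBuilder out = new StringBuilder();
        StringBuilder word = new StringBuilder();
        int i = 0;
        while (i <= s.length()) {
            int cp = i < s.length() ? s.codePointAt(i) : -1; // -1 marks end of input
            if (cp != -1 && isWordChar(cp)) {
                word.appendCodePoint(cp);
            } else if (word.length() > 0) {
                if (out.length() > 0) {
                    out.append('|');
                }
                out.append(word);
                word.setLength(0);
            }
            i += cp == -1 ? 1 : Character.charCount(cp);
        }
        return out.toString();
    }
}
```

For example, new WordCharSketch(new int[] {'-'}, null).words("foo-bar baz.qux") returns "foo-bar|baz|qux", whereas without the array the hyphen splits the first word.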
  • Method Details

    • setHandleAllSeparators

      public void setHandleAllSeparators(boolean handleAllSeparators)
      Set the handling of consecutive separators such as spaces or tabs.

      Default is true.

      Parameters:
      handleAllSeparators - if true, every separator character (including consecutive ones) triggers a call to TokenHandler.separator(int, int). Otherwise, consecutive separators between two tokens of other types are reported as a single separation.
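The difference between the two settings can be sketched with a toy scanner that emits one S per separator callback. This is a hypothetical illustration, not library code. It assumes that only spaces and tabs are separators, and reads "single separation" as: a run of consecutive separators is reported once.

```java
// Hypothetical sketch contrasting the two setHandleAllSeparators modes.
// Not library code: 'S' marks one separator callback, other chars are copied.
final class SeparatorSketch {

    static String run(String s, boolean handleAllSeparators) {
        StringBuilder out = new StringBuilder();
        boolean previousWasSeparator = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean separator = c == ' ' || c == '\t'; // assumption: spaces and tabs only
            if (!separator) {
                out.append(c);
            } else if (handleAllSeparators || !previousWasSeparator) {
                // true: every separator is reported; false: one report per run
                out.append('S');
            }
            previousWasSeparator = separator;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(run("a  \tb", true));  // aSSSb
        System.out.println(run("a  \tb", false)); // aSb
    }
}
```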
    • setAcceptNewlineEndingQuote

      public void setAcceptNewlineEndingQuote(boolean accept)
      If set, quoted strings that end with an unescaped newline (instead of the closing quote) are processed through the relevant quoted method, although an error is reported in either case. Otherwise, only the error is reported.

      It is set to false by default.

      Parameters:
      accept - true to process quoted strings that end with an unescaped newline, false otherwise.
    • setAcceptEofEndingQuoted

      public void setAcceptEofEndingQuoted(boolean accept)
      If set, quoted strings that end with an EOF (end of file) are processed through the relevant quoted method, although an error is reported in either case. Otherwise, only the error is reported.

      It is set to false by default.

      Parameters:
      accept - true to process quoted strings that end with an EOF, false otherwise.
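A hypothetical sketch of how the two flags could affect an unterminated quoted string follows. The event strings, the field names and the order of the error and quoted events are assumptions made for illustration; the real parser reports through TokenHandler methods that are not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the two "accept ... ending quote" flags.
// Event strings and their order are assumptions; not library code.
final class QuoteEndingSketch {
    final List<String> events = new ArrayList<>();
    boolean acceptNewlineEndingQuote = false; // mirrors the documented default
    boolean acceptEofEndingQuoted = false;    // mirrors the documented default

    // Scans the quoted string whose opening quote is at s.charAt(start) and
    // returns the index where scanning stopped.
    int scanQuoted(String s, int start) {
        char quote = s.charAt(start);
        StringBuilder text = new StringBuilder();
        int i = start + 1;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == quote) {
                events.add("quoted:" + text);
                return i + 1; // properly closed
            }
            if (c == '\\' && i + 1 < s.length()) {
                text.append(s.charAt(i + 1)); // escaped char, even an escaped newline
                i += 2;
                continue;
            }
            if (c == '\n') {
                events.add("error:newline"); // the error is reported in either case
                if (acceptNewlineEndingQuote) {
                    events.add("quoted:" + text);
                }
                return i; // the newline itself is left for the caller
            }
            text.append(c);
            i++;
        }
        events.add("error:eof"); // the error is reported in either case
        if (acceptEofEndingQuoted) {
            events.add("quoted:" + text);
        }
        return i;
    }
}
```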
    • parse

      public void parse(String string)
      Parse the given string.
      Parameters:
      string - the string to parse.
    • parse

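      public void parse(String string, String commentOpen, String commentClose)
      Parse the given string, with a single comment layout.
      Parameters:
      string - the string to parse.
      commentOpen - the token that opens a comment.
      commentClose - the token that closes a comment.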
    • parse

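      public void parse(Reader reader) throws IOException
      Parse the given reader.
      Parameters:
      reader - the reader to parse.
      Throws:
      IOException - if an I/O problem was found parsing the reader.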
    • parse

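      public void parse(Reader reader, int bufferCapacity) throws IOException
      Parse the given reader.
      Parameters:
      reader - the reader to parse.
      bufferCapacity - the buffer capacity.
      Throws:
      IOException - if an I/O problem was found parsing the reader.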
    • parse

      public void parse(Reader reader, String commentOpen, String commentClose) throws IOException
      Parse the given reader, with a single comment layout.
      Parameters:
      reader - the reader to parse.
      commentOpen - the token that opens a comment. The token must not begin with a repeated character: for example, <!-- is a valid token but <<-- is not.
      commentClose - the token that closes a comment.
      Throws:
      IOException - if an I/O problem was found parsing the reader.
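The restriction on repeated leading characters is not explained above. One plausible motivation (an assumption, not stated in this documentation) is that a one-pass scanner without backtracking can miss such a token. The hypothetical matcher below restarts after a mismatch by re-testing only the current character. It finds <!-- and finds <<-- in x<<--, but misses <<-- in <<<--, even though that input contains it.

```java
// Hypothetical one-pass matcher without backtracking, used only to show how a
// comment opener starting with a repeated character can be missed.
// Not the uparser implementation. The token must be non-empty.
final class NaiveOpenerMatcher {

    // Returns the index just past the first match of token, or -1 if none found.
    static int find(String input, String token) {
        int matched = 0; // number of token characters matched so far
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == token.charAt(matched)) {
                matched++;
            } else {
                // Restart: only the current character is re-tested, earlier
                // input is never re-examined.
                matched = (c == token.charAt(0)) ? 1 : 0;
            }
            if (matched == token.length()) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(find("a<!-- x -->", "<!--")); // 5
        System.out.println(find("x<<--", "<<--"));       // 5
        System.out.println(find("<<<--", "<<--"));       // -1
        System.out.println("<<<--".indexOf("<<--"));     // 1: the token is there
    }
}
```

A matcher based on a failure function, as in Knuth-Morris-Pratt, does not have this blind spot; whether the library's scanner resembles the naive one is not documented here.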
    • parseMultiComment

      public void parseMultiComment(Reader reader, String[] opening, String[] closing) throws IOException
      Parse the given reader, with multiple comment layouts.
      Parameters:
      reader - the reader to parse.
      opening - the array of tokens that open a comment. No token may begin with a repeated character: for example, <!-- is a valid token but <<-- is not.
      closing - the array of tokens that close comments; each entry closes the comment opened by the opening token at the same index.
      Throws:
      IOException - if an I/O problem was found parsing the reader.
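The pairing of opening and closing tokens by index can be illustrated with the following hypothetical extractor. It is only a sketch: it ignores quoted text, escapes and every other token type, and it silently drops an unterminated comment. How the real parser handles an unterminated comment is not covered here. The class name and the "k:body" output format are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: closing[k] ends a comment opened by opening[k].
// Ignores quotes, escapes and all other token types; not library code.
final class MultiCommentSketch {

    // Returns "k:body" for each complete comment, k being the layout index.
    static List<String> comments(String s, String[] opening, String[] closing) {
        if (opening.length != closing.length) {
            throw new IllegalArgumentException("opening and closing differ in length");
        }
        List<String> out = new ArrayList<>();
        int i = 0;
        scan:
        while (i < s.length()) {
            for (int k = 0; k < opening.length; k++) {
                if (s.startsWith(opening[k], i)) {
                    int bodyStart = i + opening[k].length();
                    int end = s.indexOf(closing[k], bodyStart); // matching closer only
                    if (end < 0) {
                        return out; // unterminated comment: dropped in this sketch
                    }
                    out.add(k + ":" + s.substring(bodyStart, end));
                    i = end + closing[k].length();
                    continue scan;
                }
            }
            i++;
        }
        return out;
    }

    public static void main(String[] args) {
        String[] opening = {"/*", "<!--"};
        String[] closing = {"*/", "-->"};
        System.out.println(comments("a/*x*/b<!--y-->c/*z", opening, closing)); // [0:x, 1:y]
    }
}
```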