Class BBCodeLexer

The BBCodeLexer class takes an input string and breaks it into an array of "tokens", which the BBCode class then walks through to generate the output.

Hierarchy

  • BBCodeLexer

Constructors

  • Instantiate a new instance of the BBCodeLexer class.

    Parameters

    • string: string

      The string to be broken up into tokens.

    • tagMarker: string = '['

      The BBCode tag marker.

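    • debug: boolean = false

      Whether to run in debug mode, which dumps decoded tags as they are found.

    • allowEscape: boolean = true

      Whether to allow escape characters.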

    Returns BBCodeLexer

Properties

allowEscape: boolean

Whether to allow escape characters.

debug: boolean

In debug mode, we dump decoded tags when we find them.

endTagMarker: string

The ending tag marker: "]", ">", ")", or "}"

escapeRegex: string

The regex fragment used to determine whether a tag has been escaped. Examples:

  • (?<!\\\\) Negative lookbehind that makes sure there is no \ directly before the tag (current).
  • (?:(?<!\\\\)|(?<=\\\\\\\\)) Non-capturing group containing both a negative lookbehind and a positive lookbehind: either there is no \ before the tag, or there are two of them (\\), in which case the tag is still parsed.
  • Nothing (an empty pattern), which disables escape characters.
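
As a rough illustration of how these three alternatives behave, the sketch below prepends each one to a simplified tag pattern and lists which tags still match. The names tagBody, findTags and sample are hypothetical and exist only for this example; the lexer's real patMain is more involved.

```typescript
// Candidate escapeRegex values, written as string literals.
const noEscape = "";
const singleCheck = "(?<!\\\\)";
const pairAware = "(?:(?<!\\\\)|(?<=\\\\\\\\))";

// Hypothetical, simplified tag body; NOT the lexer's real patMain.
const tagBody = "\\[[^\\]\\r\\n]+\\]";

// Collect every tag that matches once the escape check is prepended.
function findTags(escapePrefix: string, input: string): string[] {
  const re = new RegExp(escapePrefix + tagBody, "g");
  const found: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(input)) !== null) {
    found.push(m[0]);
  }
  return found;
}

// The text:  [b] \[i] \\[u]
const sample = "[b] \\[i] \\\\[u]";

console.log(findTags(noEscape, sample));    // [ '[b]', '[i]', '[u]' ]
console.log(findTags(singleCheck, sample)); // [ '[b]' ]
console.log(findTags(pairAware, sample));   // [ '[b]', '[u]' ]
```

Lookbehind assertions require a JavaScript engine with ES2018 regex support.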

genEscapeRegex: RegExp

Regex generated from patMain to remove escape characters from escaped tags

input: string[]

The input string, split into an array of tokens.

patComment: RegExp

Pattern for matching comments.

patComment2: RegExp

Pattern for matching comments.

patMain: string | RegExp

Main tag-matching pattern.

patWiki: RegExp

Pattern for matching wiki-links.

ptr: number

Read pointer into the input array.

state: LexState

Next state of the lexer's state machine: text, or tag/ws/nl

tag: boolean | TagType

If token is a tag, this is the decoded array version.

tagMarker: string

Which kind of tag marker we're using: "[", "<", "(", or "{"

text: string

Actual exact, original text of token.

token: BBToken

The type of the returned token: one of the BBToken enum values.

unget: boolean

Whether to "unget" the last token.

verbatim: boolean

In verbatim mode, we return all input, unparsed, including comments.

Methods

  • Given a tokenized piece of a tag, decide what type of token it is.

    Our return values are:

    • -1 End-of-input (EOI).
    • '=' Token is an = sign.
    • ' ' Token is whitespace.
    • '"' Token is quoted text.
    • 'A' Token is unquoted text.

    Parameters

    • ptr: number

      The index of pieces to examine.

    • pieces: string[]

      The pieces array to classify.

    Returns -1 | "=" | " " | "\"" | "A"

    Returns the type of the piece at the given index.
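
The classification described above could look roughly like the following. This is a minimal sketch, not the class's actual code: the name classifyPiece, the PieceType alias, and the exact whitespace and quote tests are assumptions.

```typescript
// The five possible results listed above.
type PieceType = -1 | "=" | " " | '"' | "A";

// Hypothetical stand-in for the lexer's classifier.
function classifyPiece(ptr: number, pieces: string[]): PieceType {
  if (ptr >= pieces.length) return -1;       // ran past the end: EOI
  const piece = pieces[ptr];
  if (piece === "=") return "=";             // equal sign
  if (/^["']/.test(piece)) return '"';       // starts with a quote: quoted text
  if (/^\s+$/.test(piece)) return " ";       // whitespace only
  return "A";                                // anything else: unquoted text
}

// Pieces such as a tag like [url="http://example.com" target] might yield.
const pieces = ["url", "=", '"http://example.com"', " ", "target"];

console.log(classifyPiece(0, pieces)); // A
console.log(classifyPiece(1, pieces)); // =
console.log(classifyPiece(2, pieces)); // "
console.log(classifyPiece(3, pieces)); // (a single space)
console.log(classifyPiece(5, pieces)); // -1
```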

  • Given a string containing a complete [tag] (including its brackets), break it down into its components and return them as an array.

    Parameters

    • tag: string

      The tag to decode.

    Returns TagType

    Returns the object representation of the tag.
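
To make the shape of the result more concrete, here is a deliberately tiny decoder that handles only [name], [/name] and [name=value]. It is a sketch under assumptions: TagSketch and decodeSimpleTag are hypothetical names, the real TagType may differ, and using an empty string for a missing default value is a choice made for this example only.

```typescript
// Hypothetical approximation of the decoded tag object.
interface TagSketch {
  _name: string;
  _end: boolean;
  _default: string;
  [param: string]: string | boolean;
}

// Decode a complete tag, brackets included. Quoting and extra
// key=value parameters are intentionally not handled here.
function decodeSimpleTag(tag: string): TagSketch {
  const inner = tag.slice(1, -1).trim();            // drop the brackets
  const isEnd = inner.charAt(0) === "/";            // end tags start with /
  const body = isEnd ? inner.slice(1) : inner;
  const eq = body.indexOf("=");
  const name = (eq === -1 ? body : body.slice(0, eq)).trim();
  const def = eq === -1 ? "" : body.slice(eq + 1).trim();
  return { _name: name, _end: isEnd, _default: def };
}

console.log(decodeSimpleTag("[url=foo]")); // { _name: 'url', _end: false, _default: 'foo' }
console.log(decodeSimpleTag("[/b]"));      // { _name: 'b', _end: true, _default: '' }
```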

  • Compute how many non-tag characters there are in the input, give or take a few.

    This is optimized for speed, not accuracy, so it'll get some stuff like horizontal rules and weird whitespace characters wrong, but it's only supposed to provide a rough quick guess, not a hard fact.

    Returns number

    Returns the approximate text length.
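
One way to picture such a rough estimate is to add up the lengths of all pieces that do not look like tags. The sketch below assumes the input has already been split into pieces; guessTextLength and its parameters are hypothetical, and the real method may count differently.

```typescript
// Hypothetical estimate: sum the lengths of pieces that are not tags.
function guessTextLength(
  pieces: string[],
  tagMarker: string = "[",
  endTagMarker: string = "]"
): number {
  let length = 0;
  for (const piece of pieces) {
    const looksLikeTag =
      piece.length >= 2 &&
      piece.charAt(0) === tagMarker &&
      piece.charAt(piece.length - 1) === endTagMarker;
    if (!looksLikeTag) {
      length += piece.length; // plain text counts in full
    }
  }
  return length;
}

console.log(guessTextLength(["Hello ", "[b]", "world", "[/b]", "!"])); // 12
```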

  • Return the type of the next token, either BBCODE_TAG or BBCODE_TEXT or BBCODE_EOI.

    This stores the content of this token into this.text, the type of this token in this.token, and possibly an array into this.tag.

    If this is a BBCODE_TAG token, this.tag will be an array computed from the tag's contents, like this:

    [
    '_name': tag_name,
    '_end': true if this is an end tag (i.e., the name starts with a /)
    '_default': default value (for example, in [url=foo], this is "foo").
    ...
    ...all other key: value parameters given in the tag...
    ...
    ]

    Returns BBToken

  • Restore the state of this lexer from a saved previous state.

    Parameters

    • lexState: State

      The previous lexer state.

    Returns void

  • Save the state of this lexer so it can be restored later.

    The return value from this should be considered opaque. Because PHP uses copy-on-write references, the total cost of the returned state is relatively small, and the running time of this function (and RestoreState) is very fast.

    Returns State
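
The save and restore pair can be pictured as a snapshot of the read position. The following is a minimal sketch with hypothetical names (Cursor, CursorState); the real State type is opaque and may carry more fields.

```typescript
// Hypothetical snapshot shape, for illustration only.
interface CursorState {
  ptr: number;
  unget: boolean;
}

class Cursor {
  private pieces: string[];
  private ptr = 0;
  private unget = false;

  constructor(pieces: string[]) {
    this.pieces = pieces;
  }

  // Return the next piece, or null at end of input.
  next(): string | null {
    return this.ptr < this.pieces.length ? this.pieces[this.ptr++] : null;
  }

  // Copy the fields needed to resume from this point later.
  saveState(): CursorState {
    return { ptr: this.ptr, unget: this.unget };
  }

  // Rewind (or fast-forward) to a previously saved point.
  restoreState(state: CursorState): void {
    this.ptr = state.ptr;
    this.unget = state.unget;
  }
}

const cursor = new Cursor(["a", "[b]", "c"]);
cursor.next();                    // "a"
const saved = cursor.saveState();
cursor.next();                    // "[b]"
cursor.next();                    // "c"
cursor.restoreState(saved);
console.log(cursor.next());       // [b]
```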

  • Given a string, if it's surrounded by "quotes" or 'quotes', remove them.

    Parameters

    • string: string

      The string to strip.

    Returns string

    Returns the string stripped of quotes.
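
A minimal sketch of this behavior, assuming quotes are removed only when the same quote character appears at both ends (the function name stripQuotes is hypothetical):

```typescript
// Remove one pair of matching surrounding quotes, if present.
function stripQuotes(s: string): string {
  if (s.length >= 2) {
    const first = s.charAt(0);
    const last = s.charAt(s.length - 1);
    if ((first === '"' || first === "'") && first === last) {
      return s.slice(1, -1);
    }
  }
  return s; // not quoted, or quotes do not match
}

console.log(stripQuotes('"foo"'));  // foo
console.log(stripQuotes("'bar'"));  // bar
console.log(stripQuotes("baz"));    // baz
```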

  • Un-gets the last token read so that a subsequent call to NextToken() will return it.

    Note that ungetToken() does not switch states when you switch between verbatim mode and standard mode. For example, if you read a tag, unget the tag, switch to verbatim mode, and then get the next token, you'll get back a BBCODE_TAG, exactly what you ungot, not a BBCODE_TEXT token.

    Returns void
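
The interaction between unget and verbatim mode can be sketched with a one-token pushback. Everything here (PushbackLexer, the Kind type, the first-character tag test) is a simplified assumption for illustration and not the class's real logic.

```typescript
type Kind = "TAG" | "TEXT" | "EOI";

// Hypothetical miniature lexer with a single-token pushback.
class PushbackLexer {
  verbatim = false;
  private pieces: string[];
  private ptr = 0;
  private unget = false;
  private last: Kind = "EOI";

  constructor(pieces: string[]) {
    this.pieces = pieces;
  }

  nextToken(): Kind {
    if (this.unget) {
      // Replay the previous token as-is, ignoring the current mode.
      this.unget = false;
      return this.last;
    }
    if (this.ptr >= this.pieces.length) {
      this.last = "EOI";
      return this.last;
    }
    const piece = this.pieces[this.ptr++];
    // In verbatim mode everything is returned as text.
    const isTag = !this.verbatim && piece.charAt(0) === "[";
    this.last = isTag ? "TAG" : "TEXT";
    return this.last;
  }

  ungetToken(): void {
    this.unget = true;
  }
}

const lexer = new PushbackLexer(["[b]", "[i]"]);
console.log(lexer.nextToken()); // TAG
lexer.ungetToken();
lexer.verbatim = true;
console.log(lexer.nextToken()); // TAG  (the ungot token, unchanged)
console.log(lexer.nextToken()); // TEXT ([i] read in verbatim mode)
```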
