PGF/TikZ Manual


Utilities

92 Parser Module

  • \usepgfmodule{parser} % LaTeX and plain TeX and pure pgf

  • \usepgfmodule[parser] % ConTeXt and pure pgf

  • This module defines some commands for creating a simple letter-by-letter parser.

  • \usepackage{pgfparser} % LaTeX

  • \input pgfparser.tex % plain TeX

  • \usemodule[pgfparser] % ConTeXt

  • Because the parser module is almost independent of the rest of pgf, it can also be used as a standalone package with minimal dependencies.

This module provides commands for defining a parser that scans some given text letter-by-letter. For each letter, some code is executed and, possibly, a state switch occurs. The code for each letter may take mandatory or optional arguments. The parsing process ends when a final state has been reached, and optionally some code is executed afterwards. By default, each newly defined parser ignores space tokens; if you want to change that, you have to explicitly define an action for blank spaces (with \pgfparserdef).

  • \pgfparserparse{parser name}text

  • This command is used to parse the text using the (previously defined) parser named parser name.

    The text is not enclosed in curly braces; rather, it is all the text that follows. The end of the text is determined implicitly, namely when the parser reaches its final state. If you defined a final action for the parser using \pgfparserdeffinal, it is executed at that point.

    The parser works as follows: at any moment it is in a certain state; initially, this state is called initial. The first letter of the text is examined (using the \futurelet command). For each possible state and each possible letter, the parser stores some action code in a table, and the code for the current state and letter is executed. This code may, but need not, trigger a state switch, causing a new state to be set. The parser then moves on to the next character of the text and repeats the whole procedure, unless it is in the state final, which causes the parsing process to stop immediately.

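    In the following example, the parser counts the number of a's in the text, ignoring any b's. The text ends with the first c; the remaining three c's are not parsed but simply typeset, which is why the output begins with ccc.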

    cccThere are 9 a’s.

    \usepgfmodule {parser}
    \newcount\mycount
    \pgfparserdef{myparser}{initial}{the letter a}%
    {\advance\mycount by 1\relax}%
    \pgfparserdef{myparser}{initial}{the letter b}%
    {} % do nothing
    \pgfparserdef{myparser}{initial}{the letter c}%
    {\pgfparserswitch{final}}% done!

    \pgfparserparse{myparser}aabaabababbbbbabaabcccc
    There are \the\mycount\ a's.
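The following sketch, not tested here, extends the idea to a state other than initial and final: a's are only counted between pairs of x markers. The parser name toggle and the state name inside are hypothetical choices for this illustration.

```latex
\documentclass{article}
\usepackage{pgf}
\usepgfmodule{parser}
\newcount\mycount
% state initial: outside the x markers, a's are consumed but not counted
\pgfparserdef{toggle}{initial}x{\pgfparserswitch{inside}}%
\pgfparserdef{toggle}{initial}a{}%
\pgfparserdef{toggle}{initial}.{\pgfparserswitch{final}}%
% state inside: between the x markers, a's are counted
\pgfparserdef{toggle}{inside}a{\advance\mycount by 1\relax}%
\pgfparserdef{toggle}{inside}x{\pgfparserswitch{initial}}%
\pgfparserdef{toggle}{inside}.{\pgfparserswitch{final}}%
\begin{document}
\mycount=0
\pgfparserparse{toggle}aaxaaaxax.%
Counted \the\mycount\ a's between x markers.
\end{document}
```

Only the three a's between the first pair of x's are counted; the trailing x merely switches back to inside before the period ends the parse, so the counter should end up at 3.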

  • \pgfparserdef{parser name}{state}symbol meaning[arguments]{action}

  • This command should be used repeatedly to define a parser named parser name. Each call specifies that the parser parser name should do the following: when it is in state state and reads the letter symbol meaning, it executes the code stored in action.

    The symbol meaning must be the text that results from applying the command \meaning to the given character. For instance, \meaning a yields the letter a, while \meaning 1 yields the character 1. A space yields blank space␣. Alternatively, you can give the symbol itself without surrounding it in braces. So both \pgfparserdef{myparser}{initial}{the letter a}{foo} and \pgfparserdef{myparser}{initial}a{foo} define an action for the letter a. This short form works for most tokens, but not for a space (in which case you can use \pgfparserdef{myparser}{initial}{blank space}{foo}) or for opening braces (in which case you can use \pgfparserdef{myparser}{initial}{\meaning\bgroup}{foo}; for symmetry, you might prefer \pgfparserdef{myparser}{initial}{\meaning\egroup}{foo} for closing braces as well). You can also define an action for a macro's meaning (note that macros with different names can have the same meaning), so things like \pgfparserdef{myparser}{initial}\texttt{foo} are possible as well.
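A small sketch contrasting the two notations, assuming only the rules just described (the parser name forms is hypothetical): the long form gives the \meaning text in braces, and the short form gives the symbol itself.

```latex
\documentclass{article}
\usepackage{pgf}
\usepgfmodule{parser}
% long form: the text that \meaning produces for the character
\pgfparserdef{forms}{initial}{the letter a}{A}%
\pgfparserdef{forms}{initial}{the character 1}{(one)}%
% short form: just the symbol itself
\pgfparserdef{forms}{initial}b{B}%
\pgfparserdef{forms}{initial}.{\pgfparserswitch{final}}%
\begin{document}
\pgfparserparse{forms}ab1ba.
\end{document}
```

If the rules behave as described, this typesets AB(one)BA.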

    The action may take arguments, which you specify in the optional arguments string. The argument string can contain up to nine argument specifications of the following types:

    • m: a normal mandatory argument

    • r⟨delim⟩: a mandatory argument which is read up to the delimiter delim

    • o: an optional argument in [] defaulting to a special mark

    • O{⟨default⟩}: like o, but defaulting to default

    • d⟨delim1⟩⟨delim2⟩: an optional argument enclosed in delim1 and delim2, defaulting to a special mark

    • D⟨delim1⟩⟨delim2⟩{⟨default⟩}: like d, but defaulting to default

    • t⟨token⟩: tests whether the next letter is token; if so, it is gobbled and the argument is set to a special mark

    So if you want to define an action that takes two mandatory arguments, you use [mm]. If it should take an optional star, an optional argument in brackets that yields a mark if it is not used, a mandatory argument, and finally an optional argument in parentheses that defaults to something, you use [t*omD(){something}] as the argument string. If the argument should be anything up to a semicolon, you use [r;]. Spaces before argument specifications in the string are ignored, so [r m] is a single argument that reads anything up to an m. Spaces before any argument in the parsed text are ignored as well, so if a was set up to take an optional argument, the argument would also be detected in a []. As with normal LaTeX2ε optional arguments, you have to protect nested brackets: [a[bc]d] would be read as the optional argument a[bc followed by the text d], not as a[bc]d. You have to use [{a[bc]d}] to get it right.
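As a sketch of two of these specifications (parser name specs and the letters p and q are hypothetical), p reads everything up to a semicolon with r; and q takes an optional argument with default x followed by a mandatory one.

```latex
\documentclass{article}
\usepackage{pgf}
\usepgfmodule{parser}
% p: one argument, read up to the next semicolon
\pgfparserdef{specs}{initial}p[r;]{(#1)}%
% q: optional argument in [] defaulting to x, then a mandatory argument
\pgfparserdef{specs}{initial}q[O{x}m]{[#1/#2]}%
\pgfparserdef{specs}{initial}.{\pgfparserswitch{final}}%
\begin{document}
\pgfparserparse{specs}pab;q{y}q[z]{w}.
\end{document}
```

Following the rules above, this should typeset (ab)[x/y][z/w]: the first q falls back to the default x, the second one receives z.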

    Inside the action you can execute almost any code. This code is not surrounded by a scope, so its effects persist after the parsing is done. However, each time after the action is executed, control goes back to the parser. You should not launch a parser inside the action code unless you put it in a scope.

    When you use all as the state, the action is performed in all states as a fallback, whenever symbol meaning is encountered. This means that when you do not specify anything explicitly for a state and a letter, but you do specify something for all and this letter, then the specified action will be used.
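A minimal sketch of the fallback, assuming the behavior described above (parser name fallback and state name loud are hypothetical): the rule for a in state all is used wherever no state-specific rule exists, while state loud overrides it.

```latex
\documentclass{article}
\usepackage{pgf}
\usepgfmodule{parser}
% fallback: used in every state that has no rule of its own for a
\pgfparserdef{fallback}{all}a{a}%
% state loud has its own rule for a, which takes precedence over all
\pgfparserdef{fallback}{loud}a{A}%
\pgfparserdef{fallback}{all}!{\pgfparserswitch{loud}}%
\pgfparserdef{fallback}{all}.{\pgfparserswitch{final}}%
\begin{document}
\pgfparserparse{fallback}aa!aa.
\end{document}
```

The two a's before the ! are handled by the fallback and the two after it by state loud, so this should typeset aaAA.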

    When the parser encounters a letter for which nothing is specified in the current state (neither directly nor indirectly via all), an error occurs. Using \pgfparserdefunknown, you can additionally specify an action that is executed after the error is thrown. To suppress these errors (but not the action specified with \pgfparserdefunknown), you can use the /pgfparser/silent key, which affects every parser, or the /pgfparser/parser name/silent key, which affects only the parser parser name.
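A sketch combining \pgfparserdefunknown with the per-parser silent key (parser name qmarks is hypothetical). The key is set only after \pgfparserdef has run for this parser, since the key is defined at that point.

```latex
\documentclass{article}
\usepackage{pgf}
\usepgfmodule{parser}
\pgfparserdef{qmarks}{initial}.{\pgfparserswitch{final}}%
% executed for every letter that has no rule in state initial
\pgfparserdefunknown{qmarks}{initial}{[?]}%
% suppress the error for this parser only; the action above still runs
\pgfparserset{qmarks/silent=true}%
\begin{document}
\pgfparserparse{qmarks}xy.
\end{document}
```

Neither x nor y has a rule, so this should typeset [?][?] without raising errors.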

  • \pgfparserlet{parser name 1}{state 1}symbol meaning 1[opt 1][opt 2]symbol meaning 2

  • In the following explanation, if neither optional argument is given, parser name 2 and state 2 are the same as parser name 1 and state 1. If only the first is given, state 2 equals opt 1. If both are given, parser name 2 equals opt 1 and state 2 equals opt 2.

    Defines an action for parser name 1 in state 1 for the symbol meaning 1 that does the same as the action of parser name 2 in state 2 for the symbol meaning 2. For symbol meaning 1 and symbol meaning 2, the same parsing rules apply as for symbol meaning in \pgfparserdef, so you either give the meaning in braces or just the symbol.
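A sketch of \pgfparserlet without optional arguments, so both symbols refer to the same parser and state (parser name letdemo is hypothetical). It shows the braced long form and the short form.

```latex
\documentclass{article}
\usepackage{pgf}
\usepgfmodule{parser}
\newcount\mycount
\pgfparserdef{letdemo}{initial}a{\advance\mycount by 1\relax}%
% long form: A now does the same as a (same parser, same state)
\pgfparserlet{letdemo}{initial}{the letter A}{the letter a}%
% short form: + now does the same as a as well
\pgfparserlet{letdemo}{initial}+a%
\pgfparserdef{letdemo}{initial}.{\pgfparserswitch{final}}%
\begin{document}
\mycount=0
\pgfparserparse{letdemo}aA+a.%
Counted \the\mycount.
\end{document}
```

Since A and + reuse the action of a, all four symbols before the period should be counted.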

  • \pgfparserdefunknown{parser name}{state}{action}

  • With this macro you can define an action that the parser parser name executes in state state whenever it encounters a letter for which no action was defined.

  • \pgfparserdeffinal{parser name}{action}

  • Every parser can call a final action after its state has been switched to final. This action is executed after everything else, so it may be a macro that grabs further arguments from the input following the parsed text.
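A sketch of a final action that reads one more argument from the input after the parsed text. The helper \reportcount and the parser name finaldemo are hypothetical, and the sketch relies on the statement above that the final action runs after everything else.

```latex
\documentclass{article}
\usepackage{pgf}
\usepgfmodule{parser}
\newcount\mycount
% hypothetical helper: grabs one argument from the input after the parsed text
\newcommand\reportcount[1]{#1: \the\mycount}
\pgfparserdef{finaldemo}{initial}a{\advance\mycount by 1\relax}%
\pgfparserdef{finaldemo}{initial}.{\pgfparserswitch{final}}%
% the final action runs last, so \reportcount can read {Total}
\pgfparserdeffinal{finaldemo}{\reportcount}%
\begin{document}
\mycount=0
\pgfparserparse{finaldemo}aaa.{Total}
\end{document}
```

Under that assumption, the output should read Total: 3.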

  • \pgfparserswitch{state}

  • This command can be called inside the action code of a parser to cause a state switch to state.

  • \pgfparserifmark{arg}{true}{false}

  • Remember that some of the optional argument types set special marks? With \pgfparserifmark you can test whether arg is such a mark. So if there was no optional argument for the argument types o and d the true branch will be executed, else the false branch. For the t type argument the true branch is executed if the token was encountered.
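A sketch testing the marks of a t and an o argument (parser name ifmark is hypothetical). Here #1 is a mark when the star is present and #2 is a mark when the bracketed argument is missing.

```latex
\documentclass{article}
\usepackage{pgf}
\usepgfmodule{parser}
% x takes an optional star (t*) followed by an optional argument in [] (o)
\pgfparserdef{ifmark}{initial}x[t*o]%
  {\pgfparserifmark{#1}{*}{}\pgfparserifmark{#2}{(none)}{(#2)} }%
\pgfparserdef{ifmark}{initial}.{\pgfparserswitch{final}}%
\begin{document}
\pgfparserparse{ifmark}x[1]x*x.
\end{document}
```

Following the description of the marks, the three x's should produce (1), then *(none), then (none).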

  • \pgfparserreinsert

  • You can use this as the final macro in an action of \pgfparserdef or \pgfparserdefunknown. It has the effect that the contents of \pgfparserletter are parsed next. Unless you redefine \pgfparserletter, this means that the last token is parsed again. You can also change the definition of \pgfparserletter just before \pgfparserreinsert to parse some specific tokens next.
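A sketch of a common use: switch state and let the new state handle the current letter by reinserting it (parser name reinsert and state name body are hypothetical). Leading dashes are skipped in state initial; the first a switches to body and is parsed again there.

```latex
\documentclass{article}
\usepackage{pgf}
\usepgfmodule{parser}
% state initial: skip leading dashes, hand the first a over to state body
\pgfparserdef{reinsert}{initial}-{}%
\pgfparserdef{reinsert}{initial}a{\pgfparserswitch{body}\pgfparserreinsert}%
% state body: typeset a as A and keep dashes
\pgfparserdef{reinsert}{body}a{A}%
\pgfparserdef{reinsert}{body}-{-}%
\pgfparserdef{reinsert}{all};{\pgfparserswitch{final}}%
\begin{document}
\pgfparserparse{reinsert}--a-a;
\end{document}
```

Because the first a is reinserted after the state switch, it is handled by the body rule, so this should typeset A-A.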

  • \pgfparserstate

  • Expands to the current state of the parser.

  • \pgfparsertoken

  • This is the macro which is let to the following token with \futurelet. You can use it inside an action code.

  • \pgfparserletter

  • This macro stores the letter to which \pgfparsertoken was let. So if you used \pgfparserparse{foo}a, this macro would be defined as if by \def\pgfparserletter{a}. This definition is made before any action code is executed. There are four special cases, namely tokens of category code 1, 2, 6, or 10 (with standard category codes, the tokens {, }, #, and ␣, a space). For these categories, this macro expands to \bgroup, \egroup, ##, and ␣, respectively.
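A sketch using \pgfparserletter inside an unknown-letter action and \pgfparserstate inside a regular action (parser name echo is hypothetical).

```latex
\documentclass{article}
\usepackage{pgf}
\usepgfmodule{parser}
% every undefined letter is echoed in brackets via \pgfparserletter
\pgfparserdefunknown{echo}{initial}{[\pgfparserletter]}%
% \pgfparserstate is typeset before the switch, so it still reads initial
\pgfparserdef{echo}{initial}.{(\pgfparserstate)\pgfparserswitch{final}}%
\pgfparserset{echo/silent=true}%
\begin{document}
\pgfparserparse{echo}abc.
\end{document}
```

This should typeset [a][b][c](initial).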

  • \pgfparserset{key list}

  • The pgfparser module has a few keys you can access through this macro. It is just a shortcut for \pgfset{/pgfparser/.cd,#1}. The available keys are listed in subsection 92.1.

92.1 Keys of the Parser Module
  • /pgfparser/silent=boolean (no default, initially false)

  • If true then no error will be thrown when a letter is parsed for which no action is specified, silently ignoring it. This holds true for every parser.

  • /pgfparser/status=boolean (no default, initially false)

  • If true every parser prints a status message for every action executed. This might help in debugging and understanding what the parser does.

In addition to these keys, the following key is defined for every parser name for which \pgfparserdef, \pgfparserdefunknown, or \pgfparserlet has been run at least once:

  • /pgfparser/parser name/silent=boolean (no default, initially false)

  • If true the parser parser name will silently ignore undefined letters. This is an individual equivalent of /pgfparser/silent for each defined parser.
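A sketch combining a per-parser key with a global one through \pgfparserset (parser name quiet is hypothetical). Since \pgfparserset changes into /pgfparser, quiet/silent refers to /pgfparser/quiet/silent.

```latex
\documentclass{article}
\usepackage{pgf}
\usepgfmodule{parser}
\pgfparserdef{quiet}{initial}a{A}%
\pgfparserdef{quiet}{initial}.{\pgfparserswitch{final}}%
% per-parser key: only the parser quiet ignores undefined letters
\pgfparserset{quiet/silent=true}%
% global key: every parser reports each action it executes
\pgfparserset{status=true}%
\begin{document}
\pgfparserparse{quiet}axa.
\end{document}
```

The undefined x is skipped silently, so the typeset result should be AA, accompanied by the status messages.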

92.2 Examples

The following example counts the different letters appearing in a more or less random string of letters. Every letter is counted only once: the first occurrence of a letter triggers the unknown action, which increments the counter and defines an action that does nothing for that letter, so later occurrences are ignored. We can define such a rule without knowing which letter is used, because \pgfparsertoken has the same meaning as that letter.

13 different letters found

\usepgfmodule {parser}
\mycount=0
% using the shortcut syntax of just placing ; after the state
\pgfparserdef{different letters}{all};{\pgfparserswitch{final}}%
\pgfparserdefunknown{different letters}{all}%
{\pgfparserdef{different letters}{all}\pgfparsertoken{}\advance\mycount1}%
\pgfparserdeffinal{different letters}%
{\the\mycount\ different letters found}%
% don't throw errors for unknown letters
\pgfparserset{different letters/silent=true}%

\pgfparserparse{different letters}udiaternxqlchudiea;

The next example uses several of the available argument types.

nobody will use Parser

\usepgfmodule {parser}
% using the long syntax of \pgfparserdef
\pgfparserdef{arguments}{initial}{the letter a}[d()]
{\pgfparserifmark{#1}{\textcolor{red}{\textit{use}}}{\textbf{#1}} }%
% using the shortcut syntax
\pgfparserdef{arguments}{initial}t[m]{\texttt{#1} }%
\pgfparserdef{arguments}{initial}c[t*O{blue}m]
{\pgfparserifmark{#1}{#3}{\textcolor{#2}{#3}}}%
\pgfparserdef{arguments}{all};{\pgfparserswitch{final}}%

\pgfparserparse{arguments}t{nobody}a(will)ac[green]{P}c*{arse}c{r};