TikZ and PGF Manual
Utilities
92 Parser Module
-
\usepgfmodule{parser} % LaTeX and plain TeX and pure pgf
-
\usepgfmodule[parser] % ConTeXt and pure pgf
This module defines some commands for creating a simple letter-by-letter parser.
-
\usepackage{pgfparser} % LaTeX
-
\input pgfparser.tex % plain TeX
-
\usemodule[pgfparser] % ConTeXt
Because the parser module is almost independent of the rest of pgf, it can also be used as a standalone package with minimal dependencies.
This module provides commands for defining a parser that scans some given text letter-by-letter. For each letter, some code is executed and, possibly, a state switch occurs. The code for each letter might take mandatory or optional arguments. The parsing process ends when a final state has been reached, and optionally some code is executed afterwards. By default, each newly defined parser ignores space tokens; if you want to change that, you have to explicitly define an action for blank spaces (with \pgfparserdef).
-
\pgfparserparse{⟨parser name⟩}⟨text⟩
This command is used to parse the ⟨text⟩ using the (previously defined) parser named ⟨parser name⟩.
The ⟨text⟩ is not contained in curly braces; rather, it is all the text that follows. The end of the text is determined implicitly, namely when the final state of the parser has been reached. If you defined a final action for the parser using \pgfparserdeffinal, it is executed at that point.
The parser works as follows: At any moment, it is in a certain state; the initial state is called initial. Then, the first letter of the ⟨text⟩ is examined (using the \futurelet command). For each possible state and each possible letter, the parser stores some action code in a table, and the code for the current state and letter is executed. This code may, but need not, trigger a state switch, causing a new state to be set. The parser then moves on to the next character of the text and repeats the whole procedure, unless it is in the state final, which causes the parsing process to stop immediately.
In the following example, the parser counts the number of a’s in the text, ignoring any b’s. The ⟨text⟩ ends with the first c.
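A minimal sketch of such a parser, assuming a TeX counter named \mycount and a parser named myparser (both names are chosen here for illustration):

```latex
\newcount\mycount % illustrative counter name
% In state "initial": every a increments the counter ...
\pgfparserdef{myparser}{initial}{the letter a}
  {\advance\mycount by 1\relax}
% ... every b is accepted but does nothing ...
\pgfparserdef{myparser}{initial}{the letter b}{}
% ... and the first c switches to "final", which ends the parsing.
\pgfparserdef{myparser}{initial}{the letter c}
  {\pgfparserswitch{final}}
\pgfparserparse{myparser}aabaabababbbbbabaabcccc
There are \the\mycount\ a's.
```

Because parsing stops at the first c, the remaining c's are no longer consumed by the parser and are typeset as ordinary text.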
-
\pgfparserdef{⟨parser name⟩}{⟨state⟩}⟨symbol meaning⟩[⟨arguments⟩]{⟨action⟩}
This command should be used repeatedly to define a parser named ⟨parser name⟩. With a call to this command you specify that the ⟨parser name⟩ should do the following: When it is in state ⟨state⟩ and reads the letter ⟨symbol meaning⟩, perform the code stored in ⟨action⟩.
The ⟨symbol meaning⟩ must be the text that results from applying the TeX command \meaning to the given character. For instance, \meaning a yields the letter a, while \meaning 1 yields the character 1. A space yields blank space. Alternatively, you can give the symbol you want without surrounding it in braces. So both \pgfparserdef{myparser}{initial}{the letter a}{foo} and \pgfparserdef{myparser}{initial}a{foo} define an ⟨action⟩ for the letter a. This short form works for most tokens, but not for a space (in which case you can use \pgfparserdef{myparser}{initial}{blank space}{foo}) and not for opening braces (in which case you can use \pgfparserdef{myparser}{initial}{\meaning\bgroup}{foo}; one might prefer to use \pgfparserdef{myparser}{initial}{\meaning\egroup}{foo} for closing braces as well). You can also define an action for a macro's meaning (note that macros with different names can have the same meaning), so things like \pgfparserdef{myparser}{initial}\texttt{foo} are possible as well.
The ⟨action⟩ might require arguments, which you can specify in the optional ⟨arguments⟩ string. The argument string can contain up to nine argument specifications of the following types:
• m a normal mandatory argument
• r⟨delim⟩ a mandatory argument which is read up to the ⟨delim⟩
• o an optional argument in [] defaulting to a special mark
• O{⟨default⟩} like o but defaulting to ⟨default⟩
• d⟨delim1⟩⟨delim2⟩ an optional argument in ⟨delim1⟩ and ⟨delim2⟩ defaulting to a special mark
• D⟨delim1⟩⟨delim2⟩{⟨default⟩} like d but defaulting to ⟨default⟩
• t⟨token⟩ tests whether the next letter is ⟨token⟩; if so, the token is gobbled and the argument is set to a special mark.
So if you want to define an ⟨action⟩ that takes two mandatory arguments, you use [mm]. If it should take an optional star, one optional argument in brackets that returns a marker if it is not used, one mandatory argument, and finally an optional argument in parentheses that defaults to something, you use [t*omD(){something}] as the argument string. If the argument should be anything up to a semicolon, you use [r;]. Spaces before argument specifications in the string are ignored, so [r m] specifies one argument that reads anything up to an m. Spaces before any argument in the parsed letters are ignored as well, so if a was set up to take an optional argument, the argument would also be detected in a []. As with normal LaTeX2ε optional arguments, you have to protect nested brackets: [a[bc]d] would be read as a[bc with a trailing d], not as a[bc]d. You have to use [{a[bc]d}] to get it right.
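As an illustration of two of these argument strings, the following sketch defines a hypothetical parser argdemo in which p takes two mandatory arguments and s reads everything up to the next semicolon. The parser name, the chosen letters, and the typeset text are assumptions made for this example only.

```latex
% p takes two mandatory arguments, as in p{1}{2}
\pgfparserdef{argdemo}{initial}p[mm]{(#1, #2) }
% s takes everything up to the next semicolon as its single argument
\pgfparserdef{argdemo}{initial}s[r;]{text: #1. }
% a full stop ends the parsing
\pgfparserdef{argdemo}{initial}.{\pgfparserswitch{final}}
\pgfparserparse{argdemo}p{1}{2}s some text;p{3}{4}.
```

Inside the ⟨action⟩ the arguments are available as #1, #2, and so on, in the order given in the argument string.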
Inside the ⟨action⟩ you can perform almost any kind of code. This code will not be surrounded by a scope, so its effect persists after the parsing is done. However, each time after the ⟨action⟩ is executed, control goes back to the parser. You should not launch a parser inside the ⟨action⟩ code, unless you put it in a scope.
When you use all as the ⟨state⟩, the ⟨action⟩ is performed in all states as a fallback, whenever ⟨symbol meaning⟩ is encountered. This means that when you do not specify anything explicitly for a state and a letter, but you do specify something for all and this letter, then the specified ⟨action⟩ will be used.
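The interplay of explicit states and the all fallback can be sketched as follows. The parser name statedemo, the state name inside, and the counter \outsidecount are hypothetical names chosen for this example; the parser is meant to count only those a's that appear outside parentheses.

```latex
\newcount\outsidecount % illustrative counter name
% Outside parentheses every a is counted; ( enters the state "inside".
\pgfparserdef{statedemo}{initial}a{\advance\outsidecount by 1\relax}
\pgfparserdef{statedemo}{initial}({\pgfparserswitch{inside}}
% Inside parentheses a is ignored; ) returns to the state "initial".
\pgfparserdef{statedemo}{inside}a{}
\pgfparserdef{statedemo}{inside}){\pgfparserswitch{initial}}
% Fallback for every state: a semicolon ends the parsing.
\pgfparserdef{statedemo}{all};{\pgfparserswitch{final}}
\pgfparserparse{statedemo}aa(aaa)a;
\the\outsidecount\ a's outside parentheses.
```

Since neither initial nor inside defines an action for the semicolon, the action registered for all is used in both states.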
When the parser encounters a letter for which nothing is specified in the current state (neither directly nor indirectly via all), an error occurs. Additionally you can specify an action that is executed after the error is thrown using \pgfparserdefunknown. To suppress these errors (but not the action specified with \pgfparserdefunknown) you can use the /pgfparser/silent key or the silent key of the current ⟨parser name⟩.
-
\pgfparserlet{⟨parser name 1⟩}{⟨state 1⟩}⟨symbol meaning 1⟩[⟨opt 1⟩][⟨opt 2⟩]⟨symbol meaning 2⟩
In the following explanation, if none of the optional arguments are given, ⟨parser name 2⟩ and ⟨state 2⟩ are the same as ⟨parser name 1⟩ and ⟨state 1⟩. If only the first is given, ⟨parser name 2⟩ is the same as ⟨parser name 1⟩ and ⟨state 2⟩ equals ⟨opt 1⟩. If both are given, ⟨parser name 2⟩ equals ⟨opt 1⟩ and ⟨state 2⟩ equals ⟨opt 2⟩.
Defines an action for ⟨parser name 1⟩ in ⟨state 1⟩ for the ⟨symbol meaning 1⟩ to do the same as the action of ⟨parser name 2⟩ in ⟨state 2⟩ for the ⟨symbol meaning 2⟩. For ⟨symbol meaning 1⟩ and ⟨symbol meaning 2⟩ the same parsing rules apply as for ⟨symbol meaning⟩ in \pgfparserdef so you either give the meaning in braces or just the symbol.
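The three forms might be used as in the following sketch. The parser names letdemo and otherparser and the state name second are hypothetical; the braced form of the symbol meanings is used throughout to keep the argument boundaries unambiguous.

```latex
\pgfparserdef{letdemo}{initial}{the letter a}{[a seen] }
% Same parser, same state: b now does what a does.
\pgfparserlet{letdemo}{initial}{the letter b}{the letter a}
% Same parser, other state: in "second", a does what a does in "initial".
\pgfparserlet{letdemo}{second}{the letter a}[initial]{the letter a}
% Other parser: c in "otherparser" does what a does in "letdemo"/"initial".
\pgfparserlet{otherparser}{initial}{the letter c}[letdemo][initial]{the letter a}
```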
-
\pgfparserdefunknown{⟨parser name⟩}{⟨state⟩}{⟨action⟩}
With this macro you can define an ⟨action⟩ that the ⟨parser name⟩ parser executes in ⟨state⟩ when it encounters a letter for which no action was defined.
-
\pgfparserdeffinal{⟨parser name⟩}{⟨action⟩}
Every parser can call a final ⟨action⟩ after the state was switched to final. This ⟨action⟩ is executed after everything else, so you can use something that grabs more arguments if you want to.
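A sketch of a final action that grabs a further argument, relying on the statement above that the final action runs after everything else. The helper macro \showrest and the parser name finaldemo are hypothetical and exist only for this example.

```latex
\def\showrest#1{(argument after parsing: #1)} % hypothetical helper macro
\pgfparserdef{finaldemo}{initial}a{}
\pgfparserdef{finaldemo}{initial};{\pgfparserswitch{final}}
% \showrest is the last thing executed, so it reads {rest} itself
\pgfparserdeffinal{finaldemo}{\showrest}
\pgfparserparse{finaldemo}aaa;{rest}
```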
-
\pgfparserswitch{⟨state⟩}
This command can be called inside the action code of a parser to cause a state switch to ⟨state⟩.
-
\pgfparserifmark{⟨arg⟩}{⟨true⟩}{⟨false⟩}
Remember that some of the optional argument types set special marks? With \pgfparserifmark you can test whether ⟨arg⟩ is such a mark. So if there was no optional argument for the argument types o and d the ⟨true⟩ branch will be executed, else the ⟨false⟩ branch. For the t type argument the ⟨true⟩ branch is executed if the token was encountered.
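A small sketch of both cases, assuming a hypothetical parser markdemo in which x takes an optional star (type t) and an optional argument in brackets (type o):

```latex
% x takes an optional star and an optional argument in brackets
\pgfparserdef{markdemo}{initial}x[t*o]{%
  \pgfparserifmark{#1}{starred}{unstarred},
  \pgfparserifmark{#2}{no option}{option: #2}. }
\pgfparserdef{markdemo}{initial};{\pgfparserswitch{final}}
\pgfparserparse{markdemo}x x* x[1] x*[2];
```

Note the asymmetry described above: for #1 (type t) the ⟨true⟩ branch means the star was present, whereas for #2 (type o) the ⟨true⟩ branch means the optional argument was absent.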
-
\pgfparserreinsert
You can use this as the final macro in an action of \pgfparserdef or \pgfparserdefunknown. This has the effect that the contents of \pgfparserletter will be parsed next. Without any redefinition the result will be that the last token will be parsed again. You can change the definition of \pgfparserletter just before \pgfparserreinsert as well to parse some specific tokens next.
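The redefinition of \pgfparserletter might be used as in the following sketch, in which an upper-case A is handed back to the parser as a lower-case a. The parser name reinsertdemo and the counter \acount are hypothetical names for this example.

```latex
\newcount\acount % illustrative counter name
\pgfparserdef{reinsertdemo}{initial}a{\advance\acount by 1\relax}
% An upper-case A is parsed again as a lower-case a;
% \pgfparserreinsert is the final macro of the action.
\pgfparserdef{reinsertdemo}{initial}A{\def\pgfparserletter{a}\pgfparserreinsert}
\pgfparserdef{reinsertdemo}{initial};{\pgfparserswitch{final}}
\pgfparserparse{reinsertdemo}aAaA;
\the\acount\ a's (upper or lower case).
```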
-
\pgfparserstate
Expands to the current state of the parser.
-
\pgfparsertoken
This is the macro which is let to the following token with \futurelet. You can use it inside an action code.
-
\pgfparserletter
This macro stores the letter to which \pgfparsertoken was let. So if you use \pgfparserparse{foo}a, this macro is defined with \def\pgfparserletter{a}. This definition is done before any action code is executed. There are four special cases: If the next token is of category code 1, 2, 6, or 10 (with standard category codes, the tokens {, }, #, and ␣ (a space)), it is treated differently. In those cases this macro expands to \bgroup, \egroup, ##, and ␣ for the categories 1, 2, 6, and 10, respectively.
92.1 Keys of the Parser Module
-
/pgfparser/silent=⟨boolean⟩ (no default, initially false)
If true, no error is thrown when a letter is parsed for which no action is specified; the letter is silently ignored. This holds for every parser.
-
/pgfparser/status=⟨boolean⟩ (no default, initially false)
If true, every parser prints a status message for every action executed. This might help in debugging and understanding what the parser does.
In addition to these keys, the following key is defined for every ⟨parser name⟩ for which \pgfparserdef, \pgfparserdefunknown, or \pgfparserlet was run at least once:
-
/pgfparser/⟨parser name⟩/silent=⟨boolean⟩ (no default, initially false)
If true the parser ⟨parser name⟩ will silently ignore undefined letters. This is an individual equivalent of /pgfparser/silent for each defined parser.
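These keys might be set as in the following sketch. It assumes that \pgfkeys is available (that is, that the key mechanism of pgf is loaded) and uses a hypothetical parser named keydemo, which is defined first so that its own silent key exists.

```latex
% Define something for the parser, so that its silent key is defined.
\pgfparserdef{keydemo}{initial};{\pgfparserswitch{final}}
% Print a status message for every action of every parser.
\pgfkeys{/pgfparser/status=true}
% Ignore undefined letters, but only in the parser "keydemo".
\pgfkeys{/pgfparser/keydemo/silent=true}
```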
92.2 Examples
The following example counts the different letters appearing in a more or less random string of letters. Every letter is counted only once; this is achieved by defining, for every unknown letter that is encountered, a new action that does nothing. We can define such a rule without knowing which letter is used, because \pgfparsertoken has the same meaning as that letter.
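A sketch along these lines, assuming a hypothetical parser lettersdemo, a counter \lettercount, and the availability of \pgfkeys for setting the silent key:

```latex
\newcount\lettercount % illustrative counter name
% A semicolon ends the parsing in every state.
\pgfparserdef{lettersdemo}{all};{\pgfparserswitch{final}}
% First occurrence of a letter: register an empty action for it
% (so it is known from now on) and count it.
\pgfparserdefunknown{lettersdemo}{all}{%
  \pgfparserdef{lettersdemo}{all}\pgfparsertoken{}%
  \advance\lettercount by 1\relax}
\pgfparserdeffinal{lettersdemo}{\the\lettercount\ different letters found}
% Do not throw errors for unknown letters.
\pgfkeys{/pgfparser/lettersdemo/silent=true}
\pgfparserparse{lettersdemo}udiaternxqlchudiea;
```

The silent key is set after the definitions because, as described in the section on keys, the per-parser key only exists once something has been defined for that parser.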
Next we want to try something that uses some of the different argument types available.
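One possible sketch, assuming a hypothetical parser typesdemo whose letters a, b, and c combine several of the argument types; the letters, defaults, and typeset text are chosen for illustration only.

```latex
% a: optional argument in parentheses with a default, then a mandatory one
\pgfparserdef{typesdemo}{initial}a[D(){none}m]{a: (#1) and #2. }
% b: everything up to the next semicolon
\pgfparserdef{typesdemo}{initial}b[r;]{b: #1. }
% c: optional star, then an optional argument in brackets with a default
\pgfparserdef{typesdemo}{initial}c[t*O{default}]{%
  c\pgfparserifmark{#1}{ (starred)}{}: #2. }
% a full stop ends the parsing
\pgfparserdef{typesdemo}{initial}.{\pgfparserswitch{final}}
\pgfparserparse{typesdemo}a(1){2} a{3} b some text; c*[x] c.
```

The spaces in the parsed text are harmless here, because a newly defined parser ignores space tokens and spaces before arguments by default.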