4. The State Machine

While developing this tool, one major issue was WML parsing, since WML allows nested Lua code. The classical (Perl) approach used two separate functions, one dedicated to Lua code and one to WML. That approach, however, can cause problems with WML files that contain nested Lua code, which is why a different approach is used here.

This release has a single “parser”: a finite state machine that reads every line of a file (Lua or WML) and performs the proper action (running a concrete state) whenever something important is found (for example, a translatable string).

# ./wmlxgettext:146
pywmlx.statemachine.setup(sentlist, args.initdom, args.domain)
for fx in args.filelist:
    # some code omitted
    # ./wmlxgettext:157
    if fname[-4:].lower() == '.cfg':
        pywmlx.statemachine.run(filebuf=infile, fileref=fx,
                       fileno=fileno, startstate='wml_idle', waitwml=True)
    if fname[-4:].lower() == '.lua':
        pywmlx.statemachine.run(filebuf=infile, fileref=fx,
                       fileno=fileno, startstate='lua_idle', waitwml=False)

First of all, the state machine is initialized with the statemachine.setup() function (called only once during the whole script execution).

Then wmlxgettext executes the statemachine.run() function every time a new file (listed in args.filelist) is opened. These are the statemachine.run() parameters:

  • filebuf: the file buffer to read
  • fileref: the name of the file (path relative to --directory)
  • fileno: a progressive (and unique) id value assigned to the file
  • startstate: the name of the state where the state machine must start. Its value is ‘wml_idle’ for WML (.cfg) files and ‘lua_idle’ for .lua files.
  • waitwml: its value is True when parsing a WML file and False when parsing a Lua file. In fact, only when Lua code is nested in a WML file can you “expect” to exit from Lua parsing and return to WML parsing. A .lua file, instead, contains only Lua code.

4.1. The State class

Now it is time to explain in more depth how the state machine works. The State class has three properties:

  • regex: the regular expression to match. If the regex matches, then the run function is executed.
  • run (self, xline, lineno, match): the function executed when the regex matches. Its parameters are:
    • xline: the line of the file we are parsing
    • lineno: current line number
    • match: the match object returned by re.match(regex)
  • iffail: the state (state name) to reach if the regex does not match (usually the next state).

The State class prototype (./pywmlx/state/state.py) does not contain any actual code. The concrete states are defined in ./pywmlx/state/lua_states.py and in ./pywmlx/state/wml_states.py using temporary classes (for better code readability).

All states are stored by the state machine in a dictionary (_states) with:

  • key = State name (example: ‘wml_idle’)
  • value = concrete State object

4.1.1. Standard States

Standard states work exactly as previously explained:

digraph stdstate {
   node [shape="ellipse", style="filled", fillcolor="grey",
         fontname="DejaVu Sans Mono"]
   state [label="State", shape="octagon"]
   match [label="regex matches", shape="box", fillcolor="purple"]
   run [label="Execute run()"]
   notmatch [label="regex DOES NOT match", shape="box", fillcolor="purple"]
   fail [label="ChangeState: 'iffail'"]
   exe [label="Returns a tuple:\n(non_consumed_line, new_state)", 
        shape="box", fillcolor="cyan"]
   
   state -> match -> run -> exe
   state -> notmatch -> fail
}

The regexp is verified through re.match, so it matches only if the rule is satisfied at the very start of the line. If it matches, then run() is executed.

run() returns a pair of values (tuple):
  • xline: the non-consumed part of the line. If the line is to be considered consumed, then xline is set to None
  • nextstate: label of the next state to go to. Usually it is ‘wml_idle’ or ‘lua_idle’, since the parsing is recursive (the rest of the line is parsed again starting from the idle state).

If the regexp does not match, the iffail state will be reached. Usually the iffail is equal to the “next state”. See State Sequence
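
The behaviour of a standard state can be sketched as a single "step" function. This is only an illustration under stated assumptions: step(), the comment state built with SimpleNamespace, its regex and the 'demo_*' names are hypothetical and do not come from pywmlx. The point is that re.match anchors the test at the start of the line, and that a failed match leaves the line untouched and moves to iffail.

```python
import re
from types import SimpleNamespace


def step(state, xline, lineno):
    """Run one standard state; return (remaining_line, next_state_name)."""
    match = re.match(state.regex, xline)
    if match:
        return state.run(xline, lineno, match)
    # No match: the line is not consumed, the iffail state is tried next.
    return (xline, state.iffail)


# Hypothetical state that swallows lines starting with a '#' comment.
comment = SimpleNamespace(
    regex=r'\s*#.*',
    iffail='demo_tag',
    run=lambda xline, lineno, match: (None, 'demo_idle'),
)

matched = step(comment, '# a comment', 1)
failed = step(comment, '[unit]', 2)
# A '#' that is not at the start of the line is not seen by re.match.
trailing = step(comment, 'x = 1 # trailing', 3)
print(matched, failed, trailing)
```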

4.1.2. Always-Run States

Always-run states are special states with regexp = None.

Unlike standard states, an always-run state will always execute its own run() function. An example of an always-run state is the ‘wml_idle’ state.

An always-run state does not actually require the iffail parameter. This is why always-run states have iffail = None
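
A dispatcher that supports always-run states only needs one extra check before the regex test. The sketch below is an assumption about how such a check could look, not the actual pywmlx loop; DemoIdle, step() and the 'demo_first_check' state name are hypothetical.

```python
import re


class DemoIdle:
    """Hypothetical always-run state: no regex, no iffail."""
    regex = None
    iffail = None

    def run(self, xline, lineno, match):
        # match is always None here: nothing was tested.
        return (xline, 'demo_first_check')


def step(state, xline, lineno):
    if state.regex is None:
        # Always-run state: skip the regex test entirely.
        return state.run(xline, lineno, None)
    match = re.match(state.regex, xline)
    if match:
        return state.run(xline, lineno, match)
    return (xline, state.iffail)


idle_result = step(DemoIdle(), 'any text at all', 7)
print(idle_result)
```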

4.2. State Sequence

The generic state sequence is shown below:

digraph machine01 {
   node [shape="box", style="filled", fillcolor="grey",
         fontname="DejaVu Sans Mono"]
   idle [shape="ellipse", label="IDLE STATE", fillcolor="green"]
   idle2 [label="IDLE STATE of the other language", fillcolor="cyan"]
   preproc [label="Preprocessing and Comment STATES"]
   winfo [label="WML TAG/INFO STATES", fillcolor="yellow"]
   cstr [label="Capture String STATE (single-line?)", 
         shape="box", color="red", fillcolor="#ffaaaa"]
   mult [label="Multi-Line String STATE", color="red",
         fillcolor="#ffaaaa"]
   change [label="Change language STATE"]
   end [shape="ellipse", label="FINAL STATE", fillcolor="green"]
  
   idle -> preproc [color="darkgreen"]
   preproc -> winfo -> cstr -> change -> end
   preproc -> idle [color="blue"]
   winfo -> idle [color="blue"]
   cstr -> idle [color="blue"]
   cstr -> mult [style="dotted", color="blue"]
   mult -> mult
   mult -> idle
   change -> idle2 [color="blue"]
   end -> idle [color="darkgreen"]
}

This is, more or less, the design applied to both WML and Lua states. However, the flow chart above focuses mainly on WML states:

  • Arrows:
    • green -> Always-run states (IDLE and FINAL) always go to the state pointed to by the green arrow.
    • blue -> A standard state reaches the state pointed to by the blue arrow when the regex matches.
    • black -> A standard state reaches the state pointed to by the black arrow when the regex DOESN’T match.
  • Boxes/Ellipses:
    • IDLE and FINAL states are special states that appear in both WML and Lua code. They are displayed as green ellipses since they are “always-run” states.
    • Preprocessing states appear in both WML and Lua code, even though WML and Lua use different states (for example, #wmlxgettext is not needed in Lua code). They are standard states (grey box).
    • The ‘wml_getinf’ and ‘wml_tag’ states appear only among the WML states (yellow box).
    • String states (red boxes) behave very differently in WML and in Lua.
    • The Change Language state checks whether WML code switches to Lua or vice versa. If the language changes, the IDLE state of the other language is reached (cyan box).

4.2.1. IDLE and FINAL States

Both IDLE and FINAL states check whether there is a pending string and, if so, the pending string is stored in memory.

  • WML: checks pywmlx.state.machine._pending_wmlstring. If it is None, there is no pending WML string to store.
  • Lua: checks pywmlx.state.machine._pending_luastring. If it is None, there is no pending Lua string to store.

Before a pending string (Lua or WML) is actually stored, some checks are performed:

  • verify that it is a translatable string
  • verify that the current domain is the same as the addon domain name
  • if so, fix the string for storage, and then store it

However, a WML pending string is stored in a very different way than a Lua pending string:

  • A Lua pending string is stored directly, as a PoCommentedString, in the “posentence dictionary”.
  • A WML pending string is, instead, stored in the current WML node as a WmlNodeSentence. Only when the current WML node is closed are all the WmlNodeSentence objects contained in the node stored in the “posentence dictionary”. (See: The nodemanip module and Converting WmlNodeSentence to PoCommentedString)
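
The checks performed before storing a pending string can be sketched as a chain of early returns. Everything in this example is hypothetical: the Pending class, store_pending() and the plain list used as storage only stand in for the real pending-string objects and for the “posentence dictionary”, and the step that fixes the string for storage is left out.

```python
from dataclasses import dataclass


@dataclass
class Pending:
    text: str
    is_translatable: bool
    domain: str


def store_pending(pending, addon_domain, store):
    """Return True if the pending string was stored (hypothetical helper)."""
    if pending is None:
        return False          # nothing pending
    if not pending.is_translatable:
        return False          # plain string: ignored
    if pending.domain != addon_domain:
        return False          # belongs to another textdomain
    # The real tool also fixes the string for storage here; omitted.
    store.append(pending.text)
    return True


store = []
store_pending(Pending('Hello', True, 'my-addon'), 'my-addon', store)
store_pending(Pending('debug', False, 'my-addon'), 'my-addon', store)
store_pending(Pending('Core text', True, 'wesnoth'), 'my-addon', store)
print(store)
```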

WmlFinalState always returns the pair (xline, 'wml_idle'), while LuaFinalState always returns the pair (xline, 'lua_idle'), where xline is set to None in both cases. As previously explained, in fact, when xline is None the line is considered completely consumed and the state machine reads the next line of the file.

Finally, the ‘lua_final’ state performs another action, which will be explained later. See About storing the last Lua function name.

4.2.2. Capture String States

When a string (translatable or not) is found, the regexp of the proper “Capture String” state matches. The captured string is stored as pywmlx.state.machine._pending_wmlstring (WML string), or as pywmlx.state.machine._pending_luastring (Lua string).

The capture string states are discussed in more detail below.

4.2.2.1. Capture String: WML States

WML accepts only one syntax:

_ "translatable_string"

Only two states, then, are required to capture strings:

# ./pywmlx/state/wml_states:161
class WmlStr01:
    # ...
# ./pywmlx/state/wml_states:190
class WmlStr10:
    # ...

digraph wmlstr {
   node [shape="record", style="filled", fillcolor="grey",
         fontname="DejaVu Sans Mono"]
   idle [shape="ellipse", label="WML IDLE STATE", fillcolor="green"]
   
   single [     
     label="{WmlStr01|{(single line string)|(first line of multiline string)}}"
   ]
   
   mult [label="{WmlStr10|(multiline string: from line 2 to last line)}", 
         color="red", fillcolor="#ffaaaa"]
   nextstate [label="Next State", shape="box", fillcolor="orange"]
  
   single -> nextstate
   single -> idle [color="blue"]
   single -> mult [style="dotted", color="blue"]
   mult -> mult
   mult -> idle
}

In more detail:

  • WmlStr01 (‘wml_str01’): This state captures a single-line string and also captures the FIRST LINE of a WML multiline string.

    • If it is a single-line string, the string is stored in pywmlx.state.machine._pending_wmlstring (change to the ‘wml_idle’ state).
    • If the closing quote " does not exist (multiline string), then the matched string is saved in pywmlx.state.machine._pending_wmlstring. The following lines will be added to the pending string by the WmlStr10 state (change to the ‘wml_str10’ state).
  • WmlStr10 (‘wml_str10’): All the following lines of the multiline string are added to the pending string by this state until the closing quote " is found. This state recursively comes back to itself and, when the string ends, the state is changed again to ‘wml_idle’.
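
The cooperation between the two states can be illustrated with a simplified sketch. The regular expressions below are assumptions written for this example (they ignore quotes embedded in the string and are not the ones used in wml_states.py), and capture() is a hypothetical helper that plays the role of the state machine for a handful of lines.

```python
import re

# Hypothetical, simplified rules: embedded quotes are not handled.
STR01 = re.compile(r'[^"]*?_\s*"([^"]*)("?)')   # first (or only) line
STR10 = re.compile(r'([^"]*)("?)')              # lines 2..n of a multiline


def capture(lines):
    """Return the first translatable string found in lines (no newlines)."""
    pending = None
    state = 'wml_str01'
    for line in lines:
        if state == 'wml_str01':
            m = STR01.match(line)
            if not m:
                continue
            pending = m.group(1)
            # An empty second group means the closing quote is missing.
            state = 'wml_idle' if m.group(2) else 'wml_str10'
        elif state == 'wml_str10':
            m = STR10.match(line)
            pending += '\n' + m.group(1)
            if m.group(2):
                state = 'wml_idle'
        if state == 'wml_idle':
            break
    return pending


single = capture(['name = _ "Elf"'])
multi = capture(['message = _ "Hello', 'brave', 'world"'])
print(repr(single), repr(multi))
```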

4.2.2.2. Capture String: Lua States

Unlike WML, Lua accepts three different syntaxes:

"string: type 1"

'string: type 2'

[[string: type 3]]

The third syntax (mostly suggested for multi-line Lua strings) is even more flexible than shown in the sample code above, as you can type any number of equals signs (from 0 to n) between the two opening brackets and between the two closing brackets.

Note

In the example above, we wrote [[string: type 3]], since it is the most common way of defining a bracketed Lua string, but we could also put any number of equals signs between the brackets.

For example, we could have written [==[string: type 3]==], using two equals signs. In that case, both the opening and the closing delimiter must use the same number of equals signs.
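
The "same number of equals signs" rule can be expressed in a Python regular expression with a backreference. The pattern below is written only for this illustration and is not the one used by wmlxgettext: group 1 captures the run of equals signs in the opening delimiter, and \1 forces the closing delimiter to repeat it.

```python
import re

# Group 1: the equals signs of the opening delimiter (possibly none).
# \1 requires exactly the same run inside the closing delimiter.
LONG_BRACKET = re.compile(r'\[(=*)\[(.*?)\]\1\]', re.DOTALL)

plain = LONG_BRACKET.search('x = [[string: type 3]]')
leveled = LONG_BRACKET.search('x = [==[a ]] b]==]')

print(plain.group(2))
print(leveled.group(1), leveled.group(2))
```

In the second case the inner ]] does not end the string, because it does not carry the two equals signs of the opening delimiter.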

Coming back to wmlxgettext, we should now notice that all the flexibility allowed by the Lua language (three ways to identify a string) means that more states are required. There are, in fact, seven states this time:

# ./pywmlx/state/lua_states:71 (syntax "1": single-line or start multiline)
class LuaStr01:
    # ...
# ./pywmlx/state/lua_states:173 (syntax "1": multiline)
class LuaStr10:
    # ...
# ./pywmlx/state/lua_states:99 (syntax "2": single-line or start multiline)
class LuaStr02:
    # ...
# ./pywmlx/state/lua_states:193 (syntax "2": multiline)
class LuaStr20:
    # ...
# ./pywmlx/state/lua_states:127 (syntax "3": single-line ONLY)
class LuaStr03:
    # ...
# ./pywmlx/state/lua_states:149 (syntax "3": start multiline)
class LuaStr03o:
    # ...
# ./pywmlx/state/lua_states:211 (syntax "3": multiline [from line 2])
class LuaStr30:
    # ...

digraph luastr {
   node [shape="record", style="filled", fillcolor="grey",
         fontname="DejaVu Sans Mono"]
   idle [shape="ellipse", label="LUA IDLE STATE", fillcolor="green"]
   
   str01 [
     label="{LuaStr01|type 1|{(single line)|(first line of multiline string)}}"
   ]
   str10 [ color="red", fillcolor="#ffaaaa",
     label="{LuaStr10|type 1|(multiline string: from line 2 to last line)}"
   ]
   
   str02 [
     label="{LuaStr02|type 2|{(single line)|(first line of multiline string)}}"
   ]
   str20 [ color="red", fillcolor="#ffaaaa",
     label="{LuaStr20|type 2|(multiline string: from line 2 to last line)}"
   ]
   
   str03 [
     label="{LuaStr03|type 3|(single line ONLY)}"
   ]
   str03o [
     label="{LuaStr03o|type 3|(multiline string: line 1)}"
   ]
   str30 [ color="red", fillcolor="#ffaaaa",
     label="{LuaStr30|type 3|(multiline string: from line 2 to last line)}"
   ]
      
   nextstate [label="Next State", shape="box", fillcolor="orange"]
   
   str01 -> str02 -> str03 -> str03o -> nextstate
   str01 -> idle [color="blue"]
   str01 -> str10 [style="dotted", color="blue"]
   str02 -> idle [color="blue"]
   str02 -> str20 [style="dotted", color="blue"]
   str03 -> idle [color="blue"]
   str03o -> str30 [color="blue"]
   str10 -> str10
   str20 -> str20
   str30 -> str30
   str10 -> idle
   str20 -> idle
   str30 -> idle
}

This time the flow chart is not so easy to understand at first sight, so it requires a little explanation:

  • Boxes/Ellipses:
    • green -> always-run states (green arrow rule applied)
    • orange -> used for “Next State”, for a better look
    • red (LuaStr10 and LuaStr20): LuaStr10 and LuaStr20 are recursive standard states. They can go back to themselves until the end of the multi-line string is matched (when the multi-line string ends, the ‘lua_idle’ state is reached) (no arrow rule: all arrows are black)
    • red (LuaStr30): LuaStr30 is actually an always-run state, but it acts like a recursive standard state. The regular expression evaluation is moved into the run() function, since the regexp rule is calculated at runtime. If the regexp doesn’t match (the current line does not end the multi-line string), then LuaStr30 comes back to itself (recursive state). If the regexp does match, the multi-line string is finished, and LuaStr30 goes to LuaIdleState.
    • grey -> standard states (black, blue and dotted blue arrow rules applied)
    • purple (ellipse) -> LuaStr30 can find (or not) the ]==] symbol. Purple ellipses show what happens if ]==] was found and if ]==] was NOT found (see where the black arrows go).
  • Arrow rules (when applied):
    • green -> LuaStr31 is an always-run state. LuaStr31 will always come back to the LuaStr30 state
    • blue -> when the state finds what it is searching for, go to the state pointed to by the blue arrow
    • blue (dotted) -> the LuaStr01 and LuaStr02 regex rules can match both a single-line string AND the start of a multi-line string. If a multi-line string is matched, then go to the state pointed to by the dotted blue arrow instead of the standard blue arrow
    • black -> when the regex rule of the state fails (the state does not find what it is searching for) [except for red boxes]
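
The reason why LuaStr30 evaluates its regexp inside run() can be shown with a short sketch: the closing delimiter depends on how many equals signs the opening delimiter used, so the pattern can only be built once that number is known. The helper names, the three-value return shape and the 'lua_str30' state name are assumptions made for this example, not the actual implementation.

```python
import re


def closing_regex(numequals):
    """Build the rule for the closing delimiter, e.g. ]==] for numequals=2."""
    return re.compile(r'(.*?)\]' + '=' * numequals + r'\]')


def lua_str30_step(xline, pending, numequals):
    """Return (pending_string, remaining_line, next_state_name)."""
    m = closing_regex(numequals).match(xline)
    if m is None:
        # The string goes on: keep the whole line and stay in this state.
        return (pending + '\n' + xline, None, 'lua_str30')
    # The string ends here: keep the text before the delimiter only.
    return (pending + '\n' + m.group(1), xline[m.end():], 'lua_idle')


still_open = lua_str30_step('still ]] inside', 'first', 2)
closed = lua_str30_step('last]==] .. x', 'first', 2)
print(still_open)
print(closed)
```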

4.2.3. About storing the last Lua function name

Unlike the WML states, there is no Lua state that captures Lua info. The only extra information that can be automatically cached from Lua code is the name of the last function opened / defined.

This kind of search requires a specific regexp search, using re.search instead of re.match. Unlike all other searches, in fact, the function name must be captured at any point of the line being parsed, or the regexp will not work properly.

But, as explained at the beginning of this page, the state machine relies on re.match (best performance) to verify the regexp rule of every state. For this reason, LuaFinalState searches by itself for a function name anywhere in the line and, if one is found, stores the value in pywmlx.state.machine._pending_luafuncname.
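
The difference between the two calls is easy to see on a line where the function definition does not sit at the start. The pattern below is a hypothetical one written for this example (it is not the rule used by LuaFinalState): re.match only tests the beginning of the line, while re.search scans the whole line.

```python
import re

# Hypothetical rule for "function <name>(" written for this example.
FUNC = re.compile(r'function\s+([\w.:]+)\s*\(')

line = 'local x = 1; function wesnoth.wml_actions.demo(cfg)'

at_start = FUNC.match(line)     # anchored at position 0: finds nothing
anywhere = FUNC.search(line)    # scans the whole line

print(at_start)
print(anywhere.group(1))
```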

4.3. State Machine and nodemanip

The previous chapter (Introducing WML and Lua parser) explained a lot of things, and especially:

  • how WML nodes are stored in memory
  • how the nodemanip module manages WML nodes (See: The nodemanip module).

But an important thing was omitted: nodemanip is used by the state machine.

digraph machine02 {
   node [shape="record", style="filled", fillcolor="grey",
         fontname="DejaVu Sans Mono"]
   
   machine [shape="box", fillcolor="orange",
      label="State Machine"
   ]
   start [
      label="{Initialize nodemanip\neven if the parsed file\nis a .lua file|function nodemanip.newfile()}"
   ]
   lua [shape="ellipse", fillcolor="purple",
      label="Lua file"
   ]
   lua_explain [
      label="{Lua states DO NOT use nodemanip|nodemanip is used ONLY on\npending Lua string to obtain\nfileref and fileno}"
   ]
   wml [shape="ellipse", fillcolor="purple",
      label="WML file"
   ]
   wml_explain [ shape="box"
      label="WML states use nodemanip\nto perform any action"
   ]
  
   machine -> start
   start -> lua -> lua_explain
   start -> wml -> wml_explain
}

When wmlxgettext imports pywmlx, the nodemanip module is not loaded into the pywmlx namespace: nodemanip is used only internally by the state machine (module ./pywmlx/state/machine.py).