Parsing vs Evaluating Order

Jul 5, 2019

Following up from my Ruby pop quiz the other day, I asked about the surprising behaviour on Stack Overflow.

Some commenters provided a little bit of help and then I did some more research. An answer to my own question is below.

Jay’s answer to a similar question linked to a section of the docs where it is explained:

The local variable is created when the parser encounters the assignment, not when the assignment occurs

There is a deeper analysis of this in the Ruby Hacking Guide (no section links available, search or scroll to the “Local Variable Definitions” section):

By the way, it is defined when “it appears”, this means it is defined even though it was not assigned. The initial value of a defined [but not yet assigned] variable is nil.

That answers the initial question but not how to learn more.

Jay and simonwo both suggested Ruby Under a Microscope by Pat Shaughnessy which I am keen to read.

Additionally, the rest of the Ruby Hacking Guide covers a lot of detail and actually examines the underlying C code. The Objects and Parser chapters were particularly relevant to the original question about variable assignment (not so much the Variables and constants chapter, it simply refers you back to the Objects chapter).

I also found, to see how the parser works, a useful tool is the Parser gem. Once it is installed (gem install parser) you can start to examine different bits of code to see what the parser is doing with them.

That gem also bundles the ruby-parse utility which lets you examine the way Ruby parses different snippets of code. The -E and -L options are most interesting to us and the -e option is necessary if we just want to process a fragment of Ruby such as foo = 'bar'. For example:

> ruby-parse -E -e "foo = 'bar'"
foo = 'bar'   
^~~ tIDENTIFIER "foo"                           expr_cmdarg  [0 <= cond] [0 <= cmdarg] 
foo = 'bar'   
    ^ tEQL "="                                  expr_beg     [0 <= cond] [0 <= cmdarg] 
foo = 'bar'   
      ^~~~~ tSTRING "bar"                       expr_end     [0 <= cond] [0 <= cmdarg] 
foo = 'bar'   
           ^ false "$eof"                       expr_end     [0 <= cond] [0 <= cmdarg] 
(lvasgn :foo
  (str "bar"))

ruby-parse -L -e "foo = 'bar'"
s(:lvasgn, :foo,
  s(:str, "bar"))
foo = 'bar'
~~~ name      
    ~ operator        
~~~~~~~~~~~ expression
s(:str, "bar")
foo = 'bar'
          ~ end
      ~ begin         
      ~~~~~ expression

Both of the references linked to at the top highlight an edge case. The Ruby docs used the example p a if a = 0.zero? whlie the Ruby Hacking Guide used an equivalent example p(lvar) if lvar = true, both of which raise a NameError.

Sidenote: Remember = means assign, == means compare. The if foo = true construct in the edge case tells Ruby to check if the expression foo = true evaluates to true. In other words, it assigns the value true to foo and then checks if the result of that assignment is true (it will be). That’s easily confused with the far more common if foo == true which simply checks whether foo compares equally to true. Because the two are so easily confused, Ruby will issue a warning if we use the assignment operator in a conditional: warning: found `= literal' in conditional, should be ==.

Using the ruby-parse utility let’s compare the original example, foo = 'bar' if false, with that edge case, foo if foo = true:

> ruby-parse -L -e "foo = 'bar' if false"
s(:if,
  s(:false),
  s(:lvasgn, :foo,
    s(:str, "bar")), nil)
foo = 'bar' if false
            ~~ keyword         
~~~~~~~~~~~~~~~~~~~~ expression
s(:false)
foo = 'bar' if false
               ~~~~~ expression
s(:lvasgn, :foo,
  s(:str, "bar"))
foo = 'bar' if false     # Line 13
~~~ name                 # <-- `foo` is a name
    ~ operator        
~~~~~~~~~~~ expression
s(:str, "bar")
foo = 'bar' if false
          ~ end
      ~ begin         
      ~~~~~ expression

As you can see above on lines 13 and 14 of the output, in the original example foo is a name (that is, a variable).

> ruby-parse -L -e "foo if foo = true"
s(:if,
  s(:lvasgn, :foo,
    s(:true)),
  s(:send, nil, :foo), nil)
foo if foo = true
    ~~ keyword              
~~~~~~~~~~~~~~~~~ expression
s(:lvasgn, :foo,
  s(:true))
foo if foo = true         # Line 10
       ~~~ name           # <-- `foo` is a name
           ~ operator       
       ~~~~~~~~~~ expression
s(:true)
foo if foo = true
             ~~~~ expression
s(:send, nil, :foo)
foo if foo = true         # Line 18
~~~ selector              # <-- `foo` is a selector
~~~ expression

In the edge case example, the second foo is also a variable (lines 10 and 11), but when we look at lines 18 and 19 we see the first foo has been identified as a selector (that is, a method).

This shows that it is the parser that decides whether a thing is a method or a variable and that it parses the line in a different order to how it will later be evaluated.

When the parser runs:

it first sees the whole line as a single expression
it then breaks it up into two expressions separated by the if keyword
the first expression foo starts with a lower case letter so it must be a method or a variable. It isn’t an existing variable and it IS NOT followed by an assignment operator so the parser concludes it must be a method
the second expression foo = true is broken up as expression, operator, expression. Again, the expression foo also starts with a lower case letter so it must be a method or a variable. It isn’t an existing variable but it IS followed by an assignment operator so the parser knows to add it to the list of local variables.

Later when the evaluator runs:

it will first assign true to foo
it will then execute the conditional and check whether the result of that assignment is true (in this case it is)
it will then call the foo method (which will raise a NameError, unless we handle it with method_missing).

Matthew Lindfield Seager

Matthew Lindfield Seager

Parsing vs Evaluating Order