Following up from my Ruby pop quiz the other day, I asked about the surprising behaviour on Stack Overflow.
Some commenters provided a little bit of help and then I did some more research. An answer to my own question is below.
Jay’s answer to a similar question linked to a section of the docs where it is explained:
The local variable is created when the parser encounters the assignment, not when the assignment occurs
There is a deeper analysis of this in the Ruby Hacking Guide (no section links available, search or scroll to the “Local Variable Definitions” section):
By the way, it is defined when “it appears”, this means it is defined even though it was not assigned. The initial value of a defined [but not yet assigned] variable is nil.
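The behaviour described in those two quotes can be checked with a short script. This is only a minimal sketch: the method names `parsed_but_not_assigned` and `never_assigned` are made up for illustration and do not come from either reference.

```ruby
# The assignment is never executed, yet the parser has already registered
# `foo` as a local variable by the time it reaches the next line.
def parsed_but_not_assigned
  foo = 'bar' if false
  [defined?(foo), foo]
end

# `baz` is never assigned anywhere in this method, so the parser treats it as
# a possible method call, and `defined?` finds nothing by that name.
def never_assigned
  defined?(baz)
end

p parsed_but_not_assigned  # => ["local-variable", nil]
p never_assigned           # => nil
```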
That answers the initial question but not how to learn more.
Jay and simonwo both suggested Ruby Under a Microscope by Pat Shaughnessy which I am keen to read.
Additionally, the rest of the Ruby Hacking Guide covers a lot of detail and actually examines the underlying C code. The Objects and Parser chapters were particularly relevant to the original question about variable assignment (less so the Variables and constants chapter, which simply refers you back to the Objects chapter).
I also found the Parser gem to be a useful tool for seeing how the parser works. Once it is installed (`gem install parser`) you can start to examine different bits of code to see what the parser is doing with them.
That gem also bundles the `ruby-parse` utility, which lets you examine the way Ruby parses different snippets of code. The `-E` and `-L` options are the most interesting to us, and the `-e` option is necessary if we just want to process a fragment of Ruby such as `foo = 'bar'`. For example:
    > ruby-parse -E -e "foo = 'bar'"
    foo = 'bar'
    ^~~ tIDENTIFIER "foo" expr_cmdarg [0 <= cond] [0 <= cmdarg]
    foo = 'bar'
        ^ tEQL "=" expr_beg [0 <= cond] [0 <= cmdarg]
    foo = 'bar'
          ^~~~~ tSTRING "bar" expr_end [0 <= cond] [0 <= cmdarg]
    foo = 'bar'
               ^ false "$eof" expr_end [0 <= cond] [0 <= cmdarg]
    (lvasgn :foo
      (str "bar"))
    > ruby-parse -L -e "foo = 'bar'"
    s(:lvasgn, :foo,
      s(:str, "bar"))
    foo = 'bar'
    ~~~ name
        ~ operator
    ~~~~~~~~~~~ expression
    s(:str, "bar")
    foo = 'bar'
              ~ end
          ~ begin
          ~~~~~ expression
Both of the references linked to at the top highlight an edge case. The Ruby docs used the example `p a if a = 0.zero?` while the Ruby Hacking Guide used an equivalent example, `p(lvar) if lvar = true`, both of which raise a `NameError`.
Remember that `=` means assign and `==` means compare. The `if foo = true` construct in the edge case tells Ruby to check whether the expression `foo = true` evaluates to true. In other words, it assigns the value `true` to `foo` and then checks whether the result of that assignment is `true` (it will be). That’s easily confused with the far more common `if foo == true`, which simply checks whether `foo` compares equal to `true`. Because the two are so easily confused, Ruby will issue a warning if we use the assignment operator in a conditional:
warning: found `= literal' in conditional, should be ==.
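To make the difference between the two operators concrete, here is a small sketch. The helper names `compare_only` and `assign_then_test` are hypothetical, and the value is passed in as an argument rather than written as a literal so that Ruby does not emit the warning while the example runs.

```ruby
# `==` only compares: `foo` keeps its original value.
def compare_only(value)
  foo = false
  outcome = foo == value ? :equal : :not_equal
  [foo, outcome]
end

# `=` assigns first, then the result of the assignment is tested for truthiness.
def assign_then_test(value)
  foo = false
  outcome = (foo = value) ? :truthy : :falsy
  [foo, outcome]
end

p compare_only(true)      # => [false, :not_equal]
p assign_then_test(true)  # => [true, :truthy]
```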
Using the `ruby-parse` utility, let’s compare the original example, `foo = 'bar' if false`, with that edge case, `foo if foo = true`:
    > ruby-parse -L -e "foo = 'bar' if false"
    s(:if,
      s(:false),
      s(:lvasgn, :foo,
        s(:str, "bar")), nil)
    foo = 'bar' if false
                ~~ keyword
    ~~~~~~~~~~~~~~~~~~~~ expression
    s(:false)
    foo = 'bar' if false
                   ~~~~~ expression
    s(:lvasgn, :foo,
      s(:str, "bar"))
    foo = 'bar' if false # Line 13
    ~~~ name             # <-- `foo` is a name
        ~ operator
    ~~~~~~~~~~~ expression
    s(:str, "bar")
    foo = 'bar' if false
              ~ end
          ~ begin
          ~~~~~ expression
As you can see above on lines 13 and 14 of the output, in the original example foo is a name (that is, a variable).
    > ruby-parse -L -e "foo if foo = true"
    s(:if,
      s(:lvasgn, :foo,
        s(:true)),
      s(:send, nil, :foo), nil)
    foo if foo = true
        ~~ keyword
    ~~~~~~~~~~~~~~~~~ expression
    s(:lvasgn, :foo,
      s(:true))
    foo if foo = true # Line 10
           ~~~ name   # <-- `foo` is a name
               ~ operator
           ~~~~~~~~~~ expression
    s(:true)
    foo if foo = true
                 ~~~~ expression
    s(:send, nil, :foo)
    foo if foo = true # Line 18
    ~~~ selector      # <-- `foo` is a selector
    ~~~ expression
In the edge case example, the second foo is also a variable (lines 10 and 11), but when we look at lines 18 and 19 we see the first foo has been identified as a selector (that is, a method).
This shows that it is the parser that decides whether a thing is a method or a variable and that it parses the line in a different order to how it will later be evaluated.
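The same mismatch between parse order and evaluation order can be observed at runtime. The sketch below wraps the Ruby docs example in a method so the error can be rescued and inspected; the method names `modifier_if_edge_case` and `assign_first` are invented for this illustration.

```ruby
# The `a` before `if` is parsed first, while no local `a` exists yet, so it is
# compiled as a method call. The condition runs first at evaluation time and
# assigns the local `a`, but the body still tries to call a method named `a`.
def modifier_if_edge_case
  p a if a = 0.zero?
rescue NameError => e
  e.class
end

# Putting the assignment on an earlier line means the parser has already seen
# it, so every later `a` is treated as the local variable.
def assign_first
  a = 0.zero?
  a if a
end

p modifier_if_edge_case  # => NameError
p assign_first           # => true
```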
When the parser runs:
`foo` starts with a lower case letter so it must be a method or a variable. It isn’t an existing variable and it IS NOT followed by an assignment operator, so the parser concludes it must be a method.
`foo = true` is broken up as expression, operator, expression. Again, the expression `foo` starts with a lower case letter so it must be a method or a variable. It isn’t an existing variable but it IS followed by an assignment operator, so the parser knows to add it to the list of local variables.
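The parser's decisions described above can also be inspected from within Ruby. The sketch below does not use the Parser gem from this post; it assumes Ripper, the parser interface bundled with MRI, which tags a bare identifier as `vcall` when it is not a known local and as `var_ref` when it is. The helper `identifier_roles` is hypothetical and written only for this example.

```ruby
require 'ripper'

# Walk a Ripper s-expression and collect [node_type, name] for every
# identifier tagged as a possible method call (:vcall), a variable read
# (:var_ref) or an assignment target (:var_field).
def identifier_roles(sexp, found = [])
  return found unless sexp.is_a?(Array)

  type, child = sexp
  if %i[vcall var_ref var_field].include?(type) &&
     child.is_a?(Array) && child[0] == :@ident
    found << [type, child[1]]
  end
  sexp.each { |node| identifier_roles(node, found) }
  found
end

# Edge case: the first `foo` is seen before any assignment.
p identifier_roles(Ripper.sexp("foo if foo = true"))

# Original example: the assignment is parsed before the later read of `foo`.
p identifier_roles(Ripper.sexp("foo = 'bar' if false; foo"))
```

In the first result the bare `foo` should appear as a `vcall`, and in the second the trailing `foo` should appear as a `var_ref`, matching the selector and name distinction in the `ruby-parse` output.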
Later when the evaluator runs:
It calls the `foo` method (which will raise a `NameError`, unless we handle it with