Announcing E 0.8.4: The Birthday Release
Mark S. Miller
markm@caplet.com
Fri, 28 May 1999 11:42:08 -0700
At 04:10 PM 5/27/99 , Dan Bornstein wrote:
>Wow! Looks like you've done some great stuff lately! A very happy
>birthday indeed!
Not until I get some more Surveys back ;)
>Since you asked for it, here are my off-the-cuff suggestions for reducing
>the E parse tree node types. The core of most of these suggestions is that
>in each of these cases you can re-cast a special form as a simple procedure
>call. If it turns out that you can inline away the call (which will
>presumably be most or all of the time), then that's naturally a Good Thing
>and should be done.
>
>The prices you pay are:
>
>* adding a couple required/primitive functions to the base set of
>definitions.
>* having to recognize something more complicated than merely node-type
>during compilation.
[#] We need to tease apart the purposes of an expansion
Your approach is indeed very interesting, and it raises the need to
distinguish between different purposes for expanding to a smaller kernel set
of constructs, especially if different expansions can be used for these
different purposes.
1) Code migration. I think you were the one to suggest (I think based on
Python doing likewise) that compressed parse-trees were a better format for
transmission of untrusted code, where the receiver wants to both verify and
compile quickly. Most of the time savings in not having the receiver
compile sources is actually from not having to lex and parse them.
Whatever time might be lost (if any) in compiling from trees rather than
virtual instructions is more than made up for by more straightforward
verification.
2) As the portable interface between the vm and other programs for talking
about computation. The best example is writing a portable unprivileged
debugger. This almost necessarily wants to use the same
expansion/kernel-definition as #1.
3) As explanation for humans. The best way to make a rich grammar quickly
understandable is to let the user ask to see pieces of code expanded:
? interp.expand := true
# value: true
User: Everything we now type will show the expansion as well as the
resulting value. Let's use a Perl Regex pattern:
? "a---b" =~ rx`a(@dashes-*)b`
# expansion:
# ("a---b" =~ _1q ? (rx__quasiParser matchMaker("a(@{0}-*)b") \
# matchBind(ListMaker run, _1q) =~ [dashes]))
# value: true
User: And what was that "." in "interp.expand" anyway? We'll see when we
turn expansions back off:
? interp.expand := false
# expansion:
# interp setExpand(define _2ares : final := false)
# _2ares
# value: false
User: Oh yeah, ".foo" is for accessing the property named "foo" according to
bean conventions. On the left side of an assignment it expands to calling
"setFoo()".
Even as the designer of the language, I've often de-confused myself by
looking at expansions. This does not have to use the same expansion as #1
and #2, though it currently does.
4) Portable semantically-directed tools for munging over programs. When
writing such tools in E, one will typically use the E quasi-parser.
Quasi-match is be by direct tree match on expanded trees. The expansion
makes the match seem more semantic than syntactic, two programs that
trivially have the same semantics but different trees won't match:
? e`2 + 3` =~ e`2 add(3)`
# value: true
? e`2 + 3` =~ e`3 + 2`
# value: false
The expansion by the quasi-parser means that a programmer casually writing
such tools will usually succeed without knowing what trees their
quasi-literals expand to. However, a programmer writing such tools
seriously will need to know this. This suggests that #4 use the same
expansion as #3, since that's the one we're encouraging people to learn
from. Since such tools will often want to speak with the virtual machine
about computations in progress (as debuggers do), #4 should also be the same
as #2. Absent other considerations, we've just collapsed all these back to
one expansion. Let's call this the "canonical expansion".
5) Representation passed between phases of a compiler. Given the
definitions of #1 and #2, our compiler certainly has to start with the
expansion agreed upon by these phases. However, it can internally further
expand or do whatever it wants, as long as the virtual can then talk about
the resulting computation in terms of #2.
Given these simultaneous considerations, it is indeed good to reduce the
number of special forms, even at the price of introducing equally
complicated magic primitive functions. However, my take is that it isn't
worth doing this when the price also includes turning nested blocks into new
closures. The main cost is just the understandability of the expansion to
normal programmers. The big problem is the combination of a magic function
& a new closure, since the magic function itself cannot be explained by
showing how it could be implemented in E. The canonical expansion should
Expand to the point of clarity, but no farther.
Currently, the only control-flow construct for which I expand a block into a
closure is the for loop. In this case, not only can one write the object
that invokes the closure in E, one frequently does.
Of course, as mentioned in #5, none of this precludes a compiler from
further expanding as you suggest to reduce its internal cases, as long as
the resulting computation can be reported in terms of #2.
>I haven't done this particular step with my Scheme compiler, but I've given
>it a bit of thought. One thing I'll be doing is noticing at compile time if
>a global reference refers to an immutable cell, and, if so, optimizes away
>the indirection through the global environment. Then, if it turns out that
>an object in functor position (not just the name--i.e. it won't work for a
>reference to a mutable cell) is one of the specially recognized primitives
>(such as call-with-current-continuation), then that will enable other
>possible optimizations (such as flattening away a lambda). I bet similar
>tricks will work for E if you decide to go this route.
[+] I will indeed be doing this kind of thing.
This is a nice explanation of it.
>Apologies if I get the syntax wrong; my life is Java and Scheme these days.
>Also, I know that the "object { ... }" syntax isn't primitive; I'm just
>using it as a notational convenience.
[;-)] Your brain is an archeological record on E's sedimentary layers.
I think I'll just call you The Ghost of Syntax Past. All the "quoted" text
below has actually been edited by me to be in current syntax before I reply
to it. My apologies to the CritMap program that will try to process these
alleged quotes. (CritMap, though I know you will read this apology, I know
you will not understand it, for which you also have my apologies.)
>* visitEscapeExpr: This is call-with-current-continuation, right? So,
>the surface syntax:
>
> escape <hatch> { <body> }
>
>should be expandable to a simple method call which passes a thunk
>constructed from the body:
>
> callWithCC (def _(<hatch>) { <body> }})
[-] Fails the above criterion.
Perfect example of where I think it's easier to teach a primitive special
form than a primitive function combined with an introduced closure.
>* visitHideExpr: The surface syntax
>
> { <body> }
>
>could expand to:
>
> (def _() { <body> })()
[-] Likewise.
>* visitLoopExpr: How do you feel about the iteration/recursion
>transformation? If you're okay with that, then this surface syntax:
>
> loop { <body> }
>
>could expand to:
>
> (define __loop() { <body>; __loop() })()
[-] Worse than likewise.
First, it does fail the same criterion. In addition, it would require a
tail recursive implementation of the language (or an infinite amount of
memory) to implement well. JVMs are not required to, and typically do not,
provide for tail recursive calls.
But it's worse than that. (Insight due to Dave Ungar & Self.) What about
the portable debugger's eye view? Even when tail recursion is available, it
can't used for code whose state is debugger accessible, if we want the
resulting debugging behavior to be comprehensible to the programmer. (The
programmer might eventually have a way to give a vm permission to throw away
the information that tail recursion throws away, but the default must be to
keep it.)
Even if you argue that the debugging default should be on the other foot, as
long as we might want a debugger to preserve a conventional call stack, we
need to distinguish between loops and recursion.
>* visitThrowExpr: I suspect "throw" can just be a primitive function.
[+] Jeez, why didn't I think of that.
Thanks!
More later.