Announcing E 0.8.4: The Birthday Release

Mark S. Miller markm@caplet.com
Fri, 28 May 1999 11:42:08 -0700


At 04:10 PM 5/27/99 , Dan Bornstein wrote:
>Wow! Looks like you've done some great stuff lately! A very happy
>birthday indeed!

Not until I get some more Surveys back ;)


>Since you asked for it, here are my off-the-cuff suggestions for reducing
>the E parse tree node types. The core of most of these suggestions is that
>in each of these cases you can re-cast a special form as a simple procedure
>call. If it turns out that you can inline away the call (which will
>presumably be most or all of the time), then that's naturally a Good Thing
>and should be done.
>
>The prices you pay are: 
>
>* adding a couple required/primitive functions to the base set of
>definitions.
>* having to recognize something more complicated than merely node-type
>during compilation.

[#] We need to tease apart the purposes of an expansion

Your approach is indeed very interesting, and it raises the need to 
distinguish between different purposes for expanding to a smaller kernel set 
of constructs, especially if different expansions can be used for these 
different purposes.

1) Code migration.  I think you were the one to suggest (I think based on 
Python doing likewise) that compressed parse-trees were a better format for 
transmission of untrusted code, where the receiver wants to both verify and 
compile quickly.  Most of the time savings in not having the receiver 
compile sources is actually from not having to lex and parse them.  
Whatever time might be lost (if any) in compiling from trees rather than 
virtual instructions is more than made up for by more straightforward 
verification.

2) As the portable interface between the vm and other programs for talking 
about computation.  The best example is writing a portable unprivileged 
debugger.  This almost necessarily wants to use the same 
expansion/kernel-definition as #1.

3) As explanation for humans.  The best way to make a rich grammar quickly 
understandable is to let the user ask to see pieces of code expanded:

    ? interp.expand := true
    # value: true

User: Everything we now type will show the expansion as well as the 
resulting value.  Let's use a Perl Regex pattern:

    ? "a---b" =~ rx`a(@dashes-*)b`
    # expansion: 
    #   ("a---b" =~ _1q ? (rx__quasiParser matchMaker("a(@{0}-*)b") \
    #       matchBind(ListMaker run, _1q) =~ [dashes]))
    
    # value: true

User: And what was that "." in "interp.expand" anyway?  We'll see when we 
turn expansions back off:

    ? interp.expand := false
    # expansion: 
    #   interp setExpand(define _2ares : final := false)
    #   _2ares
    
    # value: false

User: Oh yeah, ".foo" is for accessing the property named "foo" according to 
bean conventions.  On the left side of an assignment it expands to calling 
"setFoo()".

Even as the designer of the language, I've often de-confused myself by 
looking at expansions.  This does not have to use the same expansion as #1 
and #2, though it currently does.

4) Portable semantically-directed tools for munging over programs.  When 
writing such tools in E, one will typically use the E quasi-parser.  
Quasi-match is be by direct tree match on expanded trees.  The expansion 
makes the match seem more semantic than syntactic, two programs that 
trivially have the same semantics but different trees won't match:

    ? e`2 + 3` =~ e`2 add(3)`
    # value: true
    
    ? e`2 + 3` =~ e`3 + 2`
    # value: false

The expansion by the quasi-parser means that a programmer casually writing 
such tools will usually succeed without knowing what trees their 
quasi-literals expand to.  However, a programmer writing such tools 
seriously will need to know this.  This suggests that #4 use the same 
expansion as #3, since that's the one we're encouraging people to learn 
from.  Since such tools will often want to speak with the virtual machine 
about computations in progress (as debuggers do), #4 should also be the same 
as #2.  Absent other considerations, we've just collapsed all these back to 
one expansion.  Let's call this the "canonical expansion".

5) Representation passed between phases of a compiler.  Given the 
definitions of #1 and #2, our compiler certainly has to start with the 
expansion agreed upon by these phases.  However, it can internally further 
expand or do whatever it wants, as long as the virtual can then talk about 
the resulting computation in terms of #2.



Given these simultaneous considerations, it is indeed good to reduce the 
number of special forms, even at the price of introducing equally 
complicated magic primitive functions.  However, my take is that it isn't 
worth doing this when the price also includes turning nested blocks into new 
closures.  The main cost is just the understandability of the expansion to 
normal programmers.  The big problem is the combination of a magic function 
& a new closure, since the magic function itself cannot be explained by 
showing how it could be implemented in E.  The canonical expansion should

	Expand to the point of clarity, but no farther.

Currently, the only control-flow construct for which I expand a block into a 
closure is the for loop.  In this case, not only can one write the object 
that invokes the closure in E, one frequently does.

Of course, as mentioned in #5, none of this precludes a compiler from 
further expanding as you suggest to reduce its internal cases, as long as 
the resulting computation can be reported in terms of #2.


>I haven't done this particular step with my Scheme compiler, but I've given
>it a bit of thought. One thing I'll be doing is noticing at compile time if
>a global reference refers to an immutable cell, and, if so, optimizes away
>the indirection through the global environment. Then, if it turns out that
>an object in functor position (not just the name--i.e. it won't work for a
>reference to a mutable cell) is one of the specially recognized primitives
>(such as call-with-current-continuation), then that will enable other
>possible optimizations (such as flattening away a lambda). I bet similar
>tricks will work for E if you decide to go this route.

[+] I will indeed be doing this kind of thing.
This is a nice explanation of it.


>Apologies if I get the syntax wrong; my life is Java and Scheme these days.
>Also, I know that the "object { ... }" syntax isn't primitive; I'm just
>using it as a notational convenience.

[;-)] Your brain is an archeological record on E's sedimentary layers.
I think I'll just call you The Ghost of Syntax Past.  All the "quoted" text 
below has actually been edited by me to be in current syntax before I reply 
to it.  My apologies to the CritMap program that will try to process these 
alleged quotes.  (CritMap, though I know you will read this apology, I know 
you will not understand it, for which you also have my apologies.)


>* visitEscapeExpr: This is call-with-current-continuation, right? So,
>the surface syntax:
>
>  escape <hatch> { <body> }
>
>should be expandable to a simple method call which passes a thunk
>constructed from the body:
>
>  callWithCC (def _(<hatch>) { <body> }})

[-] Fails the above criterion.
Perfect example of where I think it's easier to teach a primitive special 
form than a primitive function combined with an introduced closure.


>* visitHideExpr: The surface syntax
>
>  { <body> }
>
>could expand to:
>
>  (def _() { <body> })()

[-] Likewise.


>* visitLoopExpr: How do you feel about the iteration/recursion
>transformation? If you're okay with that, then this surface syntax:
>
>  loop { <body> }
>
>could expand to:
>
>  (define __loop() { <body>; __loop() })()

[-] Worse than likewise.
First, it does fail the same criterion.  In addition, it would require a 
tail recursive implementation of the language (or an infinite amount of 
memory) to implement well.  JVMs are not required to, and typically do not, 
provide for tail recursive calls.

But it's worse than that.  (Insight due to Dave Ungar & Self.)  What about 
the portable debugger's eye view?  Even when tail recursion is available, it 
can't used for code whose state is debugger accessible, if we want the 
resulting debugging behavior to be comprehensible to the programmer.  (The 
programmer might eventually have a way to give a vm permission to throw away 
the information that tail recursion throws away, but the default must be to 
keep it.)

Even if you argue that the debugging default should be on the other foot, as 
long as we might want a debugger to preserve a conventional call stack, we 
need to distinguish between loops and recursion.


>* visitThrowExpr: I suspect "throw" can just be a primitive function.

[+] Jeez, why didn't I think of that.
Thanks!


More later.