Re: Announcing E 0.8.4: The Birthday Release Mark S. Miller (markm@caplet.com)
Fri, 28 May 1999 11:42:08 -0700

At 04:10 PM 5/27/99 , Dan Bornstein wrote:
>Wow! Looks like you've done some great stuff lately! A very happy
>birthday indeed!

Not until I get some more Surveys back ;)

>Since you asked for it, here are my off-the-cuff suggestions for reducing
>the E parse tree node types. The core of most of these suggestions is that
>in each of these cases you can re-cast a special form as a simple procedure
>call. If it turns out that you can inline away the call (which will
>presumably be most or all of the time), then that's naturally a Good Thing
>and should be done.
>
>The prices you pay are:
>
>* adding a couple required/primitive functions to the base set of
>definitions.
>* having to recognize something more complicated than merely node-type
>during compilation.

[#] We need to tease apart the purposes of an expansion

Your approach is indeed very interesting, and it raises the need to distinguish between different purposes for expanding to a smaller kernel set of constructs, especially if different expansions can be used for these different purposes.

  1. Code migration. I think you were the one to suggest (I think based on Python doing likewise) that compressed parse-trees were a better format for transmission of untrusted code, where the receiver wants to both verify and compile quickly. Most of the time savings in not having the receiver compile sources is actually from not having to lex and parse them. Whatever time might be lost (if any) in compiling from trees rather than virtual instructions is more than made up for by more straightforward verification.
  2. As the portable interface between the vm and other programs for talking about computation. The best example is writing a portable unprivileged debugger. This almost necessarily wants to use the same expansion/kernel-definition as #1.
  3. As explanation for humans. The best way to make a rich grammar quickly understandable is to let the user ask to see pieces of code expanded:

? interp.expand := true
# value: true

User: Everything we now type will show the expansion as well as the resulting value. Let's use a Perl Regex pattern:

? "a---b" =~ rx`a(@dashes-*)b`

    # expansion: 
    #   ("a---b" =~ _1q ? (rx__quasiParser matchMaker("a(@{0}-*)b") \
    #       matchBind(ListMaker run, _1q) =~ [dashes]))
    

# value: true

User: And what was that "." in "interp.expand" anyway? We'll see when we turn expansions back off:

? interp.expand := false

    # expansion: 
    #   interp setExpand(define _2ares : final := false)
    #   _2ares
    

# value: false

User: Oh yeah, ".foo" is for accessing the property named "foo" according to bean conventions. On the left side of an assignment it expands to calling "setFoo()".

Even as the designer of the language, I've often de-confused myself by looking at expansions. This does not have to use the same expansion as #1 and #2, though it currently does.

4) Portable semantically-directed tools for munging over programs. When writing such tools in E, one will typically use the E quasi-parser. Quasi-match is be by direct tree match on expanded trees. The expansion makes the match seem more semantic than syntactic, two programs that trivially have the same semantics but different trees won't match:

? e`2 + 3` =~ e`2 add(3)`
# value: true

? e`2 + 3` =~ e`3 + 2`
# value: false

The expansion by the quasi-parser means that a programmer casually writing such tools will usually succeed without knowing what trees their quasi-literals expand to. However, a programmer writing such tools seriously will need to know this. This suggests that #4 use the same expansion as #3, since that's the one we're encouraging people to learn from. Since such tools will often want to speak with the virtual machine about computations in progress (as debuggers do), #4 should also be the same as #2. Absent other considerations, we've just collapsed all these back to one expansion. Let's call this the "canonical expansion".

5) Representation passed between phases of a compiler. Given the definitions of #1 and #2, our compiler certainly has to start with the expansion agreed upon by these phases. However, it can internally further expand or do whatever it wants, as long as the virtual can then talk about the resulting computation in terms of #2.

Given these simultaneous considerations, it is indeed good to reduce the number of special forms, even at the price of introducing equally complicated magic primitive functions. However, my take is that it isn't worth doing this when the price also includes turning nested blocks into new closures. The main cost is just the understandability of the expansion to normal programmers. The big problem is the combination of a magic function & a new closure, since the magic function itself cannot be explained by showing how it could be implemented in E. The canonical expansion should

Expand to the point of clarity, but no farther.

Currently, the only control-flow construct for which I expand a block into a closure is the for loop. In this case, not only can one write the object that invokes the closure in E, one frequently does.

Of course, as mentioned in #5, none of this precludes a compiler from further expanding as you suggest to reduce its internal cases, as long as the resulting computation can be reported in terms of #2.

>I haven't done this particular step with my Scheme compiler, but I've given
>it a bit of thought. One thing I'll be doing is noticing at compile time if
>a global reference refers to an immutable cell, and, if so, optimizes away
>the indirection through the global environment. Then, if it turns out that
>an object in functor position (not just the name--i.e. it won't work for a
>reference to a mutable cell) is one of the specially recognized primitives
>(such as call-with-current-continuation), then that will enable other
>possible optimizations (such as flattening away a lambda). I bet similar
>tricks will work for E if you decide to go this route.

[+] I will indeed be doing this kind of thing. This is a nice explanation of it.

>Apologies if I get the syntax wrong; my life is Java and Scheme these days.
>Also, I know that the "object { ... }" syntax isn't primitive; I'm just
>using it as a notational convenience.

[;-)] Your brain is an archeological record on E's sedimentary layers. I think I'll just call you The Ghost of Syntax Past. All the "quoted" text below has actually been edited by me to be in current syntax before I reply to it. My apologies to the CritMap program that will try to process these alleged quotes. (CritMap, though I know you will read this apology, I know you will not understand it, for which you also have my apologies.)

>* visitEscapeExpr: This is call-with-current-continuation, right? So,
>the surface syntax:
>
> escape <hatch> { <body> }
>
>should be expandable to a simple method call which passes a thunk
>constructed from the body:
>
> callWithCC (def _(<hatch>) { <body> }})

[-] Fails the above criterion.
Perfect example of where I think it's easier to teach a primitive special form than a primitive function combined with an introduced closure.

>* visitHideExpr: The surface syntax
>
> { <body> }
>
>could expand to:
>
> (def _() { <body> })()

[-] Likewise.

>* visitLoopExpr: How do you feel about the iteration/recursion
>transformation? If you're okay with that, then this surface syntax:
>
> loop { <body> }
>
>could expand to:
>
> (define __loop() { <body>; __loop() })()

[-] Worse than likewise.
First, it does fail the same criterion. In addition, it would require a tail recursive implementation of the language (or an infinite amount of memory) to implement well. JVMs are not required to, and typically do not, provide for tail recursive calls.

But it's worse than that. (Insight due to Dave Ungar & Self.) What about the portable debugger's eye view? Even when tail recursion is available, it can't used for code whose state is debugger accessible, if we want the resulting debugging behavior to be comprehensible to the programmer. (The programmer might eventually have a way to give a vm permission to throw away the information that tail recursion throws away, but the default must be to keep it.)

Even if you argue that the debugging default should be on the other foot, as long as we might want a debugger to preserve a conventional call stack, we need to distinguish between loops and recursion.

>* visitThrowExpr: I suspect "throw" can just be a primitive function.

[+] Jeez, why didn't I think of that.
Thanks!

More later.