Compiling E: Phases of Transformations

Mark S. Miller markm@caplet.com
Sun, 06 Aug 2000 14:22:59 -0700


As shown by Scheme compilers and their descendents (notably Rabbit, Orbit, 
and the unreleased Joule compiler), an elegant compilation technique is to 
do much of the early work of compilation by source-to-source transformation 
of the program.  Each version of the program is still expressed within the 
language, and has the same meaning as the previous version, but is expressed 
in a narrower subset of the language.

At the end of this part of compilation, the original program is expressed in 
a very constrained subset of the language.  This subset should be designed 
to make code generation from there a pleasant and effective task.  Ideally, 
all interesting platform independent aspects of compilation can be embodied 
in the source-to-source transformation phases.

Each such further constrained subset can be given a name.  The following are 
the language subsets and transformations I am currently thinking about for E. 
In further email, I'll then explain my rationale for these choices.

(Note: When I say "parse tree" or "parse node" I mean "AST" or "AST node".  
Sorry, but it's what I'm used to.)


Language:        User-E -- the language users write in

Transformation: Local Expansion -- currently done in the parser
                        If you think of "while" as a macro, think of this as 
                        macro-expansion.

Language:        Expanded-E -- uses very few parser node types

Transformation: Name Capture -- make explicit all significance of 
                         user-written names, such as for upgrade support.

Language:        Renamable-E -- meaning would be preserved under renaming

Transformation: Alpha Renaming -- non-outer-variables given unique generated 
                        names

Language:        Illuminated-E -- no shadowing

Transformation: De-cycling & "declare"-expansion -- forward refs become 
                        Promises.  (Dean plausibly suggests we leave this 
                        till last.)

Language:        Auditable-E -- portable interface to virtual machine
                        What the virtual machine is asked to load.

Transformation: Auditing -- Ask auditors to certify parse trees

Language:        Kernel-E -- portable interface from virtual machine
                        The subset of Auditable-E that passes audit.  Seen
                        by debuggers and other meta-level tools.

------------- 
The transformation from User-E to Kernel-E is standard, portable, and 
canonical.  Modulo choice of generated names, two E implementations must 
transform the same User-E program into the same Kernel-E program.  This is 
the form of E normally seen by programs that manipulate programs.

Below this point, the transformations are expected to be useful across 
compilation targets, but with possible differences for different targets.  
In any case, the following transformations are invisible to unprivileged 
normal users of E.  The virtual machine must appear to be interpreting 
Kernel-E.
-------------

Transformation: Deslotifying -- de-reify variables-as-Slots.  See 
                         http://www.erights.org/enative/varcases.html#deslotify

Language:        Memory-E -- variables are no more than memory locations

Transformation: Simplification -- constant folding, dead code elimination, ...
                         Possibly, some simplifications should be mixed into 
                         earlier phases as well.

Language:         Memory-E but not as stupid

Transformation:  Early Declaration -- variables declared at their 
                         allocation-contour.  See 
                         http://www.erights.org/enative/varcases.html#scoping

Language:         E-gen -- the last stage normally represented in E parse 
                         trees.



         Cheers,
         --MarkM