Compiling E: Phases of Transformations
Mark S. Miller
markm@caplet.com
Sun, 06 Aug 2000 14:22:59 -0700
As shown by Scheme compilers and their descendents (notably Rabbit, Orbit,
and the unreleased Joule compiler), an elegant compilation technique is to
do much of the early work of compilation by source-to-source transformation
of the program. Each version of the program is still expressed within the
language, and has the same meaning as the previous version, but is expressed
in a narrower subset of the language.
At the end of this part of compilation, the original program is expressed in
a very constrained subset of the language. This subset should be designed
to make code generation from there a pleasant and effective task. Ideally,
all interesting platform independent aspects of compilation can be embodied
in the source-to-source transformation phases.
Each such further constrained subset can be given a name. The following are
the language subsets and transformations I am currently thinking about for E.
In further email, I'll then explain my rationale for these choices.
(Note: When I say "parse tree" or "parse node" I mean "AST" or "AST node".
Sorry, but it's what I'm used to.)
Language: User-E -- the language users write in
Transformation: Local Expansion -- currently done in the parser
If you think of "while" as a macro, think of this as
macro-expansion.
Language: Expanded-E -- uses very few parser node types
Transformation: Name Capture -- make explicit all significance of
user-written names, such as for upgrade support.
Language: Renamable-E -- meaning would be preserved under renaming
Transformation: Alpha Renaming -- non-outer-variables given unique generated
names
Language: Illuminated-E -- no shadowing
Transformation: De-cycling & "declare"-expansion -- forward refs become
Promises. (Dean plausibly suggests we leave this
till last.)
Language: Auditable-E -- portable interface to virtual machine
What the virtual machine is asked to load.
Transformation: Auditing -- Ask auditors to certify parse trees
Language: Kernel-E -- portable interface from virtual machine
The subset of Auditable-E that passes audit. Seen
by debuggers and other meta-level tools.
-------------
The transformation from User-E to Kernel-E is standard, portable, and
canonical. Modulo choice of generated names, two E implementations must
transform the same User-E program into the same Kernel-E program. This is
the form of E normally seen by programs that manipulate programs.
Below this point, the transformations are expected to be useful across
compilation targets, but with possible differences for different targets.
In any case, the following transformations are invisible to unprivileged
normal users of E. The virtual machine must appear to be interpreting
Kernel-E.
-------------
Transformation: Deslotifying -- de-reify variables-as-Slots. See
http://www.erights.org/enative/varcases.html#deslotify
Language: Memory-E -- variables are no more than memory locations
Transformation: Simplification -- constant folding, dead code elimination, ...
Possibly, some simplifications should be mixed into
earlier phases as well.
Language: Memory-E but not as stupid
Transformation: Early Declaration -- variables declared at their
allocation-contour. See
http://www.erights.org/enative/varcases.html#scoping
Language: E-gen -- the last stage normally represented in E parse
trees.
Cheers,
--MarkM