Compiling E: Phases of Transformations
Dean Tribble
tribble@netcom.com
Mon, 14 Aug 2000 09:53:01 -0700
At 07:12 PM 8/7/00 -0700, Mark S. Miller wrote:
>At 10:23 AM 8/7/00 , Dean Tribble wrote:
> >... the single best paper on the subject was in 1990 or thereabouts by
> Will Clinger. It was concise, and had in a much more obviously simple
> framework many fo the transformations that looked hard in Rabbit, etc.
>
>But which paper is it? http://www.ccs.neu.edu/home/will/papers.html is
>Clinger's list of publications. From a brief look, a plausible paper may be
>"Lambda, the ultimate label, or a simple optimizing compiler for Scheme".
>Since access to the paper at
>http://www.acm.org/pubs/citations/proceedings/lfp/182409/p128-clinger/ costs
>money, I'd like to know before I buy. Does
>http://www.ccs.neu.edu/home/will/Twobit/ultimate.html have all the relevant
>content?
I believe it is the Twobit compiler. I also think that is the right
paper. It's approach to Lambda lifting was a very useful inspiration.
>The above paper does not yet seem to have shown up on Gnutella.
>
> >Also, Appel's ML books apply the same techniques and explain them well.
>
>Which book?
I guess he now has more than one, but at the time it was the book derived
from his thesis, so his first book (compiling with continuations).
http://www.amazon.com/exec/obidos/ASIN/0521416957/qid=965749563/sr=1-1/002-9
212883-8732802
> >Note that the constrained subset may have *additional* semantics over
> the language (for example, in Joule, order of statements became
> significant after some transformation stage).
>
>A possible quibble: Is "additional semantics" another way of saying
>"constrained subset"? In your example, Is a Kernel-Joule obligated to
>execute a sequence of statements in sequence a constrained subset of
>Kernel-Joule? In any case, I agree with your point.
I was referring to stages prior to bytecodes, past kernel Joule.
> >>Each such further constrained subset can be given a name. The
> following are
> >>the language subsets and transformations I am currently thinking about
> for E.
> >
> >I would not assume that you will name all the stages. Once you have a
> compiler of this architecture, it is extremely useful to occasionally
> insert phases that do some really trivial thing (transform the code so
> that the number of references to a variable are stored in a constant
> expression (that will eventually be eliminated as dead code) right before
> the variable definition. Which language would that be? Often, the
> phases are the same sub language, you just know more.
>
>If you know more, then the language has additional semantics, or represents
>a constrained subset, or something. Ok, perhaps it's silly to say the
>encoding between a pair of is a separately named *language*, but if you know
>more about the output of a stage than its input, you have an encoding with
>different properties. Since each is an encoding of a program, we may as
>well say these are different programming languages -- all micro-variations
>of each other.
But in many cases, they are encodings in the same constructs that would
produce the same result when compiled by any other E compiler. Regardless,
this is a minor point, I think.
>So I might name the micro-variation of E output by your transformation
>VarCounted-E.
>
> >There's also a fundamental decision to make: Should everything be
> represented as syntactic expressions with the language (in which case you
> end up
>...
> >Something to consider here: only rename variables that would shadow
> other variables. Then, *most* variables don't have gobbledygook on
> them. Only shadowing variables need renaming. The downside would be
> that later transformations could result in shadowing. Hmm. I guess the
> best thing to do is rename with a special character such that it's easy
> to strip the renaming (Joule uses another technique, but it is loosely
> equivalent to renaming by appending to every variable name the ID (and a
> separator) of the syntactic scope).
>
>Ping raised a problem with special characters, or any similar trick that
>results in names the user couldn't have written: It would be nice to be able
>to take the text resulting in printing a tree and paste it into the
>programming environment as a working program with the same meaning.
If you always append something (even something legal), then you are
fine. Another compile phase on the same code will just append something
else.
> >>The transformation from User-E to Kernel-E is standard, portable, and
> >>canonical. Modulo choice of generated names, two E implementations must
> >
> >This seems like a bad idea (and is the real reason I wrote this
> message). Any legal optimization (or indeed any optimization at all that
> the programmer is willing to risk) should be allowed prior to this. The
> compiler works on behalf of the programmer; why should you limit his
> tools. Think of the compiler as a tool that the programmer uses to
> produce kernel E.
>
>I'm not sure if we're disagreeing or misunderstanding each other. When
>you speak of "the programmer", which programmer are you referring to? The
>E implementor or the E user? I want to provide the E user as
>platform-independent and deterministic a spec as possible. IMO, the one
>significant issue that Java got mostly right that Scheme got horribly
>wrong is the wiggle-room in the spec. The transformation from User-E to
>Kernel-E done by default should be deterministic. However, if an E user
>wishes to run his programs through a different transformation, great.
>
>If this doesn't resolve the issue, could you say more?
I'm talking about the E user. It seems like you are defining the
requirements of compilation by specifying the implementation of the
compiler, not by specifying the actual requirements. In very few instances
does Java say what the transformation or expansion should be; it describes
what the outcome should be during regular and exceptional execution. This
does not, for example, allow someone to implement a compiler that inlines
known procedures, evaluate compile-time constants, expand known
quasi-literal types into code that does the construction (a la lisp
backquote), etc. All of these are things that the E "user" shouldn't know
about or care about.
...
>On simplifying, I should probably split this into portable mandatory
>simplifications vs other, and put the first above the line as
>well. Question:
>should auditors accept a tree from which dead code had been eliminated, if
>that dead code would have failed the audit?
If you import two values for the same name but the name is never used, is
it an error? If a politician says something in a forest and noone hears
him, is he still lying?
> >>Transformation: Early Declaration -- variables declared at their
> >> allocation-contour. See
> >>
> http://www.erights.org/enative/varcases.html#scoping
> >>
> >>Language: E-gen -- the last stage normally represented in E parse
> >> trees.
> >
> >Same as the above. I assumed that this happened long before "kernel" E.
>
>Yup, good point. Early Declaration must come after Alpha Renaming and
>Deslotifying, otherwise it wouldn't be semantics-preserving to move the
>variable declaration earlier. So now that it looks like we'll be
>Deslotifying above the line, we could do early declaration there as well.
That doesn't sound right. My vague recollection is that Lambda lifting
takes this into account.
>But does this make programs easier for other programs to reason about, or is
>it only relevant to compilers? If only the latter, I see no strong argument
>for it to be above or below the line. I suppose, when in doubt, place it
>above the line, so the virtual machine can be simpler.
Yes. And this is another reason to specify the outcome, not the
implementation order (then you can change your mind later)