Distribute Parse Trees, Not Bytecode

Chip Morningstar chip@communities.com
Tue, 22 Jun 1999 22:50:17 -0700 (PDT)


Ping sez:
>Here's my reasoning.  In order for the code to be verifiable, it
>must be reasonably readable to a person.  If you distribute byte
>code or object code, that makes it very difficult for someone to
>read and check for security properties.  One could decompile the
>program, but the decompiler may not produce the clearest source
>code, and the complexity of a decompiler presents an opportunity
>for error.  So, byte code is probably not a good idea.  The closer
>we can get to the original source code, the more we can expect the
>reader and the programmer to be talking about the same thing.

[+] A good parse tree representation would also let you keep comments and
source text formatting information.
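
As a rough illustration of that point, here is a minimal Python sketch of a parse tree that carries comments and blank-line layout along with the code, so a reader can be shown something close to what the programmer typed. The node types (Num, Name, BinOp, Assign), the fields comment and blank_lines_before, and the function names are all hypothetical, invented for this example; no real tool's format is implied.

```python
from dataclasses import dataclass

@dataclass
class Num:
    value: int

@dataclass
class Name:
    ident: str

@dataclass
class BinOp:
    op: str
    left: object
    right: object

@dataclass
class Assign:
    name: str
    expr: object
    comment: str = ""            # trailing comment kept from the source
    blank_lines_before: int = 0  # vertical spacing kept from the source

def show_expr(node, nested=False):
    """Render an expression node back to source text."""
    if isinstance(node, Num):
        return str(node.value)
    if isinstance(node, Name):
        return node.ident
    text = f"{show_expr(node.left, True)} {node.op} {show_expr(node.right, True)}"
    # Parenthesize nested operations so the tree shape stays visible.
    return f"({text})" if nested else text

def show_program(stmts):
    """Render a list of statements, restoring comments and blank lines."""
    lines = []
    for stmt in stmts:
        lines.extend([""] * stmt.blank_lines_before)
        text = f"{stmt.name} = {show_expr(stmt.expr)}"
        if stmt.comment:
            text += f"  # {stmt.comment}"
        lines.append(text)
    return "\n".join(lines)

program = [
    Assign("rate", Num(5), comment="percent per year"),
    Assign("total", BinOp("*", Name("rate"), Num(100)), blank_lines_before=1),
]

print(show_program(program))
```

A bytecode format would normally have nowhere to put the comment or the blank line; here they are just extra fields on the nodes.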

>What we need is a format which avoids the entire compilation
>expense, represents the program compactly, yet still provides the
>ability to *present* the code as source.  And so the answer is to
>distribute signed parse trees (or, if you like, signed compressed
>parse trees).  This lets execution hosts skip a large chunk of the
>compilation cost, but still allows the runner of any program to
>see exactly the same source that the programmer wrote.  The
>machine running the program still has to emit code for it, but
>lots of implementations already do this work in a JIT anyway.
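
One way to picture a "signed compressed parse tree" is the Python sketch below. It is only a sketch under loud assumptions: the tree is a nested list serialized as JSON, compression is zlib, and the "signature" is an HMAC over a shared secret, which stands in for the public-key signature a real distribution scheme would presumably use. The names pack, unpack, show and KEY are made up for the example.

```python
import hashlib
import hmac
import json
import zlib

KEY = b"demo-shared-secret"  # hypothetical key; an HMAC is a stand-in for a real signature
TAG_LEN = hashlib.sha256().digest_size

def pack(tree, key=KEY):
    """Serialize, compress, and tag a parse tree for distribution."""
    payload = zlib.compress(json.dumps(tree, sort_keys=True).encode("utf-8"))
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload

def unpack(blob, key=KEY):
    """Check the tag first, then decompress and rebuild the tree."""
    tag, payload = blob[:TAG_LEN], blob[TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature check failed")
    return json.loads(zlib.decompress(payload).decode("utf-8"))

def show(node):
    """Present a tree node as source text for a human reader."""
    if isinstance(node, list):
        op, left, right = node
        return f"({show(left)} {op} {show(right)})"
    return str(node)

tree = ["+", 1, ["*", 2, 3]]
blob = pack(tree)
received = unpack(blob)
print(show(received))
```

The receiving host never runs a parser over program text; it verifies the tag, rebuilds the tree, and can still show the reader the source form on demand.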

[+] My experience has been that parsing is by far the most expensive
compilation step. I can imagine extremely compute-intensive code optimizers
that do all kinds of hairy wacko data flow analysis, but those kinds of things
can be done after the code is already running, so they aren't on the critical
path. Of course, it's also my experience that genuine optimizing compilers (as
opposed to the ones that just say "optimizing" on the box) are mainly urban
legend -- everybody swears they have a friend who knows somebody who has used
one, but they actually have the same ontological status as satanic cults and
alligators in the sewer.

>It is also likely, as i believe MarkM has previously argued, that
>there is no performance benefit to be gained from bytecode since
>the target machine, with its platform-specific knowledge, will
>likely do a much better job of optimizing the run-time code than
>the bytecode compiler can.

[#] One great virtue of a bytecode machine is that (assuming the architecture
is not completely stupid) you can port a (non-optimizing) VM easily. However,
creating a VM around a parse-tree code representation is just as easy.
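
To make the "just as easy" claim concrete, here is a minimal sketch of a (non-optimizing) evaluator that walks a parse tree directly. The tuple-based tree shape, the OPS table, and the evaluate function are assumptions made up for this example, not anything proposed in the thread.

```python
import operator

# Hypothetical operator table; a real language would have many more node kinds.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(node, env):
    """Evaluate a parse tree by walking it; there is no bytecode step."""
    if isinstance(node, (int, float)):
        return node                      # literal
    if isinstance(node, str):
        return env[node]                 # variable lookup
    op, left, right = node               # (operator, left subtree, right subtree)
    return OPS[op](evaluate(left, env), evaluate(right, env))

tree = ("+", "x", ("*", 2, 3))
print(evaluate(tree, {"x": 1}))  # prints 7
```

Nothing here depends on the host platform, so porting the evaluator is about as much work as porting a simple bytecode loop.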

>Death to the VM, then?

[-] No, the VM just becomes more loosely coupled to the language. A VM is still
useful as a reference implementation of the semantics and as a way to make the
runtime quickly and easily portable.