Distribute Parse Trees, Not Bytecode

Ka-Ping Yee ping@lfw.org
Tue, 22 Jun 1999 22:19:44 -0700 (PDT)


A (perhaps incorrect) assumption i had been making while reading
the Cambio example was that the Cambio itself, as a public service,
had code inspectable by both parties (the client exchanging money
and the supplier of the currency in demand).  I saw the Cambio as
the rules of a game between the supplier and the clients, and so i
attempted to formulate a Cambio whose source code had verifiable
safety properties for all sides.

Even if this was the incorrect context, it got me thinking about
how one might best facilitate openly verifiable game code.  My
conclusion was that parse trees are the best distribution format.
(I know you were already inclined to do things this way, MarkM,
and perhaps even for the same reasons i'm about to describe, but
anyway, i thought it wouldn't hurt to write them down.)

Here's my reasoning.  In order for the code to be verifiable, it
must be reasonably readable to a person.  If you distribute byte
code or object code, that makes it very difficult for someone to
read and check for security properties.  One could decompile the
program, but the decompiler may not produce the clearest source
code, and the complexity of a decompiler presents an opportunity
for error.  So, byte code is probably not a good idea.  The closer
we can get to the original source code, the more we can expect the
reader and the programmer to be talking about the same thing.

However, in order for the verification to be reliable, the code in
the form being verified must be precisely the same code that will be
run.  Moreover we would expect the originator to sign the program,
and he or she must sign the program in its runnable form or the
signature is meaningless.  But if the code is distributed only as
source, then everybody will have to compile the code at runtime.
This could be expensive to do.

What we need is a format which avoids the entire compilation
expense, represents the program compactly, yet still provides the
ability to *present* the code as source.  And so the answer is to
distribute signed parse trees (or, if you like, signed compressed
parse trees).  This lets execution hosts skip a large chunk of the
compilation cost, but still allows the runner of any program to
see exactly the same source that the programmer wrote.  The
machine running the program still has to emit code for it, but
lots of implementations already do this work in a JIT anyway.

It is also likely, as i believe MarkM has previously argued, that
there is no performance benefit to be gained from bytecode since
the target machine, with its platform-specific knowledge, will
likely do a much better job of optimizing the run-time code than
the bytecode compiler can.

Death to the VM, then?


-- ?!ng

"There's no point in being grown up if you can't be childish sometimes."
    -- Dr. Who