[cap-talk] Challenge in VM design

Jonathan S. Shapiro shap at eros-os.com
Mon Aug 4 10:37:10 CDT 2008


This note is prompted by Will Pearson's earlier note expressing his
desire to learn how VMs work. I want to state why we now think that we
need to build a new, lightweight VM for BitC.

Background:

BitC is a safe programming language designed for fully static
compilation. Originally, I pushed hard to preserve fully static,
separate compilation, but in the end we concluded that this goal is
incompatible with abstraction. In modern systems we want dynamic
libraries for a variety of reasons. The problem is that a dynamic
library cannot know all of the specializations over which it will be
called.

We got closer than one might expect. Given where BitC is today, it is
possible to do native-code compilation to what might be called "function
templates". A function template is turned into a function at run-time by
cloning it (copying the code), applying name mangling on the unresolved
references, and then performing symbol resolution on the referenced
functions and the sizes of types.

Better results can certainly be achieved with a more sophisticated
dynamic runtime, but this approach works, and it has the advantage that
a very simple implementation is possible.

But once you are going to do *any* link-time code generation, it becomes
very very tempting to do a low-level byte code. The cost of a simple,
direct-translation bytecode system is very very low (as we showed in
HDTrans), and the portability benefits are considerable.


Why a New Bytecode?

So why do we need a *new* bytecode engine?

The main reason has to do with value types. If you look at the way
abstract types are handled in other languages, you will find that all
abstract types are handled by reference, and that "dictionaries" are a
commonly used way of avoiding any need for run-time relocation. The
"everything is a reference" approach eliminates the need for object size
relocations.

The second thing you will notice is that there are no unboxed union
types. Anything that is a union is handled by reference. This simplifies
the design of the interface between the garbage collector and the type
system quite a lot, at the cost of breaking performance rather badly.

Both of these facts impose a high indirection burden. The very best VMs
for these languages invest an unreasonable amount of effort trying to
remove this indirection. The philosophy in BitC is a bit different:
don't put that indirection in in the first place by injecting artificial
constraints about type representations!

Both of these facts also mean that a compliant implementation of BitC
cannot really be built on .Net or JVM. You can build an implementation
that will have the same *semantics*, but it won't be compatible in the
sense of the underlying data structures, and that is a major goal of
BitC.

What BitC could really use is a very simple bytecode system that
supports unboxed types. This entails:

 1. Designing a bytecode verifier that can work over BitC-style objects.
    This implies designing something like a CLR for BitC.

 2. Designing a garbage collector interface that can advise the
    collector how to traverse unboxed union types.

 3. Building a fast translation system or interpreter for the resulting
    bytecode system (including the polyinstantiator).


So there is something new here to do, but on the other hand something
that is close enough to what has already been done elsewhere as not to
be completely intractable.


shap



More information about the cap-talk mailing list