Debugging (was: Twobit or not Twobit)

Mark S. Miller markm@caplet.com
Wed, 30 Aug 2000 14:19:31 -0700


Even good support for reading programs includes good support for debugging 
programs.  Often, when I read someone else's Smalltalk or Java code, I do so 
from within a debugger.  By posing examples and following the execution 
path, I learn much more quickly what I need to know that I could by reading 
the source code alone.

In briefly looking again at the Rabbit, Orbit, and Twobit papers ("Rabbit, a 
Compiler for Scheme" --Guy Steele 1978, "Orbit: An Optimizing Compiler for 
Scheme" --Kranz 1988), as well as the pedagogical Scheme compiler in Abelson 
& Sussman, I did not even see debugging support mentioned -- not even in the 
"Future Work" or "Final Thoughts" section.  Many Scheme IDEs do provide 
debugging support, but the almost always in a way particular to a given IDE.

Of course, if debugging is supported by source-to-source transformation 
ahead of compilation, in order to create a self-instrumenting version of the 
original program, then the debugger will work on top of any correct compiler 
without the compiler needing to be aware of any debugging issues.  This 
approach was used originally, AFAIK, by Udi Shapiro for FCP (Flat Concurrent 
Prolog), and is currently used by "PSD - a Portable Scheme Debugger" 
http://www.cs.tut.fi/staff/pk/scheme/psd/article/article.html .  

PSD also demonstrates the problem of using this approach without compiler or 
virtual machine support: Lack of access to meta-information normally 
considered crucial to debugging, such as stack backtrace information, and, 
even after forsaking such access, fully instrumented code is typically 
around 14 times larger and 170 to 290 times slower.  The consequent 
recommended usage style -- only instrument the parts of the program you're 
currently debugging -- is inapplicable for debugging live production 
processes that have gotten into trouble.  

It's good that airplanes are built with flight recorders.

Conventional compiler also produce more efficient code when compiled "-O" 
rather than "-g", but the factors are *much* smaller, enabling much 
production code to be shipped after having been compiled "-g".  For most 
products, the benefit of being able to diagnose failures in the field is 
worth a moderate efficiency cost.  With persistence, even more so.


                       Hijacking Java Debuggers


The Java industry has invested an enormous amount of effort in creating 
reasonable quality debuggers for Java -- some almost as good as normal 
Smalltalk debuggers.  As of Java 2, Java even has a portable API with which 
a debugger can inspect and modify virtual machine state.  Java virtual 
machines are doing fancier internal optimization, but unless exempted by 
special flags, are normally obligated to only optimize subject to the 
constraint that they preserve the naive execution illusion presented though 
the debugging API.  Java pays a performance cost for this benefit, but E on 
Java is already paying this cost regardless, so it may as well collect the 
corresponding benefit.

                                Instance Variable Names

Of course, to do so, E pays an additional cost of forsaking mappings to Java 
that would be more efficient but less straightforward.  The primary issue is 
number of classes.  Before, when I was more focused on efficiency than 
hijacking debuggers, I had in mind a compilation strategy that would have 
resulted in one Java class per E source file.  Since an E source file 
contains many object-definition-expressions, and each has its own unique set 
of instance variable (non-Outer variables used freely), this entailed using 
a Java array to hold the instance variable state.  This, of course, would 
render stock Java debuggers useless.

Only by creating one Java class per E object-definition-expressions can we 
have the Java instance variable names seen in the debugger be the variable 
names the E programmer wrote (or an obvious mangling of this name -- see 
below).  This is the constraining decision setting the terms for the rest of 
the mapping below.  

The cost is that, last we measured (JDK 1.1.7 I think, measured by Chip at 
EC) the Sun Java implementation paid an enormous fixed space cost per class. 
  Back then, when Java used an all-at-once garbage collector rather than a 
generational collector, this enormous space cost turned into an enormous 
time cost, as the entire process thrashed virtual memory on each garbage 
collect.  However, memory has gotten cheaper since then, the EC programming 
environment generated several Java classes for each class-equivalent the 
programmer wrote, and hopefully Java implementations may have even improved 
since those days.

                                             Other Names

Whenever possible, we should map other names the E programmer wrote to the 
obvious Java entity with the same name, or a name which is an obvious mangle 
of the original name.  Since E only has two kinds of names, variable names 
(nouns) and message names (verbs), this would seem to be easy.  Let's start 
with our canonical example of the simplest interesting E program:

         define PairMaker make(value) :any {
             define getter()    :any { value }
             define setter(newValue) { value := newValue }
             [getter, setter]
         }

Let's say that, relative to the source root under which it's found, this is 
the contents of foo/bar/PairMaker.emaker.  With the new 
expansion-to-Kernel-E to appear in the next E release, this will expand to:

     define PairMaker := define "foo.bar.PairMaker" {
         to make(value) :any {
             define getter := define "foo.bar.PairMaker$getter" {
                 to run :any {
                     value
                 }
             }
             define setter := define "foo.bar.PairMaker$setter" {
                 to run(newValue) :void {
                     value := newValue
                 }
             }
             ListMaker run(getter, setter)
         }
     }

The change you see above from the existing E expansion is due to a 
suggestion by Dan.  In user-E, the identifier being defined in an 
object-definition-expression serves two purposes: it initializes a variable 
of that name in this lexical scope with an instance of the object, just as 
an assignment would have done, and names the behavior that all these 
instances share, so that a new release may upgrade these instances to obey 
the new behavior, and to provide a Miranda printing method.  (A Miranda 
method is one that will be provided for the object if the object does not 
provide its own.)

     ? PairMaker make(3)
     # value: [<getter>, <setter>]

A while ago Dan suggested that these two roles for the name be separated in 
Kernel-E.  At the time, I resisted, but as I've started to write 
transformation code, it quickly became apparent that Dan was right.  The 
only remaining kernel E expression that evaluates an expression on the right 
and then matches it against a pattern on the left, and thereby has to deal 
with cyclic definitions, is the "define/:=" expression.  The complexity of 
the cycle issue is now fully separate from the complexity of the 
object-definition expression.

Under the E to Java mapping I now have in mind, this would translate to:

     package foo.bar;
     
     import ...;
     
     public class PairMaker_Class extends ... {
     
         static Object ListMaker = ...;
     
         static public class getter_Class extends ... {
     
             private Object[] value_Box;
     
             public getter_Class(Object[] value_Box) {
                 this.value_Box = value_Box;
             }
     
             public Object run() {
                 return value_Box[0];
             }
         }
     
         static public class setter_Class extends ... {
     
             private Object[] value_Box;
     
             public setter_Class(Object[] value_Box) {
                 this.value_Box = value_Box;
             }
     
             public void run(Object newValue) {
                 value_Box[0] = newValue;
             }
         }
     
         public Object make(Object value) {
             Object[] value_Box = { value };
             Object getter = new getter_Class(value_Box);
             Object setter = new setter_Class(value_Box);
             return E.call(ListMaker, "run", getter, setter);
         }
     }

The correspondence with the original should be obvious.  What's less obvious 
is the rationale for the particular choices made above, and what 
alternatives were considered.  I hope to walk through all this in upcoming 
email.

A terminology note: There is one Java *object* per E facet, not per E 
composite.  This is another reason for us to use the term "object" to refer 
to this quantum level of individual combinations of state and behavior:
As we hijack Java debuggers, our users can use their debugger's 
documentation without as much confusion.


         Cheers,
         --MarkM