Debugging (was: Twobit or not Twobit)
Mark S. Miller
markm@caplet.com
Wed, 30 Aug 2000 14:19:31 -0700
Even good support for reading programs includes good support for debugging
programs. Often, when I read someone else's Smalltalk or Java code, I do so
from within a debugger. By posing examples and following the execution
path, I learn much more quickly what I need to know that I could by reading
the source code alone.
In briefly looking again at the Rabbit, Orbit, and Twobit papers ("Rabbit, a
Compiler for Scheme" --Guy Steele 1978, "Orbit: An Optimizing Compiler for
Scheme" --Kranz 1988), as well as the pedagogical Scheme compiler in Abelson
& Sussman, I did not even see debugging support mentioned -- not even in the
"Future Work" or "Final Thoughts" section. Many Scheme IDEs do provide
debugging support, but the almost always in a way particular to a given IDE.
Of course, if debugging is supported by source-to-source transformation
ahead of compilation, in order to create a self-instrumenting version of the
original program, then the debugger will work on top of any correct compiler
without the compiler needing to be aware of any debugging issues. This
approach was used originally, AFAIK, by Udi Shapiro for FCP (Flat Concurrent
Prolog), and is currently used by "PSD - a Portable Scheme Debugger"
http://www.cs.tut.fi/staff/pk/scheme/psd/article/article.html .
PSD also demonstrates the problem of using this approach without compiler or
virtual machine support: Lack of access to meta-information normally
considered crucial to debugging, such as stack backtrace information, and,
even after forsaking such access, fully instrumented code is typically
around 14 times larger and 170 to 290 times slower. The consequent
recommended usage style -- only instrument the parts of the program you're
currently debugging -- is inapplicable for debugging live production
processes that have gotten into trouble.
It's good that airplanes are built with flight recorders.
Conventional compiler also produce more efficient code when compiled "-O"
rather than "-g", but the factors are *much* smaller, enabling much
production code to be shipped after having been compiled "-g". For most
products, the benefit of being able to diagnose failures in the field is
worth a moderate efficiency cost. With persistence, even more so.
Hijacking Java Debuggers
The Java industry has invested an enormous amount of effort in creating
reasonable quality debuggers for Java -- some almost as good as normal
Smalltalk debuggers. As of Java 2, Java even has a portable API with which
a debugger can inspect and modify virtual machine state. Java virtual
machines are doing fancier internal optimization, but unless exempted by
special flags, are normally obligated to only optimize subject to the
constraint that they preserve the naive execution illusion presented though
the debugging API. Java pays a performance cost for this benefit, but E on
Java is already paying this cost regardless, so it may as well collect the
corresponding benefit.
Instance Variable Names
Of course, to do so, E pays an additional cost of forsaking mappings to Java
that would be more efficient but less straightforward. The primary issue is
number of classes. Before, when I was more focused on efficiency than
hijacking debuggers, I had in mind a compilation strategy that would have
resulted in one Java class per E source file. Since an E source file
contains many object-definition-expressions, and each has its own unique set
of instance variable (non-Outer variables used freely), this entailed using
a Java array to hold the instance variable state. This, of course, would
render stock Java debuggers useless.
Only by creating one Java class per E object-definition-expressions can we
have the Java instance variable names seen in the debugger be the variable
names the E programmer wrote (or an obvious mangling of this name -- see
below). This is the constraining decision setting the terms for the rest of
the mapping below.
The cost is that, last we measured (JDK 1.1.7 I think, measured by Chip at
EC) the Sun Java implementation paid an enormous fixed space cost per class.
Back then, when Java used an all-at-once garbage collector rather than a
generational collector, this enormous space cost turned into an enormous
time cost, as the entire process thrashed virtual memory on each garbage
collect. However, memory has gotten cheaper since then, the EC programming
environment generated several Java classes for each class-equivalent the
programmer wrote, and hopefully Java implementations may have even improved
since those days.
Other Names
Whenever possible, we should map other names the E programmer wrote to the
obvious Java entity with the same name, or a name which is an obvious mangle
of the original name. Since E only has two kinds of names, variable names
(nouns) and message names (verbs), this would seem to be easy. Let's start
with our canonical example of the simplest interesting E program:
define PairMaker make(value) :any {
define getter() :any { value }
define setter(newValue) { value := newValue }
[getter, setter]
}
Let's say that, relative to the source root under which it's found, this is
the contents of foo/bar/PairMaker.emaker. With the new
expansion-to-Kernel-E to appear in the next E release, this will expand to:
define PairMaker := define "foo.bar.PairMaker" {
to make(value) :any {
define getter := define "foo.bar.PairMaker$getter" {
to run :any {
value
}
}
define setter := define "foo.bar.PairMaker$setter" {
to run(newValue) :void {
value := newValue
}
}
ListMaker run(getter, setter)
}
}
The change you see above from the existing E expansion is due to a
suggestion by Dan. In user-E, the identifier being defined in an
object-definition-expression serves two purposes: it initializes a variable
of that name in this lexical scope with an instance of the object, just as
an assignment would have done, and names the behavior that all these
instances share, so that a new release may upgrade these instances to obey
the new behavior, and to provide a Miranda printing method. (A Miranda
method is one that will be provided for the object if the object does not
provide its own.)
? PairMaker make(3)
# value: [<getter>, <setter>]
A while ago Dan suggested that these two roles for the name be separated in
Kernel-E. At the time, I resisted, but as I've started to write
transformation code, it quickly became apparent that Dan was right. The
only remaining kernel E expression that evaluates an expression on the right
and then matches it against a pattern on the left, and thereby has to deal
with cyclic definitions, is the "define/:=" expression. The complexity of
the cycle issue is now fully separate from the complexity of the
object-definition expression.
Under the E to Java mapping I now have in mind, this would translate to:
package foo.bar;
import ...;
public class PairMaker_Class extends ... {
static Object ListMaker = ...;
static public class getter_Class extends ... {
private Object[] value_Box;
public getter_Class(Object[] value_Box) {
this.value_Box = value_Box;
}
public Object run() {
return value_Box[0];
}
}
static public class setter_Class extends ... {
private Object[] value_Box;
public setter_Class(Object[] value_Box) {
this.value_Box = value_Box;
}
public void run(Object newValue) {
value_Box[0] = newValue;
}
}
public Object make(Object value) {
Object[] value_Box = { value };
Object getter = new getter_Class(value_Box);
Object setter = new setter_Class(value_Box);
return E.call(ListMaker, "run", getter, setter);
}
}
The correspondence with the original should be obvious. What's less obvious
is the rationale for the particular choices made above, and what
alternatives were considered. I hope to walk through all this in upcoming
email.
A terminology note: There is one Java *object* per E facet, not per E
composite. This is another reason for us to use the term "object" to refer
to this quantum level of individual combinations of state and behavior:
As we hijack Java debuggers, our users can use their debugger's
documentation without as much confusion.
Cheers,
--MarkM