[eros-arch] On Garbage Collection

Jonathan S. Shapiro eros-arch@mail.eros-os.org
22 Nov 2003 13:00:24 -0500


In the course of deciding not to submit a DARPA proposal, I spent some
time looking at what is going on in the world of penetration attacks. I
conclude that we are solving the wrong problem -- or rather, we have
been solving phase II in the hope that other people would solve phase I.
I have now concluded that this hope is in vain.

Penetration attacks can broadly be divided into two categories:

1. Exploitation of interpreter errors. These are best addressed by least
privilege, very much in the style that we have all been pursuing.

2. Exploitation of buffer overruns to load hostile code. The problem
here is entirely due to the widespread use of C and C++, and is a direct
result of the absence of unchecked vectors and a first class string
type. While tools like "ccured" and "PointGuard" help, we ultimately
need to dump these languages in favor of safer tools.

Which raises the question: what should the replacement for C look like
and how do we transition?

There are those who may argue that we already have a replacement.
Possible candidates might include C#, Java, or D (I restrict my
attention here to statically typed languages, but without prejudice). I
think that none of these are a good replacement.

Actually, I think that the "what should it look like" question isn't
difficult to answer -- D and C# appear to have matters mostly right. The
transitional sticking points are garbage collection and excessive
dependency on dynamic compilation. Dynamic compilation is mostly an
issue because of performance and unbound dependencies (which are
anathema in robust systems). This brings us down to the GC question.

There are two problems with GC:

1. While GC is a good thing (TM), it isn't a universally good thing.
   There are applications for which GC is inappropriate:

      + Concurrently multithreaded apps
      + Apps with legacy library requirements
      + Apps with low variance requirements

   Concurrently multithreaded apps are a problem because there simply
   isn't a good GC technology for them yet. I'm prepared to believe
   that one may emerge, but it hasn't emerged yet.

   Apps with legacy (C) library requirements are restricted to use of
   conservative GC, and our experiences with this in OpenCM has been a
   complete disaster -- enough so that we are backing it out completely.
   This is not meant as an assault on Hans Boehm or anyone else. The
   problem in essence is that conservative GC doesn't work well on
   programs that retain a large live heap.

   The variance issue is simply that GC arises at unpredictable times
   for most apps we do not and should not care, but for some we do.
   Don't bother whining about allegedly incremental GC - the overheads
   are too high and there are boundary cases.

2. Given that conservative GC is not good enough, we need a transition
   language that does not rely on GC. I'm not entirely convinced that
   such a language can be memory safe.

I believe that it may be possible to design a language providing
explicit storage management that could transition to GC-based storage
management in the absence of legacy libraries.

Reactions?



shap