Fault Containment Domains

Bryan Ford baford@schirf.cs.utah.edu
Tue, 13 Dec 94 13:12:49 MST


>   figuring out how to solve this problem, allowing _all_
>   the basic system abstractions to "exist" on multiple hosts
>   at once in a reasonably performant way, would be a very good
>   research direction...
>
>This is indeed the problem I am looking at.  I'm a bit hesitant,
>because it's a big challenge to implement a new OS, and at the moment
>I'd be the only person working on it.  An interesting question that
>I'll have to think about is whether this could be done incrementally
>based on some existing system (e.g. Mach) substrate.

I strongly recommend starting with something like Mach.  Otherwise, for
one thing, you'll be spending most of your time writing device drivers and
other nitty-gritty infrastructure, and what you end up with is likely
to be, at best, a toy OS that demonstrates a particular concept in a
very limited problem domain, but isn't obviously and easily generalizable
to "real" OSes.  Tons and tons of research OSes have been created that
demonstrate various "cool" concepts, but still everybody just uses
Unix (or MS-DOS! :-) ) because nobody knows how to integrate those cool
new concepts back into real systems.  Starting from scratch may be fun,
but I think doing so eliminates much of the benefit of the research.

Or, as Dave Patterson put it in his OSDI keynote address, entitled
"How to have a bad research career", there are two ways to go about
doing things: the Scientific Method and the Computer-Scientific Method.
In the Scientific Method, you start with a well-defined hypothesis,
and then perform a series of experiments, changing one variable
at each stage.  In the Computer-Scientific Method, you start with
a hunch, then perform one experiment, changing all the variables.
I think you can see how this applies. :-)

If you're worried about the difficulty of changing Mach's basic
assumptions to be compatible with the model you're envisioning,
don't worry - it's not as difficult as it may seem at first.
Migrating threads was a very fundamental change, and it affected
lots of code all across Mach, but still it was _way_ easier than
starting from scratch, and more useful too, because we now have
a kernel that supports migrating threads _and_ runs Unix. :-)
And in any case, as long as the changes you make are well
planned and thought out, I will be more than happy to incorporate
them back into the main Mach distribution so that lots of other
people can start banging on it and helping out with the grunt work.

				Bryan