Fault Containment Domains
Jonathan Shapiro
shap@viper.cis.upenn.edu
Mon, 12 Dec 94 12:22:07 -0500
figuring out how to solve this problem, allowing _all_
the basic system abstractions to "exist" on multiple hosts
at once in a reasonably performant way, would be a very good
research direction...
This is indeed the problem I am looking at. I'm a bit hesitant,
because it's a big challenge to implement a new OS, and at the moment
I'd be the only person working on it. An interesting question that
I'll have to think about is whether this could be done incrementally
on top of some existing system substrate (e.g. Mach).
At the moment, there is at least one object that I don't know how to
distribute, which is the object conveying authority to run (which at the
moment I'm tentatively calling a schedule). Sending a schedule object
a message remotely isn't a problem, but relocating it doesn't have an
obvious meaning in general. Its contract is predicated on being
attached to a particular piece of hardware (the processor). Actually,
there is a categorical difference between a real-time schedule and a
scheduling class in this regard. Scheduling classes are fairly
transferable, while hard schedules are not.
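
To make the distinction concrete, here is a rough sketch. Everything in
it is hypothetical: the names SchedulingClass, HardSchedule, and
NotRelocatable, and the budget/period fields, are illustrations only
and not part of any existing design. It assumes a hard schedule is a
reservation tied to one host's processor, while a scheduling class
carries no hardware guarantee.

```python
from dataclasses import dataclass


class NotRelocatable(Exception):
    """Raised when an object's contract is tied to its current host."""


@dataclass
class SchedulingClass:
    # Hypothetical: a relative class with no hardware guarantee attached.
    name: str
    host: str

    def relocate(self, new_host: str) -> "SchedulingClass":
        # Moving a class only changes where it is interpreted.
        return SchedulingClass(self.name, new_host)


@dataclass
class HardSchedule:
    # Hypothetical: budget_ms of processor time in every period_ms,
    # promised on one particular host's processor.
    host: str
    budget_ms: int
    period_ms: int

    def invoke(self, message: str) -> str:
        # Remote invocation is fine: the message travels, the
        # schedule object itself stays where it is.
        return f"{self.host}: {message}"

    def relocate(self, new_host: str) -> "HardSchedule":
        # The reservation was made against this host's processor, so
        # there is no obvious meaning for the same contract elsewhere.
        if new_host != self.host:
            raise NotRelocatable(
                f"reservation on {self.host} cannot move to {new_host}"
            )
        return self
```

In this sketch a scheduling class can be handed to another host freely,
whereas a hard schedule accepts messages from anywhere but refuses to
move. Whether refusal is the right answer, as opposed to renegotiating
the reservation on the new host, is exactly the open question.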
In fact, I think the implementation of such a system could greatly
benefit from the concept of "districts", whether or not they happen
to be persistent. If you want, I'll go over my ideas in this area,
but first I'd like to know if this is indeed the problem you're
trying to solve, and what your current approach is.
I haven't given much thought to districts in this context. I've given
some thought to virtual machine construction, and I think I see how to
do that okay. I've given a very small amount of thought to fault
containment, and that doesn't look too bad. Restart and recovery
looks quite a bit more complicated. I'd very much like to hear your
thoughts on districts in this context, but I'd feel more comfortable
if I can get my own thoughts on paper first to reduce ideon contagion
(ideon: an atomic unit of thought, by analogy to electron).
Bryan: I have finally been able to get a BSD system running on my
machines at home. FreeBSD 2.0 appears to have solved the ESDI
compatibility problems, which would seem to imply that I should be able
to get Lites running without doing an extensive porting effort first.
I'll try it soon.
In spite of the impression I apparently created, I really *do* want to
examine Mach and better understand how it handles things like memory
objects firsthand. That was why I was trying to get a Mach system
running in the first place...
Jonathan