Fault Containment Domains
Bryan Ford
baford@schirf.cs.utah.edu
Sat, 10 Dec 94 16:37:04 MST
> ...you can get new cross-district keys on the next restart in the same
> way that the persistent part of a KeyKOS system obtains new device keys
> after a restart - by looking them up in a nameserver or something like that.
> That's what I mean by "reconnect". KeyKOS already has to do it
> at some levels in the system.
>
>Here's how KeyKOS handles devices. Perhaps we can come up with a
>clever way to extend it along the lines you are thinking.
Bill Franz explained very well what I have in mind:
when a district starts or restarts, it has no connections to the
"outside" world except a single "magic" cross-domain key, pointing
to the equivalent of the KeyKOS device allocator. But instead of
the device allocator being some fixed, low-level, highly privileged
key, it might point to something higher-level, such as the root of
a traditional Unix-like file system. The district's restart logic
could then reconnect other cross-district keys through nameserver
lookups based on that first key.
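
To make the restart logic concrete, here is a rough sketch in Python.
Everything in it is hypothetical: NameServer, District, and the key
names are invented for illustration and are not KeyKOS interfaces.
The only point is that a restarted district holds nothing but the one
bootstrap key and rebuilds every other cross-district key from it.

```python
class NameServer:
    """Hypothetical service reached through the single bootstrap key."""

    def __init__(self, bindings):
        self._bindings = dict(bindings)

    def lookup(self, name):
        # Raises KeyError if the name is not bound.
        return self._bindings[name]

    def rebind(self, name, key):
        # Models the outside world handing out a fresh key.
        self._bindings[name] = key


class District:
    """Hypothetical district: one bootstrap key plus rebuilt keys."""

    def __init__(self, bootstrap_key, wanted_names):
        self.bootstrap_key = bootstrap_key      # the one "magic" key
        self.wanted_names = list(wanted_names)  # names to reconnect
        self.cross_keys = {}                    # empty until restart()

    def restart(self):
        # Old cross-district keys are assumed dead after a restart,
        # so drop them all and look each one up again by name.
        self.cross_keys = {}
        for name in self.wanted_names:
            self.cross_keys[name] = self.bootstrap_key.lookup(name)
        return self.cross_keys


ns = NameServer({"fs": "key-fs-1", "net": "key-net-1"})
district = District(ns, ["fs", "net"])
district.restart()
```

A later restart simply repeats the lookups, so if the nameserver has
rebound a name in the meantime the district picks up the new key.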
>Given this mechanism, I can see how one would build a recoverable UNIX
>system that crossed containment domains, but I don't offhand see how
>to make processes unaware of the boundary operate correctly without
>some sort of transaction-oriented protocol between them.
Communication across districts would be essentially the same, in terms
of failure semantics, as communication through TCP/IP sockets between
two conventional Unix machines: processes would _not_ be unaware of
the boundary. Anyone holding a cross-district key must be prepared
to handle connection failures.
As I've said before, though, proxies could be used to implement
transparent reconnection. The simplest proxies would probably give
you at-least-once semantics for requests on the cross-district key.
Where exactly-once semantics are needed, smarter proxies could
implement some kind of transaction-based protocol to provide
them. But a large number of applications
work just fine with weak failure semantics: the Unix world has been
living with it for years. :-)
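
As a minimal sketch of the "simplest proxy" case, assuming a
cross-district key can be modeled as a callable that raises an
exception when the connection fails (ConnectionFailed,
ReconnectingProxy, and FlakyCounter are all made-up names), a
retrying proxy in Python might look like this. The flaky server
applies the request and then loses the reply, which shows why blind
retry gives at-least-once rather than exactly-once behavior.

```python
class ConnectionFailed(Exception):
    """Hypothetical error raised when a cross-district key goes dead."""


class ReconnectingProxy:
    """Retries a request on a fresh key after each connection failure."""

    def __init__(self, reconnect, max_attempts=3):
        self._reconnect = reconnect        # callable returning a new key
        self._max_attempts = max_attempts
        self._key = None

    def invoke(self, request):
        last_error = None
        for _ in range(self._max_attempts):
            try:
                if self._key is None:
                    self._key = self._reconnect()
                return self._key(request)
            except ConnectionFailed as err:
                # We cannot tell whether the request was applied before
                # the failure, so we drop the key and send it again.
                self._key = None
                last_error = err
        raise ConnectionFailed("gave up after retries") from last_error


class FlakyCounter:
    """Hypothetical server that loses the reply to its first request."""

    def __init__(self):
        self.applied = 0
        self.fail_next_reply = True

    def handle(self, request):
        self.applied += 1                  # the request takes effect
        if self.fail_next_reply:
            self.fail_next_reply = False
            raise ConnectionFailed("reply lost")
        return self.applied


server = FlakyCounter()
proxy = ReconnectingProxy(lambda: server.handle)
result = proxy.invoke("increment")
```

Here the caller sees one successful call, but the server has applied
the request twice. That is harmless for idempotent requests and is
exactly the case a smarter, transaction-based proxy would have to
handle.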
>The novelty in what I'm thinking about doesn't relate to being a
>capability system per-se, thought that makes it easier to think about.
>The novelty is in sharing some system objects that I don't think
>anyone has tried to share before.
>
>In Mach, Amoeba, or Chorus, can two threads in the same address
>space be running on different machines (as opposed to different
>CPU's)? I suppose that when I say same address space, this can be
>taken to mean that if one thread does an mprotect-type call on the
>address space or maps an object it is seen by the others. Has this
>been done before?
OK, now I think I see where you're going, and it's a good direction.
Traditional Mach, at any rate, confines a particular task to a
single machine, which means in turn that address spaces, protection
domains, IPC spaces, receive rights, and a bunch of other things are
confined to being on one machine at a time. All of these objects can
be _accessed_ transparently from remote nodes through NORMA IPC,
and can even be migrated from one machine to another, but they can't
exist "on" multiple machines simultaneously.
I think most other distributed OS's are the same way.
Note that the SAS crowd has brought this problem up, because if
you're going to distribute a SAS system you obviously have to have
a single address space shared across many hosts. But I don't
remember seeing any terrific results in this regard yet.
Anyway, figuring out how to solve this problem, allowing _all_
the basic system abstractions to "exist" on multiple hosts
at once in a reasonably performant way, would be a very good
research direction. I've thought about it quite a bit myself,
although I have no immediate plans for work in that direction.
In fact, I think the implementation of such a system could
greatly benefit from the concept of "districts", whether or not
they happen to be persistent. If you want, I'll go over my
ideas in this area, but first I'd like to know if this is
indeed the problem you're trying to solve, and what your current
approach is.
Maybe we're on the road again... :-)
Bryan