Fault Containment Domains
Jonathan Shapiro
shap@viper.cis.upenn.edu
Fri, 9 Dec 94 15:11:47 -0500
Different districts could, and generally would,
have some knowledge of each other, but only through conventional Mach-like
non-persistent communication channels.
Given KeyKOS as it stands, there are no non-persistent communication
channels within the scope of a single system. KeyKOS simply doesn't
attempt to guarantee consistency across, say, a network or TTY
connection.
...you can get new cross-district keys on the next restart in the same
way that the persistent part of a KeyKOS system obtains new device keys
after a restart - by looking them up in a nameserver or something like that.
That's what I mean by "reconnect". KeyKOS already has to do it
at some levels in the system.
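
To make "reconnect" concrete, here is a toy sketch in Python. It is not KeyKOS code and does not reflect any real KeyKOS interface: `Service`, `Key`, `NameServer`, `Client`, and `StaleKeyError` are all hypothetical names invented for illustration, and the generation counter is just one assumed way a key could be invalidated by a restart. The point is only that the holder of a stale key goes back to the nameserver and looks the service up again by name.

```python
class StaleKeyError(Exception):
    """Raised when a key obtained before a restart is invoked."""


class Service:
    """Hypothetical service living in another district."""

    def __init__(self, name):
        self.name = name
        self.generation = 0  # assumption: bumped on every restart

    def restart(self):
        # After a restart, every key handed out earlier is stale.
        self.generation += 1

    def open_key(self):
        return Key(self, self.generation)

    def handle(self, message):
        return f"{self.name} got {message}"


class Key:
    """Non-persistent key: valid only for the generation it was issued in."""

    def __init__(self, service, generation):
        self._service = service
        self._generation = generation

    def invoke(self, message):
        if self._generation != self._service.generation:
            raise StaleKeyError(self._service.name)
        return self._service.handle(message)


class NameServer:
    """Maps names to services and hands out fresh keys on lookup."""

    def __init__(self):
        self._services = {}

    def register(self, name, service):
        self._services[name] = service

    def lookup(self, name):
        return self._services[name].open_key()


class Client:
    """Holds a cross-district key and re-acquires it by name when stale."""

    def __init__(self, nameserver, service_name):
        self._nameserver = nameserver
        self._service_name = service_name
        self._key = None
        self.reconnects = 0

    def _reconnect(self):
        self._key = self._nameserver.lookup(self._service_name)
        self.reconnects += 1

    def call(self, message):
        if self._key is None:
            self._reconnect()
        try:
            return self._key.invoke(message)
        except StaleKeyError:
            # The old key died with the restart; look the service up again.
            self._reconnect()
            return self._key.invoke(message)
```

A client built this way keeps working across `Service.restart()` at the cost of one extra lookup. Note that the sketch simply retries the call, which is only safe if the request is idempotent.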
Here's how KeyKOS handles devices. Perhaps we can come up with a
clever way to extend it along the lines you are thinking.
In KeyKOS there are some very special keys called Device Creators.
One way to think about this is that somebody holds the authority to
create terminal port keys by holding the device creator key for the
terminal ports board.
In KeyKOS, most keys for individual devices are rescinded on startup.
This ensures, for example, that login sessions to terminals get
revalidated. On a PC, you might want to make console and keyboard
keys not get rescinded if your environment was physically secure -
this would allow the user to go right back to where they were.
Device creator keys are not rescindable in KeyKOS, and are therefore
closely held by very trusted components. In some sense, these
components act a bit like namespace managers.
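
The rescind-on-startup behavior can be modeled with a small Python sketch. This is a toy under stated assumptions, not the KeyKOS implementation: `DeviceCreator`, `DeviceKey`, `RescindedKeyError`, the epoch counter, and the `keep_across_restart` set are all hypothetical names chosen for illustration. The creator survives restarts, individual device keys normally do not, and a device such as the console can be exempted.

```python
class RescindedKeyError(Exception):
    """Raised when a rescinded device key is invoked."""


class DeviceKey:
    """Key for one device, stamped with the epoch in which it was created."""

    def __init__(self, creator, device, epoch):
        self._creator = creator
        self._device = device
        self._epoch = epoch

    def is_valid(self):
        # Assumption: a key is valid if it was issued in the current epoch,
        # or if its device is exempt from rescinding.
        if self._device in self._creator.keep_across_restart:
            return True
        return self._epoch == self._creator.epoch

    def invoke(self, request):
        if not self.is_valid():
            raise RescindedKeyError(self._device)
        return f"{self._device}: {request}"


class DeviceCreator:
    """Holds the authority to create keys for the devices on one board."""

    def __init__(self, board, keep_across_restart=()):
        self.board = board
        self.epoch = 0
        # Devices whose keys should survive a restart (e.g. a PC console
        # in a physically secure environment).
        self.keep_across_restart = set(keep_across_restart)

    def create_key(self, device):
        return DeviceKey(self, device, self.epoch)

    def restart(self):
        # The creator itself is never rescinded; bumping the epoch
        # invalidates every non-exempt device key issued so far.
        self.epoch += 1
```

For example, with `creator = DeviceCreator("terminal-ports", keep_across_restart={"console"})`, a key from `creator.create_key("tty0")` raises `RescindedKeyError` after `creator.restart()`, while a key for `"console"` keeps working. Whoever holds the creator can then issue a fresh `tty0` key, which is where login revalidation would happen.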
If I understand your proposal, it sounds like one would need to define
something rather like a device creator key that did NOT get rescinded
by a domain restart. These keys would be the keys that allow one to
talk to name servers. I suppose the cross-boundary servers would need
to obey some design restrictions to ensure that they would work under
this assumption. I'm not sure how failure notifications would be
handled in such an environment.
Given this mechanism, I can see how one would build a recoverable UNIX
system that crossed containment domains, but I don't offhand see how
to make processes unaware of the boundary operate correctly without
some sort of transaction-oriented protocol between them.
>You also gain a more disciplined, uniform approach to resource naming
>and access, which helps alot. That's what I'm trying to retain in
>DIMSUM.
Well, you also get that in Mach, and in Amoeba, and in Chorus, and in
innumerable other capability-based OSes. Unless there's some fundamentally
different approach you're planning to take, it doesn't seem like a very
likely research direction.
The novelty in what I'm thinking about doesn't relate to being a
capability system per se, though that makes it easier to think about.
The novelty is in sharing some system objects that I don't think
anyone has tried to share before.
In Mach, Amoeba, or Chorus, can two threads in the same address
space be running on different machines (as opposed to different
CPUs)? I suppose that when I say same address space, this can be
taken to mean that if one thread does an mprotect-type call on the
address space or maps an object it is seen by the others. Has this
been done before?
Jonathan