Backing up one more step...

Bryan Ford baford@schirf.cs.utah.edu
Fri, 09 Dec 94 12:48:38 MST


>   KeyKOS ties district boundaries to machine boundaries; what I'm
>   proposing is that instead of trying to "make the whole world one
>   big KeyKOS system" or something like that, it would be much more
>   flexible and powerful (and practical) to keep the notion of
>   separate districts around, but untie them from machine boundaries:
>   allow a machine to support multiple districts at once, and allow
>   a district to be distributed across multiple machines at once.
>
>Actually, a KeyKOS kernel can do this. (I don't know if it was
>implemented - one of the other folks will be able to tell you).
>The KeyKOS checkpoint system can be set up to checkpoint to a remote
>hot standby.  The mechanism for standby startup was for the remote
>kernel to suddenly develop schizophrenia and run two machines on one
>piece of hardware.  Since the two machine images had no knowledge of
>each other (no keys crossing the districts), they could never become
>entangled. 

I never said there could be no keys crossing the districts; I said that
any such keys would be invalidated/rescinded/whatever if either of them
died and was restarted.  Different districts could, and generally would,
have some knowledge of each other, but only through conventional Mach-like
non-persistent communication channels.
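
To make that rule concrete, here is a minimal sketch of one way it could
work. It is not how KeyKOS or Mach does anything; the names District,
CrossDistrictKey, mint_key and the "incarnation" counter are all made up for
illustration. Each district counts its restarts, and a cross-district key is
good only while both ends are still in the incarnation it was minted in.

```python
class District:
    """Hypothetical persistence domain with a restart counter."""

    def __init__(self, name):
        self.name = name
        self.incarnation = 0  # assumed: bumped on every crash-and-restart

    def restart(self):
        # Rolling back to a checkpoint begins a new incarnation.
        self.incarnation += 1


class CrossDistrictKey:
    """Hypothetical key whose holder and target live in different districts."""

    def __init__(self, holder, target):
        self.holder = holder
        self.target = target
        # Remember which incarnation of each district the key was minted in.
        self.holder_incarnation = holder.incarnation
        self.target_incarnation = target.incarnation

    def is_valid(self):
        # The key dies as soon as either district has been restarted.
        return (self.holder.incarnation == self.holder_incarnation
                and self.target.incarnation == self.target_incarnation)


def mint_key(holder, target):
    """Create a key from `holder` into `target` (illustrative helper)."""
    return CrossDistrictKey(holder, target)
```

Keys that stay inside a single district would not need any of this, since
they roll back together with everything else in the checkpoint.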

>   Every key is a connection from its holder to the object it denotes.
>   If a process owns a key that points into another district, and either
>   district dies and gets restarted, the key becomes invalid just as if
>   the object it pointed to had been reclaimed.  I don't quite see the problem.
>
>The problem is that all cross-district keys would become invalid the
>first time a district failed.  Once you have *no* keys into any other
>district, it is impossible to obtain any.

No, you can get new cross-district keys on the next restart in the same
way that the persistent part of a KeyKOS system obtains new device keys
after a restart - by looking them up in a nameserver or something like that.
That's what I mean by "reconnect".  KeyKOS already has to do it
at some levels in the system.
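
A rough sketch of that reconnect step, under invented names (NameServer,
Service, Client and StaleKey are all hypothetical, and a real nameserver
would itself be reached through some well-known key): the client treats a
stale key as an ordinary failure, looks the name up again, and retries once.

```python
class StaleKey(Exception):
    """Raised when a key refers to an earlier incarnation of a service."""


class NameServer:
    """Hypothetical registry mapping names to (service, incarnation) keys."""

    def __init__(self):
        self._entries = {}

    def register(self, name, service):
        self._entries[name] = (service, service.incarnation)

    def lookup(self, name):
        return self._entries[name]


class Service:
    """Hypothetical service living in another district."""

    def __init__(self):
        self.incarnation = 0

    def restart(self, nameserver, name):
        # After a restart the service re-registers under the same name.
        self.incarnation += 1
        nameserver.register(name, self)

    def call(self, key_incarnation, request):
        if key_incarnation != self.incarnation:
            raise StaleKey()
        return f"handled {request}"


class Client:
    def __init__(self, nameserver, name):
        self.nameserver = nameserver
        self.name = name
        self.key = nameserver.lookup(name)
        self.reconnects = 0

    def call(self, request):
        service, incarnation = self.key
        try:
            return service.call(incarnation, request)
        except StaleKey:
            # "Reconnect": obtain a fresh key by name, then retry once.
            self.key = self.nameserver.lookup(self.name)
            self.reconnects += 1
            service, incarnation = self.key
            return service.call(incarnation, request)
```

The sketch only covers getting a fresh key; whatever state the two sides
shared before the restart still has to be rebuilt by the application.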

>   So what have you gained beyond a bunch of separate KeyKOS machines
>   on a network? 
>
>Nothing, which is why I abandoned distributing KeyKOS. 

Don't give up so easily! :-)

>   Or for that matter, how is it better than a bunch
>   of Unix machines on a network, since any distributed application
>   wouldn't be able to take advantage of the reliability of each of
>   the individual hosts?
>
>Well, you gain about a factor of.... 75 in individual node
>reliability, which probably helps quite a bit in practice.  UNIX isn't
>very stable.

Sure, maybe KeyKOS is a better-written OS than Unix, and doesn't crash
as often.  But the point is, when one of the nodes _does_ crash (or somebody
trips over the power cord or whatever), while the individual node may
be able to roll back to its checkpointed state, that doesn't help any
distributed applications that were running partly on that host -
they're still hosed, because the restarted host is now inconsistent
with everybody else.  So persistence in the individual host buys nothing
for those applications.
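
A toy model of the inconsistency, with everything in it (the Node class and
its counters) invented purely for illustration: node A checkpoints, sends
two more messages to B, then crashes and rolls back. A now believes it has
sent fewer messages than B has actually received.

```python
class Node:
    """Hypothetical host with per-node checkpointing only."""

    def __init__(self):
        self.sent = 0
        self.received = 0
        self._checkpoint = (0, 0)

    def checkpoint(self):
        # Save only this node's own state; peers are not involved.
        self._checkpoint = (self.sent, self.received)

    def send(self, other):
        self.sent += 1
        other.received += 1

    def crash_and_restart(self):
        # Roll back to the last local checkpoint.
        self.sent, self.received = self._checkpoint


a, b = Node(), Node()
a.send(b)
a.checkpoint()
a.send(b)
a.send(b)
a.crash_and_restart()

# A's view has been rolled back, but B's has not.
consistent = (a.sent == b.received)
```

Here `consistent` ends up False, which is the whole problem: the checkpoint
restored A perfectly and the distributed application is broken anyway.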

>You also gain a more disciplined, uniform approach to resource naming
>and access, which helps a lot.  That's what I'm trying to retain in
>DIMSUM.

Well, you also get that in Mach, and in Amoeba, and in Chorus, and in
innumerable other capability-based OSes.  Unless there's some fundamentally
different approach you're planning to take, it doesn't seem like a very
likely research direction.

				Bryan