[cap-talk] Persistence as a category, danger, PLASH (was: as a cap value)

Jonathan S. Shapiro shap at eros-os.com
Thu Mar 13 13:24:35 EDT 2008


On Thu, 2008-03-13 at 09:50 -0700, Jed Donnelley wrote:
> It seems to me there is a category of capability systems
> where non-persistence makes perfect sense.  These are
> systems where capabilities are *only* used as vectors
> of authority in processes that will be destroyed (and
> not recovered) on a system restart.

I agree.

> For such systems
> (I include PLASH, Mach, and many others) there is no
> added value with capability persistence, in fact it's
> difficult to imagine why or how one would even try.

I do not agree.

In such systems, there is what might be called a "persistence boundary".
This most commonly appears in the architecture as a boundary line
between a capability-based runtime system and an ACL-based persistence
layer. While UNIX is an imperfect capability system in this regard, it
is an example of the type of design that I am describing.

The reason to consider capability persistence in such systems is this:
whatever the merits or weaknesses of ACLs or capabilities, a single
consistent permission system is invariably preferable to two mutually
incompatible permission systems. Consistency of semantics across the
system is greatly to be desired.

> One only needs to start considering persistence if
> there is some way to store capabilities between system
> restarts.

That is necessary but not sufficient. Storing the capabilities per se is
actually pretty easy. The challenging part is persisting the objects
*named* by those capabilities and the association between each
capability and its holder.
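
To make the distinction concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: the `objects` and `holders` tables, the `(oid, rights)` encoding, and the use of JSON as the store do not describe any real system. It shows only that writing out the capabilities alone is trivial, and that what comes back is useless unless the named objects and the holder table were saved with them.

```python
import json

# Hypothetical model: a capability is just (object id, rights).
objects = {1: {"kind": "file", "data": "hello"}}      # live objects
holders = {"procA": [{"oid": 1, "rights": "rw"}]}     # who holds what

def checkpoint(save_objects):
    """Serialize system state; optionally omit the named objects."""
    state = {"holders": holders}
    if save_objects:
        state["objects"] = {str(k): v for k, v in objects.items()}
    return json.dumps(state)

def restore(blob):
    """Return (objects, holders, dangling) as seen after a restart."""
    state = json.loads(blob)
    objs = {int(k): v for k, v in state.get("objects", {}).items()}
    held = state["holders"]
    # A capability dangles if the object it names was not persisted.
    dangling = [(h, c["oid"]) for h, caps in held.items()
                for c in caps if c["oid"] not in objs]
    return objs, held, dangling

# Saving only the capabilities is easy, but afterwards they name nothing.
_, _, dangling_caps_only = restore(checkpoint(save_objects=False))
# Saving objects and the holder table together keeps every capability meaningful.
_, _, dangling_full = restore(checkpoint(save_objects=True))
```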

> In systems with such persistent (or at
> least restart-decoupled) capability storage I still
> think my argument that all capabilities should be
> persistent applies...

In any capability system of either type, it is necessary to be able to
explicitly sever capabilities. In a system having a mechanism to sever,
that mechanism is sufficient to sever capabilities on restart if
desired.
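
A rough sketch of what I mean, assuming a simple revocable-forwarder design. The names `Capability`, `System`, `Severed`, and the `sever_on_restart` flag are hypothetical and chosen only for this example. The point is that the restart policy needs no mechanism beyond the ordinary sever operation.

```python
class Severed(Exception):
    """Raised when a severed capability is invoked."""

class Capability:
    """Hypothetical revocable forwarder to a target object."""
    def __init__(self, target):
        self._target = target

    def invoke(self, method, *args):
        if self._target is None:
            raise Severed("capability has been severed")
        return getattr(self._target, method)(*args)

    def sever(self):
        self._target = None

class System:
    def __init__(self, sever_on_restart):
        self.sever_on_restart = sever_on_restart
        self.caps = []

    def mint(self, target):
        cap = Capability(target)
        self.caps.append(cap)
        return cap

    def restart(self):
        # The restart policy reuses the ordinary sever mechanism.
        if self.sever_on_restart:
            for cap in self.caps:
                cap.sever()

class Counter:
    def __init__(self):
        self.n = 0

    def bump(self):
        self.n += 1
        return self.n

system = System(sever_on_restart=True)
cap = system.mint(Counter())
first = cap.invoke("bump")       # works before the restart
system.restart()
try:
    cap.invoke("bump")
    severed = False
except Severed:
    severed = True               # the same capability is dead afterwards
```

With `sever_on_restart=False` the same system keeps its capabilities across the restart, so persistence becomes a policy choice rather than a structural one.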

> One thing I will mention about these non-persistent
> (Port?) systems is that I don't believe they substantively
> solve any "danger" aspect of capabilities.  Even though
> capabilities may disappear on a system restart, that is
> no reason to assume that they are substantially less
> "dangerous".  Systems may not restart for a very long
> time (months, years?).

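That is true, and it is becoming more true with time. For example, UNIX
has in this respect become less secure as it has become more reliable:
the longer a system stays up, the longer the authority held by its
processes remains live.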

> ... or even I think any distributed
> capability system where the restarts of the
> systems are decoupled) I argue (as before) that
> they need to persist.

That is a subtle and challenging issue. The problem is that decoupled
recovery does not imply re-establishment of mutual consistency. I think
a strong case can be made that "a capability names an object" is
short-hand for "a capability names an object that is evolving through
time". The wielder of a capability predicates their use of that
capability on beliefs about the state of the target object. If that
model and the actual state come to diverge (as when the parties recover
to mutually inconsistent states), then the semantic contract of the
capability has been lost.

In this event, I think it is very *desirable* for such a capability to
be severed in order to signal that mutual inconsistency has arisen. It
may also be useful if this is some form of "recoverable sever" so that
validity can be re-established by the participants. One thing, however,
does seem clear: a sensible system having a mutual inconsistency
boundary of this form should NOT permit general operations on a
capability to transpire when the meaning of such operations demonstrably
cannot be known by the participants.
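
One way such a "recoverable sever" might look is sketched here, under the assumption that each target carries an epoch counter that it bumps whenever it recovers independently. The names `Target`, `EpochCap`, `Inconsistent`, and `revalidate` are invented for this illustration and do not describe any existing system.

```python
class Inconsistent(Exception):
    """Raised when a capability is used across an inconsistency boundary."""

class Target:
    def __init__(self):
        self.epoch = 0
        self.state = {}

    def recover(self):
        # Independent recovery: the state may no longer match holders' beliefs.
        self.epoch += 1

class EpochCap:
    """Hypothetical capability that remembers the epoch it was valid in."""
    def __init__(self, target):
        self._target = target
        self._epoch = target.epoch

    def invoke(self, key, value):
        if self._epoch != self._target.epoch:
            raise Inconsistent("target recovered independently; revalidate first")
        self._target.state[key] = value
        return self._target.state

    def revalidate(self, holder_accepts):
        """Recoverable sever: the holder inspects the state and opts back in."""
        if holder_accepts(dict(self._target.state)):
            self._epoch = self._target.epoch
            return True
        return False

t = Target()
cap = EpochCap(t)
cap.invoke("x", 1)
t.recover()                      # target restarts on its own schedule
try:
    cap.invoke("y", 2)
    blocked = False
except Inconsistent:
    blocked = True               # general operations are refused
revalidated = cap.revalidate(lambda s: s.get("x") == 1)
cap.invoke("y", 2)               # permitted again once the holder has re-agreed
```

What the holder checks in `revalidate` is application-specific; the sketch only shows that the capability stays inert until somebody has checked.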

> In all systems I believe that for safety (avoided
> danger) capabilities that are issued to an application
> for temporary use should be revoked when the
> application completes.

This is also subtle and challenging. It depends on whether this
revocation is transitive. In L4Sec, for example, where the mechanism of
capability transfer is MAP, revocation of a particular capability
location (which occurs either explicitly or through overwrite) has the
effect of revoking all of the locations whose current values were copied
from the location being revoked.

This type of transitive revocation raises some very challenging problems
of recovery when a sequence of mappings A->B->C occurs and B wishes to
exit before C does. It is a "semantics of durability" problem.
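
The A->B->C case can be sketched with a toy derivation tree. This is not the L4Sec interface: `Slot`, `map_to`, and `revoke` are hypothetical names, and the model assumes only what is described above, namely that revoking or overwriting a location revokes every location copied from it.

```python
class Slot:
    """Hypothetical capability location in a MAP-style derivation tree."""
    def __init__(self, owner):
        self.owner = owner
        self.value = None
        self.parent = None
        self.children = []       # slots whose value was mapped from this one

    def map_to(self, other):
        other.revoke()           # overwriting a location revokes it first
        other.value = self.value
        other.parent = self
        self.children.append(other)

    def revoke(self):
        # Transitive: everything derived from this slot goes too.
        for child in list(self.children):
            child.revoke()
        if self.parent is not None:
            self.parent.children.remove(self)
            self.parent = None
        self.value = None

a, b, c = Slot("A"), Slot("B"), Slot("C")
a.value = "page-frame-7"         # placeholder for some resource
a.map_to(b)
b.map_to(c)
b.revoke()                       # B exits before C does
```

After `b.revoke()`, A still holds its value, but C has lost its mapping even though C did nothing and A never withdrew anything. For C to survive B's exit, C would have to re-acquire the mapping directly from A, which is the exchange protocol at issue.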

The L4 folks argue that some form of capability exchange protocol is
necessary at any boundary between distinct revocation domains in any
case, and they therefore argue that this issue is not significant in
practice. On the other hand, when challenged they could not produce a
sensible recovery protocol for certain page fault unmaps.

My sense is that the jury is still out on this issue, and it will be
interesting to see what emerges.


shap


