Re: CAOS - A CApable OS? [was: Process Authentication Groups (PAGs)] Jonathan S. Shapiro (jsshapiro@earthlink.net)
Sun, 22 Mar 1998 17:42:17 -0500

Oh, boy. This discussion of capabilities has clearly gone wide of the mark, starting with bad definitions of terms and proceeding downhill. There are a lot of good ideas within the discussion, and some complex suggestions for how to do some things that have relatively simple solutions once the definitions get cleared up.

Three apologies in advance to all involved:

  1. For length: I'm going to try to be precise, and I'm sure this will lead to further rounds.
  2. For assumptions: I'm clearly coming in to the middle of a larger discussion.
  3. For confusions: At some points I'm unclear on who the speaker was, so please bear with me if I misattributed or misunderstood.

Also, I request in the strongest possible terms that you do *not* confuse the term "capability" by misapplying it in the way CAOS appears to be doing. I'll talk about why below.

Finally, if somebody will tell me the newsgroup this started in, I'll post this to the newsgroup as well.

> > Jim Dennis wrote:
> >>> .. current uid dependant authentication mechanisms (and similar)
> >>> could be replaced in whole by a general 'capability access control
> >>> mechanism'...

This could mean two things:

  1. the UID approach could be replaced by a capability approach, yielding a more secure system
  2. the UID approach can be implemented on a capability substrate

Both statements are correct. The second is useful for reasons of backward compatibility. KeyKOS, for example, has implemented a fairly complete UNIX binary compatibility system on top of a pure capability substrate.

DEFINITIONS:

A 'capability' is an unforgeable token that designates (names) an object and conveys the right to perform a set of actions on the designated object. Possession of a capability by a process is a necessary *and sufficient* condition to perform the authorized actions on the designated object.
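
To make the definition concrete, here is a minimal sketch in Python. The names (Page, Capability, invoke) are hypothetical and chosen only for illustration. Python cannot enforce unforgeability, so assume a kernel is protecting the token; the point is only that the token names one object, carries a fixed set of actions, and that possession is the whole check.

```python
class Page:
    """A hypothetical object that capabilities can designate."""
    def __init__(self):
        self.data = b""

    def read(self):
        return self.data

    def write(self, data):
        self.data = data


class Capability:
    """Designates one object and conveys a fixed set of actions on it.

    Assumption: in a real system the kernel guards this token so that
    user code cannot forge or alter it. Python only imitates that.
    """
    def __init__(self, obj, actions):
        self._obj = obj
        self._actions = frozenset(actions)

    def invoke(self, action, *args):
        # Possession is the only check: no user ID, no ACL lookup.
        if action not in self._actions:
            raise PermissionError(action)
        return getattr(self._obj, action)(*args)


page = Page()
read_write = Capability(page, {"read", "write"})
read_only = Capability(page, {"read"})

read_write.invoke("write", b"hello")
print(read_only.invoke("read"))
```

Note that there is no way to build an "open file" capability here without naming the object: both the object and the action set are fixed when the token is created.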

The question of how capabilities are obtained is a separate issue that was getting conflated in the discussion. I will address it in a moment.

Please note that the POSIX "capability" model is completely confused. They are using the term 'capability' incorrectly to describe the right to perform an action on *any* object. While restricting the operation set accessible to a process is a substantial improvement on the current UNIX model, it isn't a capability system and shouldn't be called such. Such restrictions are much closer in spirit to the VMS authorizations bitmap than they are to capabilities.

COMMENTS:

Taking into account correct definitions, the phrase:

	"a process ... must be allowed to use the 'open file'
         capability ... on the given object"

is nonsensical. "open file" is an action. In order to be a capability, both the action and the object must be identified.

Obtaining capabilities:

Capabilities can be obtained in two ways:

  1. A process P1 can transmit a capability C that it holds to a process P2, assuming that P1 holds a capability to P2.
  2. [deprecated] A process can make a request of some system agent to grant it certain capabilities on the basis of the program being run, a challenge-response protocol, or some such.

The correct way to think about this is that the process had an intrinsic capability to some other process that holds desired capabilities and acts as an authentication agent.

Such an authentication agent is necessary only when the system as a whole is not persistent -- the problem is how to reacquire capabilities when the system restarts. Authentication processes of this kind present some serious security design issues and render security analysis quite difficult.

> >>> For
> >>> example, for the file to be opened, the process must request the 'open
> >>> file' capability from the system, and the system can then evaluate if
> >>> the process meets the cryteria to be allowed to use the
> >>> capability...

It should now be clear that this statement is confused, on several grounds:

  1. There can be no such thing as an 'open file capability' -- it fails the definition.
  2. A process does not "request" a capability from the system. There are two problems in this notion:
       a. In a capability system, a process has no intrinsic authority to request anything from anyone. It can only invoke capabilities. Unless the process has some capability to some agent holding useful authorities, it has no way to ask for more capabilities.
       b. It begs the question of how the system might decide if the process is authorized to request that capability.

       A process either possesses a capability or it does not.  If a
       process P1 holds a capability to a process P2, it may ask P2
       for capabilities held by P2.  P2 may in fact turn out to be
       implemented by the operating system (P1 should not, in
       principle, be able to tell this).

       Simply possessing the capability to P2 is sufficient proof of
       authority to request things of P2.  P2 may then choose to
       implement additional requirements, ranging from
       challenge/response to ACLs to any number of other policies.

A better solution is to simply make the whole system persistent, at which point the need for an authority agent essentially goes away. The best way to keep hold of the capabilities that a process needs is to grant them from the beginning and ensure that they are never lost.

> >> No! That's the whole problem with ACL/UID based
> >> models -- the system must be omniscient and the
> >> process is "requesting" access on its own behalf.

I hope it's clear that the approach to an authentication agent that I outlined above, however undesirable, does not require omniscience.

> >> In a capabilities model there is some *other* process
> >> or mechanism that grants access to the system.

While a capability system can certainly include such an authentication agent, there is nothing in the model that requires one. The most successful capability system built to date -- KeyKOS -- did not have such an agent.

> >> This
> >> can be a Kereberos "ticket granting server" or a
> >> "reference monitor daemon" or it can be some "meta
> >> information" in a "resource fork" (filesystem based).

Several systems have been built using these mechanisms. All have proven to incorporate significantly compromised security. When the capabilities get put into a file system (for this purpose, the Kerberos server may be thought of as a remote file system), the question must then be asked:

"So how did you get the authority to talk to the file system?"

The answer invariably boils down to "by fiat." Universal access to a shared file system means that all programs have channels to all other programs and are therefore unsecurable [the file system is itself a channel]. This is why universal persistence is a better answer. It also turns out to be a big performance win.

> >> It might even be the parent process.

This is closer. Here are some rules from the capability system perspective:

  1. The creating process can grant at most those capabilities that it possesses, and should be able to do so selectively.
  2. Once created, the new process should have no further access to its parent unless the parent granted it the authority (i.e. a capability) to do so.

Rule (2) implies that access to the parent is capability controlled, and parenthood therefore should not be used intrinsically as a way of determining capability access.
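
A minimal sketch of these two rules, in Python. The Process class, its caps table, and spawn() are hypothetical names for illustration, not the interface of any real system.

```python
class Process:
    """Hypothetical process record: a name plus a table of capabilities."""
    def __init__(self, name, caps=None):
        self.name = name
        self.caps = dict(caps or {})

    def spawn(self, name, grant):
        """Create a child holding only the named subset of our capabilities."""
        missing = [key for key in grant if key not in self.caps]
        if missing:
            # Rule 1: we cannot grant what we do not hold.
            raise PermissionError(f"not held: {missing}")
        # Rule 2: the child gets no reference to its parent unless one
        # is placed in the table deliberately.
        return Process(name, {key: self.caps[key] for key in grant})


parent = Process("parent", {"log": object(), "database": object()})
child = parent.spawn("child", ["log"])
print(sorted(child.caps))
```

The child ends up with the "log" capability and nothing else. If the parent wanted the child to be able to call back, it would have to put a capability to itself into the granted set.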

One tricky part about (1): If a process A calls a fabrication agent F to build a new process B, F is free to hand to B capabilities that are held by F but not by A.

This sort of collusion is entirely permissible, and actually fairly important. It provides the means for me (a programmer) to build a program that can access, say, the password database. I can then give you access to the password manipulator fabrication agent. Assume that both of us trust the fabricator agent by virtue of the fabricator being a system-provided utility, as in the KeyKOS factory or the EROS constructor.

The EROS web site, by the way, can be found at:

http://www.cis.upenn.edu/~eros

There is a *lot* of documentation there.

In any case, the fabricator gets the capability to the password database from the developer. Whenever it fabricates a new password program instance, it copies this capability into the password program. The client of the password program has access to the password database only by way of the password program, which mediates things.
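
As a rough sketch of that arrangement (in Python, with hypothetical class names; this is not the KeyKOS factory or EROS constructor interface), object references stand in for capabilities. Python's underscore attributes are not truly private, so assume the language or kernel actually hides them.

```python
class PasswordDatabase:
    """Hypothetical object holding the sensitive data."""
    def __init__(self):
        self._hashes = {}

    def set_hash(self, user, value):
        self._hashes[user] = value

    def get_hash(self, user):
        return self._hashes.get(user)


class PasswordProgram:
    """Mediates every access to the database on behalf of a client."""
    def __init__(self, db_cap):
        self._db = db_cap  # copied in by the fabricator

    def check(self, user, candidate_hash):
        # The client learns only yes or no, never the stored hash.
        return self._db.get_hash(user) == candidate_hash


class Fabricator:
    """Holds the developer's capability and copies it into each instance."""
    def __init__(self, db_cap):
        self._db_cap = db_cap

    def fabricate(self):
        return PasswordProgram(self._db_cap)


# The developer sets things up once.
database = PasswordDatabase()
database.set_hash("alice", "h1")
fabricator = Fabricator(database)

# A client holds only a capability to the fabricator.
program = fabricator.fabricate()
print(program.check("alice", "h1"))
```

The client never receives the database capability itself. Everything it can do to the database is whatever the password program chooses to expose.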

> >> In order to prevent hostile or subverted code from
> >> requesting access to resources beyond the intentions
> >> of the administrator...

In many situations where we are concerned about security, the administrator is not cleared to know the authorities held by certain programs. Appeals to the administrator therefore break down with dismaying speed.

Consider, for example, a "public access" system. I've contracted for a certain amount of space and CPU. The administrator should not be authorized to learn what I do with that. The recent lawsuit against the Australian service provider could not have won if it were demonstrably impossible for the service provider to censor their users.

> > In the model I had in mind, the parent is the one that allows or denies
> > its child a specific capability (the capabilities can be thought of as
> > individual functions)....

With the addition of the fabrication example above, this works fine, and is essentially how KeyKOS and EROS work.

> > As far as a child is concerned, it cannot count on being able
> > to do _anything_ at all (not even execute a single instruction of its
> > code) and it doesn't inherit _anything_ at all from the parent. 

Doing good so far...

> > If it wants to do so, it must explicitly request it and the parent must
> > explicitly grant it.

In a capability system, the child has no authority to ask any such thing of the parent unless the parent granted the child a capability to the parent. Assume instead that the parent receives an initial capability to the child, and can then elect to send it capabilities (or not). This is simpler than the mechanism you propose, and equally good.

Note also that once created, the parent/child relationship is NOT the hierarchical relationship assumed by UNIX. The relationships are defined entirely by who holds what capabilities, and are therefore a potentially arbitrary graph.

> > the parent ... may at any time for any reason terminate access to
> > the capability ...

In a capability system it cannot. The child either holds a capability or it doesn't. There is nothing in the capability that records where the capability came from. Once transferred, the sender has no control over how the receiver uses the capability.

Observation: while you could build a modified capability system that used hierarchically constrained access rights, you would create two problems in doing so:

  1. Undesired communication channels inherent in the access control hierarchy itself.
  2. Variable-length capabilities.

From an implementation perspective, capabilities *really* want to be fixed size. Imagine programming a system with variable length pointers...

> 	This sound more like the "virtual subkernel" concept that
> 	I've discussed with others several times.

I agree that this is closer to what the previous author was describing.

>
> 	The problem with doing it at a resolution finer than 
> 	the system call level is that the performance starts to 
> 	suffer unacceptably.

It depends entirely on the speed of the invocation mechanism. What you are really asking is: "How small can the reasonable granularity of a protected invocation be?" The faster the boundary-crossing logic is, the finer the operation that a protected invocation can protect while retaining acceptable performance. Liedtke (L3, L4) has a nice slide of this. He says:

Suppose you are willing to devote 2% (or X%) of your system resources to protection boundary crossings. Here is a graph showing the number of crossings you can do as a function of crossing costs.

The graph shows lines for several choices of X, several crossing speeds, and several numbers of crossings.
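
The arithmetic behind such a graph is simple enough to sketch in Python. The clock rate and per-crossing costs below are made-up illustrative numbers, not values taken from Liedtke's slide, and the function name is my own.

```python
def crossings_per_second(clock_hz, budget_percent, cycles_per_crossing):
    """How many protected invocations fit in the given share of the CPU."""
    budget_cycles = clock_hz * budget_percent // 100
    return budget_cycles // cycles_per_crossing


CLOCK_HZ = 200_000_000  # assumed 200 MHz processor

# A 2% budget at three assumed crossing costs (in cycles).
for cost in (100, 1_000, 10_000):
    print(cost, crossings_per_second(CLOCK_HZ, 2, cost))
```

With these assumed numbers, a 100-cycle crossing allows 40,000 crossings per second inside the 2% budget, while a 10,000-cycle crossing allows only 400. Each tenfold reduction in crossing cost buys a tenfold finer granularity for the same overhead.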

> 	Another approach is to have the processes all running
> 	in "virtual machines" (a la Java) -- and allowing the 
> 	"parent" (or some other specified "nanny" process) arbitrate
> 	each access (of each type) to each resource.  
> 
> 	The principle problem with this approach is that existing
> 	software would have to be ported to the VM.

Painting the porting problem as black and white is too strong. For the Java VM you need to port because it provides a different processor architecture. For KeyKOS and EROS, many UNIX programs will run unmodified within a UNIX emulator. A few important programs will be revised to know more about the underlying security model. This is because the KeyKOS/EROS "virtual" machine incorporates the user-mode instruction set of the underlying processor architecture.

> 	I don't see much opportunity in these techniques to 
> 	substantially improve security at the OS level.  You 
> 	still have the same problems of "subversion"....

I'm a bit unclear on this. The only "subversion" problem I see in KeyKOS, EROS, Java, or E is that the security kernel can be modified by an unscrupulous user who has sufficient authority on the machine in question. The operative words, however, were "sufficient authority on the machine in question." If the opponent is in a position to reload the operating system you're pretty thoroughly screwed no matter what system you are running.

The OS (i.e. supervisor) level needs to be trusted. The question is: "What new kinds of useful policies and controls can now be implemented by application level code?" In a capability system, a surprising number of policies are entirely exportable from the kernel.

> > As you may have already noticed, I use a very broad definition of a
> > 'capability' and it probably differs from what it means on other
> > 'capability oriented' operating systems. The capability in CAOS may be
> > thought of as any function...

Actually, it's worse than that. Your function-oriented model isn't really secure. The problem is that functions are far too powerful. In order to say anything with confidence about what authority is granted by providing access to a particular function, you need to be able to say quite a lot about the environment in which that function runs.

True capabilities can be thought of as functions that have 'closed over' (in the Scheme sense) their first argument (i.e. the object). What isn't often stated explicitly is that the choice of actions for the primitive capabilities (the functions) must be tightly constrained to result in a securable system. We did a paper that, among other things, formally described some of the issues in implementing a particular security policy on a capability system:

http://www.cis.upenn.edu/~shap/EROS/popl98.300dpi.ps
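
The 'closed over' view can be sketched in a few lines of Python. The helper names are hypothetical, and Python lets you peek inside a closure, so assume a language or kernel that does not.

```python
def make_capability(obj, action):
    """Return a function that has closed over the designated object."""
    def invoke(*args):
        return action(obj, *args)
    return invoke


def make_log():
    """Build a private list and hand out two capabilities to it."""
    entries = []
    append_cap = make_capability(entries, list.append)
    snapshot_cap = make_capability(entries, tuple)
    return append_cap, snapshot_cap


append_cap, snapshot_cap = make_log()
append_cap("first entry")
print(snapshot_cap())
```

A holder of append_cap can add to this one log and do nothing else. It has no name for the list and cannot choose a different object or a different action. Handing out a bare function such as list.append, with the object left open, is what the function-oriented model amounts to.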

> 	We shouldn't try to refer to this sort of thing (active
> 	process monitoring by "parent" or other processes) as
> 	``capabilities'' since that will serve to confuse some
> 	and irritate others.

Misusing the term will have several undesirable consequences:

  1. It will lead others to misunderstand what you have done.
  2. It will lead *you* to misread the existing literature on capabilities and therefore misunderstand it.
  3. It will discredit something that works.

By giving something fundamentally impossible to secure (for reasons described in that paper) the name "capability", you will lead a lot of ignorant people to ignore the *real* capabilities, which *do* work. To say that it will "irritate" the people who work on capability systems is a bit of an understatement.

I really like some of the ideas you are considering. If they are combined with a true capability substrate I think they can be made secure.

A personal plea:

I have spent the past 7 years building a secure capability system. Your function-transfer security approach is completely different. For this reason it should not be called 'capabilities'. Any OS-oriented dictionary will make it obvious that it isn't capabilities. Surely it's better all around to choose a new term for a different idea?

I ask that you not jeopardize my work and the work of all of the other people in the capabilities field by misusing the name.

> 	As I understand it the distinction between a ``capability''
> 	and an ACE (access control entry) is that a capability is
> 	"specific and *sufficient*"  for each form of access 
> 	(read, write, execute, append, stat, etc) to each 
> 	resource (file, TCP port, "privileged" system call, socket, 
> 	memory block, etc) there is a single ``capability''.  

Mostly correct. Actually, there can be multiple capabilities for the same object. Two capabilities are the same if they designate the same object and authorize the same operations. Two capabilities can also authorize distinct operations (e.g. read-only vs. read-write capabilities to the same page of memory).

Actually, this means that holding the right collection of capabilities subsumes the function-oriented approach that CAOS is considering. The "authorized actions" are equivalent to CAOS functions. A process then needs to have access to capabilities for all of the objects it is allowed to do those operations on.

> 	*Any* process with "possession" of that ``capability'' can 
> 	gain that form of access to that resource -- there are no 
> 	other "checks" to be performed.  That is the simplicity 
> 	of them.

Correct. Also the source of their high performance. Under the covers, for example, the UNIX runtime core is in some places a capability system. A "file descriptor" is actually a capability with some gunk welded onto the side. The reason that access lists are not consulted for each read and write is that they are too inefficient.
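
You can see this on an ordinary UNIX system. The sketch below (Python, assuming a POSIX-like system and an unprivileged user) opens a file, strips every permission bit from it, and then reads through the descriptor it already holds.

```python
import os
import tempfile

# Create a file, then open it read-only. The access check happens here.
handle, path = tempfile.mkstemp()
os.write(handle, b"hello")
os.close(handle)
fd = os.open(path, os.O_RDONLY)

# Remove all permission bits. A fresh open() by an unprivileged user
# would now fail, but the descriptor we already hold is unaffected.
os.chmod(path, 0)
data = os.read(fd, 5)
print(data)

# Clean up: restore permissions so the file can be removed everywhere.
os.close(fd)
os.chmod(path, 0o600)
os.remove(path)
```

The read succeeds because the permission bits were consulted once, at open time. After that the descriptor alone carries the authority.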

> 	Here's also where you can complicate issues a bit.
> 	If you have capabilities *on* other capabilities you 
> 	can require one capability to "execute" another.  This
> 	allows you to have "revocable" capabilities.  

It does, but it's completely unnecessary.

There are two kinds of revocation to consider:

  1. Revoke ALL access to object X (your mechanism won't help).
  2. Revoke a particular capability to object X.

In KeyKOS/EROS, the first operation is called 'rescind.' It requires fairly primitive level support.

The latter is best accomplished by means of a "forwarding" object (best because it's simple and because the forwarding object has other uses while the secondary capability does not). You can then use type (1) on the indirection object to accomplish type (2). Please note that type (2) is a rare case, and its utility drops significantly in a capability system because capabilities can be transferred [which means you don't know who you are really revoking].

Note that requiring a second capability adds no marginal security. Once a party holds both capabilities they can transmit them to others as a pair. The security of capabilities lies in their unforgeability.
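
Here is a minimal Python sketch of revocation through a forwarding object. The names (make_forwarder, Revoked, Document) are hypothetical. The forwarder hands back two separate things: the capability you give away, and the authority to rescind it, which you keep.

```python
class Revoked(Exception):
    """Raised when a rescinded forwarder is invoked."""


class Document:
    """A hypothetical target object."""
    def __init__(self, text):
        self.text = text

    def read(self):
        return self.text


def make_forwarder(target):
    """Return (forwarding capability, rescind authority) for target."""
    state = {"target": target}

    def invoke(action, *args):
        if state["target"] is None:
            raise Revoked("forwarder has been rescinded")
        return getattr(state["target"], action)(*args)

    def rescind():
        # Destroying the indirection cuts off everyone who holds it,
        # including anyone the original recipient passed it on to.
        state["target"] = None

    return invoke, rescind


document = Document("my plan")
lent, rescind = make_forwarder(document)

print(lent("read"))   # works while the forwarder is live
rescind()
print(document.read())  # direct holders of the object are unaffected
```

After rescind() any call through the lent capability raises Revoked, while capabilities that designate the document directly keep working. That is revocation of type (2) built from a type (1) operation on the indirection object.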

> 	Let's make up an example:
> 
> 		I want to have something like 'finger' and 
> 		give it the ``capability'' to read or execute
> 		a file (analogous to my .plan file).

Presumably you meant "... and ONLY my .plan file".

See my description of password file access above as an example of how to accomplish this with no additional capability types.

> 		(thus I'm granted all "public" capabilities
> 		merely be logging in).  It might also be accomplished
> 		by "binding" the "append" capability to a small program
> 		(like 'chfn') and "publishing" the "execute"
> 		capability to that.

The existence of a directory of public capabilities is quite dangerous, and must be handled with care. Suppose I publish an object that is the write end of a pipe. I now give you a trojan horse program that obtains this capability from the public directory. Unknown to you, it copies your data to me.

A simpler solution to the ".plan" problem is as follows:

Imagine each capability has an "ID" field. This is simply a number that is passed to the recipient process whenever a capability to that process is invoked.

There are two capabilities to the finger directory: read and replace.

The read capability allows the operation

fetch(name) -> capability.

The replace capability allows the operation

replace(name, old capability, new capability)

All users hold a copy of the replace capability and a capability to their current .plan file. The finger daemon only holds the read capability.

If used carefully, this mechanism provides all of the access control guarantees that yours does.
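
A rough Python sketch of such a directory follows. The class and the idea of seeding it with initial entries are my assumptions; only fetch and replace come from the description above. Bound methods stand in for the two capabilities, although in real Python a bound method leaks the whole object.

```python
class PlanFile:
    """Hypothetical stand-in for a capability to a .plan file."""
    def __init__(self, text):
        self.text = text


class FingerDirectory:
    def __init__(self, initial_entries):
        # Assumption: every user starts with an entry.
        self._entries = dict(initial_entries)

    def fetch(self, name):
        """The read operation: name -> capability."""
        return self._entries.get(name)

    def replace(self, name, old_cap, new_cap):
        """Swap an entry, but only for a caller who holds the current one."""
        if self._entries.get(name) is not old_cap:
            raise PermissionError("caller does not hold the current entry")
        self._entries[name] = new_cap


alice_plan = PlanFile("on vacation")
directory = FingerDirectory({"alice": alice_plan})

read_cap = directory.fetch        # held by the finger daemon
replace_cap = directory.replace   # held by every user

new_plan = PlanFile("back at work")
replace_cap("alice", alice_plan, new_plan)
print(read_cap("alice").text)
```

Everyone can hold the replace capability safely because it is useless without the capability to the current entry, and only the entry's owner holds that.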

As an aside, however, I'd suggest that the .plan file really wants to be considered non-optional. What the user requires is the authority to modify it (which implies the ability to empty it).

> 	Any process that I 
> 	entrust with a given capability can use it, or give it
> 	to agents *other than* by intended recipient.  However it
> 	is possible to provide mechanisms to prevent that.  

No, it is not. If you give a capability C to any turing-complete collaborator process P1, the collaborator P1 can collude with a third party P2 to invoke the capability on behalf of P2.

The 'do not copy' bit, or the capability splitting mechanism you propose, simply do not add any security.
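
The collusion takes almost no code. In this Python sketch (hypothetical names throughout), P1 holds a capability it is supposedly forbidden to copy, so it hands P2 a proxy instead.

```python
def read_secret():
    """Stands in for capability C, which P1 may not copy."""
    return "contents of the protected object"


def make_proxy(cap):
    """P1 wraps C and offers to invoke it for whoever holds the proxy."""
    def proxy(*args):
        return cap(*args)
    return proxy


# P1 never transfers C itself, only a capability to its proxy.
p2_view = make_proxy(read_secret)
print(p2_view())
```

P2 now gets exactly what C would have given it, and no copy restriction on C was violated.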

> 	That is capabilities as I think I understand them.
> 	I still don't know quite how you'd achieve some
> 	forms of control (such as the 'chinese wall'
> 	or variations of the "Clark-Wilson triples").

The fabricator provides something akin to a Chinese Wall. The idea can be extended. If someone will describe "Clark-Wilson Triples" to me I will describe how to provide them in the pure model.

> 	EROS and KeyKOS both require a form of process state
> 	"persistence."  This apparently obviates the need
> 	for "devine" intervention ('root') to solve the 
> 	"chicken and egg" problem posed by "shutdown" and
> 	"rebooting."  I have no first hand experience of 
> 	either of these systems so here my image is *really
> 	fuzzy*.

Eliminates the need for *divine* intervention too. :-)

I'll be happy to answer questions about either system. I'm actually close to releasing EROS -- just getting the net stack and the web server running.

> 	After e-mail with Jonathan and conversations with
> 	Hugh I'm convinced that "persistence of process state"
> 	is required in a "pure capabilities" model.

Not really. The problem is that the reconstruction of the security model on restart requires special case handling. It is probably possible to build a secure system without per-process persistence. We haven't done it, because the prospect of doing the security analysis involved was too horrendous to contemplate.

We then realized that persistence gives better performance than conventional file systems. At that point the incentive to build the hard system pretty much evaporated. Between the analysis problem and the improved performance, persistence seemed the obvious win.

> 	If I can "shut the machine down" and "bring it 
> 	up single-user" (create a discontinuity in the
> 	state of the processes) than I can go in a 
> 	'steal' (or modify) the state and I'll be 'root'

If you have physical access to the machine, there's all sorts of shit you can do regardless of model. Effectively, you have a meta-capability or a meta-ACL.

> 	(Despite DEC's and MS' protestations to the 
> 	contrary VMS and NT have an omnipotent account.
> 	It is the "backup" -- actually the "restore" operator!)
>
> 	I don't have time to think about how a "backup/restore"
> 	subsystem would work under a pure capabilities system.

Some facility is needed for archival storage. Archival requires that capabilities and ACLs be able to be serialized and deserialized. Such software must be able to reconstruct the result, and is therefore effectively omnipotent and universally trusted. The real issue is not whether the *program* is trusted, but whether the archival media is proof against forgery.

I do not know how KeyKOS handled this problem.

The EROS backup utility will maintain a copy of each unique capability that it writes to tape, and will use a cryptographic signature of some sort to ensure that the content of the tape is trustworthy when reading it back in.

> >> These both assume that you can create a specific
> >> list of resources prior to execution of the program.
> >> It would be very important (so far as I'm concerned)
> >> to allow multiple differing sets of capabilities for
> >> a given program.

I think you mean "to allow different *instances* of a program to hold distinct sets of capabilities?" That is certainly important.

> 	Note --- failing this (having prior knowlege of
> 	the precise forms of access required of each resource)

Can you give a case in which prior knowledge of the access required is not possible?

I think we can achieve what you need in EROS without a secure attention mechanism of the type that you describe. Secure attention, in any case, isn't the right name for what you are talking about.

> >> In addition the "pure" capabilities subsystem it should
> >> be possible for users to delegate access to specific
> >> files and programs without undo risk to their other
> >> files.

Umm, that *is* the pure capabilities model.

> 	I notice that your discussion mentions features to
> 	restrict access to specific times of day...
>
> 	Since capabilities only grant access (they don't "deny it"
> 	which I guess would have to be called a ``disability'')
> 	it isn't obvious how they can be used to provide the
> 	desired level of control.

In a pure capability system, you can grant access to a forwarding object, and then change what the object forwards to. A restriction agent of the type you describe is such a forwarding object.

The forwarding object does not need to be passive. Instead of giving me access to the printer, give me access to a process that does the printer protocol, but only between 9 and 5. If the time check passes, the process actually passes my requests through to the real printer. If a capability system is properly designed there should be no way for me to tell (programmatically) that I am not speaking to the real printer.
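
A minimal Python sketch of the printer example. The class names and the injected clock are assumptions made for illustration; the clock is passed in so that the time check can be exercised without waiting for 5 o'clock.

```python
from datetime import time


class Printer:
    """Hypothetical real printer."""
    def __init__(self):
        self.jobs = []

    def print_job(self, text):
        self.jobs.append(text)
        return "queued"


class OfficeHoursPrinter:
    """Forwarding process: same interface as Printer, plus a time check."""
    def __init__(self, printer, clock, start=time(9, 0), end=time(17, 0)):
        self._printer = printer
        self._clock = clock  # callable returning the current datetime.time
        self._start = start
        self._end = end

    def print_job(self, text):
        if not (self._start <= self._clock() < self._end):
            raise PermissionError("printer unavailable")
        # The check passed: hand the request to the real printer.
        return self._printer.print_job(text)


printer = Printer()
daytime = OfficeHoursPrinter(printer, clock=lambda: time(10, 30))
print(daytime.print_job("report"))
```

The client is handed an OfficeHoursPrinter wherever it would have been handed a Printer, and it invokes print_job in exactly the same way. The restriction lives entirely in the forwarding process, not in the capability and not in the kernel.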

Enough for now. I hope that some of this is useful, and I'll be happy to answer followups.

Jonathan Shapiro