Comparing ACLs and Capabilities

 

Two topics come up repeatedly on the EROS mailing lists:

  1. How are capabilities and access control lists different?

  2. Is one better than the other? If so, why?

They come up so frequently because people use these questions as a way to sharpen their understanding of the issues.

One thing that I learned in my own attempts to answer these questions is that the arguments are actually quite complex, and that the first mistake is to oversimplify them when you are trying to get a handle on them. You can simplify the explanation later; first be sure what it is you are trying to explain.

Since it's very easy to miss important details, this note tries to give my current answer to the first question. It describes how capabilities and access control lists actually work in practice, and therefore how they differ. It may tell you more then you feel you wish to know, but it is as accurate as I can make it without appealing to mathematics.

I clearly have an opinion on the second question, or I wouldn't have built EROS. This note tries not to be partisan, because it is better to understand the basis of the discussion before debating the merits of the outcome.

1. Access Control Lists

An ACL system has at least five namespaces whose relationships need to be considered:

  1. The namespace of file names: /tmp/foo

  2. The namespace of unique object identifiers: (dev 22, inode 36, type file)

  3. The namespace of user identities (uid 52476)

  4. For each object type (file, disk, terminal, ...), the namespace of operations that object can perform.

  5. The namespace of process identifiers (process 719)

In an access list system, it is assumed that there are two global mappings:

    principal: process identity -> user identity
    fs_lookup: file name -> object identity

That is, every process has an assigned user identity and every file name can be translated into a unique object identifier. Hanging off of every unique object is a further mapping:

    acl: (object identity, user identity) -> operation(s)

Given a process proc that wishes to perform an operation op on an object object, the protection mechanism in an access list system is to test the following predicate:

    op in acl(object, principal(p))

In the special case of the "open" call, this test is modified to be:

    op in acl(fs_lookup(filename), principal(p))

2. Capability Systems

A capability system has at least four namespaces whose relationships need to be considered:

  1. The namespace of unique object identifiers: (dev 22, inode 36, type file)

  2. For each object type (file, disk, terminal, ...), the namespace of operations that object can perform.

  3. The namespace of process identifiers (process 719)

  4. The namespace of capabilities (object 10, operation set S)

In a capability system, it is assumed that there is one local mapping for each process

    cap: (process identity, index) -> capability

That is, every process has a list of capabilities. Each capability names an object and also names a set of legal operations on that object.

There are also two "accessor" functions:

    obj: capability -> object identity
    ops: capability -> operations

Given a process proc that wishes to perform an operation op on an object object, the process must first possess a capability naming that object. That is, it must possess a capability at some index i such that

    object == obj(caps(p,i))

To perform an operation, the process names the "index" i of that capability to be invoked from the per-process list. The protection mechanism in a capability system is to test the following predicate:

    op in ops(caps(p,i))

Capability systems typically do not have a distinguished "open" call.

3. Some Differences

This section is incomplete.

Simply comparing the predicates shows that there is a significant difference between the two systems:

    ACL: op in acl(object, principal(p))
    Capability: op in ops(caps(p,i))

An obvious difference is that the capability model makes no reference to any notion of "principal".

Another obvious difference is that the capability model has a parameter "i". This allows the process to specify which authority it wants to exercise, which is why only the capability model can solve the confused deputy problem.

Access Rights and Persistence

What happens when the computer shuts down and all of the processes disappear?

In an access control list system, this is not a problem, because the login sessions disappear too. The user identity for a process is derived from who starts it, which is in turn derived from the login session. There is no need to record permissions on a per-process basis.

In a capability system, there is a definite problem. Solutions vary. Some systems provide a means to "pickle" a process or associate an initial capability list with each login. EROS makes all processes persistent.

Least Privilege

Capability systems allow a finer grain of protection. Each process has an exactly specified set of access rights. In contrast, access control list systems are coarser: every process executed by "fred" has the same rights. If you could always trust your programs, the coarser protection is fine. In an era where computer viruses are front page news, it is clearly necessary to be able to run some programs with finer restrictions than others.

Revocation

In an access control list, you can remove a user from the list, and that user can no longer gain access to the object. In a capability system, there is no equivalent operation. This is (arguably) a problem. Users come and go on projects, and you'ld like to be able to remove them when they should no longer have access to the project data. There are mechanisms to manage this in capability systems, but they are cumbersome.

Rights Transfer

In general, an access control list does not (in theory) allow rights transfer. If "fred" obtains access to an object, he cannot give this access to "mary" by transferring the object descriptor after the object has been opened. I say "in theory" because fred can still proxy for mary.

In a capability system, capabilities can be transferred. If a process running on behalf of fred can speak to a process running on behalf of mary, then capabilities may be transferred between the two processes. This can be useful: it allows you to hand a capability to a particular document to an editor. It can also be dangerous: if the program you are running is malicious, it can leak your authority.

4. The Equivalence Fallacy

There is an old claim that started appearing very early in papers on protection. The claim is:

    Capabilities and access control lists are equivalent. They are simply two different ways to look at a Lampson access matrix. Any protection state that can be captured with one can be captured with the other.

People who have heard of capabilities almost universally believe this claim. Unfortunately, the claim is untrue. Worse, it obscures understanding of protection.

By way of debunking it, let me first explain what is meant by this statement. Then let's look at why it is incorrect.

The Lampson Access Matrix

The Lampson access matrix provides a way to look at the protection state of a system. It describes the access rights of the system at some instant in time. Each subject (a subject is always a process) in the system has a row in the table and each object (which can be either a process or an object) has a column. The entries in the table describe what access rights subject S has on object O:

O1 O2 O3
S1 r r,w x
S2 r r,w

The idea behind the claim is that if you look at a row of the access matrix, you are looking at a capability, and if you look at a column of the access matrix, you are looking at an access control list entry:

O1 O2 O3 O1 O2 O3
S1 r r,w x S1 r r,w x
S2 r r,w S2 r r,w

Unfortunately, this is wrong.

A Problem of Terminology

In the early security literature there was some sloppy use of the term "subject." In some papers the term "subject" was used to mean "process" while in others it was used to mean "principal" (i.e. a user). If we take subject to mean "principal", then the red column is not a capability; capabilities do not have anything to do with principals. If we take subject to mean "process" then the cyan row is not an access control list. ACLs do not refer to processes.

I have pointed this out to theorists who work on formal verification, and I have seen some good ones wave their hands and say "That's not a problem -- just expand all the processes, discard the notion of user, and it all works just fine, and the two models both fit in the matrix."

This is true in some mathematical sense. The problem is that after you do this you haven't got access control lists any more. Access control lists are specified in terms of users, not processes. One can argue (and I do) that specifying things in terms of processes is the right thing to do, but once you do this expansion you have lost a level of indirection that was crucial in understanding how access control lists work. There are useful properties you can prove about a system with all processes expanded that are not true if the user identity indirection is retained. It all depends on what operations are permitted, which brings me to the second, more serious problem:

A Problem of Evolution

While the terminology problem is fatal, there is a more subtle and more damaging error in the claim: it is a static view of a dynamic system.

If you freeze a computer system in mid-execution, you can draw an access matrix that describes it's current protection state, and you can ask if that state is okay. This is a very useful thing to be able to do. In practice, however, we aren't so much interested in what the instantaneous state of a system is as in how that state can evolve.

At a very high level of abstraction, proofs about security mechanisms all work the same way:

  1. First you define what a "safe" state is. That is, you specify what it means for the policy to be enforced.

  2. Second you establish an initial, "safe" condition. This is done by setting up the initial access matrix that you want. Since you are in control of how the access matrix is initialized, you should be able to establish just about any condition you would like.

    The operative word is "should". In some protection systems, however, there are constraints on what can legally be placed in the matrix. A classical access control list system, for example, requires that every process owned by the same subject must have the same permissions. This means that if P1 and P2 are both owned by S1, their rows must be identical.

    Unfortunately, it turns out that this is a fairly damaging restriction. It can make setting up the desired initial conditions extremely difficult (sometimes impossible), and it can make the verification of security policies mathematically undecidable -- which means you can't prove the security policy.

  3. Third, you specify what the rules are for how the system operates. What are the steps that the machine will perform? How does each step affect the access matrix (if at all)?

  4. Finally, you prove that if you start from the specified initial condition, and you take a sequence of steps, you always end up in a "safe" state. Actually, you have to show that this is true for all possible sequences of steps.


Copyright 2000 by Jonathan Shapiro. All rights reserved. For terms of redistribution, see the GNU General Public License