Capability Pages

 

This note describes the initial design for EROS capability pages.

Capability pages will not be implemented, for the following reasons:

  • They expose the size of the kernel implementation of a capability. We prefer not to expose this, so that the implementation can be changed without affecting code outside the kernel.
  • The simple implementation of capability pages maps them into the process's address space but permits access only in kernel mode. The simple implementation of access checking for strings passed in key invocations checks only that the string is mapped and accessible. Because that check runs in the kernel, a kernel-only capability page would pass it, so it would be possible to send or receive a string located in a capability page, which must not be allowed.

1. Basic Idea

Certain EROS processes want to manage a large number of capabilities. The base EROS system provides no mechanism to do so efficiently. While we could adopt the KeyKOS approach and implement a supernode, this is fairly expensive on machines with a slow privilege crossing mechanism.

    It should be noted that the x86 privilege crossing delay is a disaster, and the KeyKOS supernode approach is entirely appropriate for machines with smaller delays of this kind (e.g. most modern RISC machines). The mechanism proposed here would still be faster on such machines, but only by a factor of 4 or so rather than a factor of 400.

In addition to the performance issue, it is convenient to be able to share a pool of such capabilities in the same way that one shares memory.

Finally, it would in some instances be very convenient to have a capability stack that is parallel to the data stack.

In an ideal world, capabilities would be a hardware-protected first class data type (requiring a side bit in the memory). The proposed mechanism comes as close to realizing this as can be practically accomplished on current generation hardware.

2. Capability Pages

Capability pages provide a page-sized equivalent to a node. A capability in 32-bit EROS systems occupies 16 bytes. The number of such capabilities in a page is architecture dependent, but this design assumes that capabilities will be densely packed within a page.
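As a rough worked example, assuming a 4096-byte page (the x86 page size; other architectures differ) and the 16-byte capability size stated above, a densely packed capability page holds 4096 / 16 = 256 capabilities. The sketch below computes slot offsets; the macro and function names are hypothetical, and the requirement that capability addresses be 16-byte aligned is an assumption of this sketch, not something the design specifies.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages, as on x86. */
#define PAGE_SIZE 4096u
/* From the text: a capability occupies 16 bytes on 32-bit EROS. */
#define CAP_SIZE 16u
/* Dense packing: no header or padding inside the page. */
#define CAPS_PER_PAGE (PAGE_SIZE / CAP_SIZE)   /* 4096 / 16 = 256 */

/* Byte offset of capability slot 'slot' within a capability page. */
static size_t cap_slot_offset(unsigned slot)
{
    assert(slot < CAPS_PER_PAGE);
    return (size_t)slot * CAP_SIZE;
}

/* Slot index named by a byte offset.
 * Assumption: offsets must be 16-byte aligned. */
static unsigned cap_slot_index(size_t offset)
{
    assert(offset < PAGE_SIZE && offset % CAP_SIZE == 0);
    return (unsigned)(offset / CAP_SIZE);
}
```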

Capability pages are addressed in the same way as physical pages: by inserting them into segments.

Because capability pages can be shared via segments, the KeyKOS ``sensory'' notion must be replaced in EROS by a combination of ``weak'' and ``read-only'' attributes. A ``weak'' capability is one that applies the sense transformation to all capabilities fetched through it. A read-only capability prohibits writing. The combination of these two constraints is functionally equivalent to the KeyKOS sensory capability.

Both attributes are interpreted hierarchically in a memory context, which has the desired effect on controlling exposure across an entire memory segment.
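The hierarchical interpretation can be sketched as an accumulation along the mapping path: once any step between the root of the address space and the page is weak or read-only, the page is reached weakly or read-only. This is a minimal sketch under stated assumptions: `MapAttrs`, `Cap`, and the helper functions are hypothetical, and the real sense transformation depends on the kind of capability fetched; here it is reduced to marking the fetched capability weak and read-only so that the restriction propagates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: the two attributes carried by each mapping step. */
typedef struct {
    bool weak;       /* apply the sense transformation to fetched caps */
    bool read_only;  /* prohibit writes */
} MapAttrs;

/* Attributes accumulate from the root of the address space down to
 * the page: a restriction at any step applies to everything below it. */
static MapAttrs effective_attrs(const MapAttrs *path, size_t depth)
{
    MapAttrs eff = { false, false };
    for (size_t i = 0; i < depth; i++) {
        eff.weak      = eff.weak || path[i].weak;
        eff.read_only = eff.read_only || path[i].read_only;
    }
    return eff;
}

/* Hypothetical capability with only the bits this sketch needs. */
typedef struct {
    bool weak;
    bool read_only;
} Cap;

/* Simplified sense transformation (assumption): a capability fetched
 * along a weak path is itself made weak and read-only. */
static Cap fetch_through(MapAttrs eff, Cap c)
{
    if (eff.weak) {
        c.weak = true;
        c.read_only = true;
    }
    return c;
}

/* Weak plus read-only together play the role of a KeyKOS sensory key. */
static bool is_sensory(MapAttrs eff)
{
    return eff.weak && eff.read_only;
}
```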

2.1 Page Types and Access Types

In the proposed design, a page frame holds either capabilities or data, but never both. Data references to a capability page cause suitable access violations. Capability references to a data page similarly cause access violations.

The question of what to do about such faults remains a policy decision to be made by a keeper. This requires that the kernel now report to the keeper some encoding of the following information:

  • Whether a page exists at the referenced address.
  • What type of page exists at the referenced address (capability or data).
  • What type of access was performed to that address (capability or data).

To handle this, we revise the existing segment fault codes to the following:

FC_CapInvalidAddr

A capability reference was made to a location where no valid page (of any kind) exists.

FC_DataInvalidAddr

A data reference was made to a location where no valid page (of any kind) exists.

FC_CapTypeError

A capability operation was performed to a data page.

FC_DataTypeError

A data operation was performed to a capability page.

FC_CapAccess

A capability write was performed to a capability page along a mapping that is read-only.

FC_DataAccess

A data write was performed to a data page along a mapping that is read-only.

For reporting purposes, type errors have higher precedence than access violations. A data write to a read-only capability page therefore generates FC_DataTypeError rather than FC_DataAccess.
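The three pieces of information the keeper needs (whether a page exists, its type, and the type of access), together with the precedence rule, can be sketched as a single classification function. The `FC_` names come from the list above; the `PageKind` and `AccessKind` enums and `FC_NoFault` are hypothetical additions for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { PAGE_NONE, PAGE_DATA, PAGE_CAP } PageKind;   /* what is mapped */
typedef enum { ACC_DATA, ACC_CAP } AccessKind;              /* how it was used */

typedef enum {
    FC_NoFault,                          /* hypothetical: access succeeds */
    FC_CapInvalidAddr, FC_DataInvalidAddr,
    FC_CapTypeError,   FC_DataTypeError,
    FC_CapAccess,      FC_DataAccess
} FaultCode;

static FaultCode classify_fault(PageKind page, AccessKind acc,
                                bool is_write, bool mapping_read_only)
{
    /* 1. No page of any kind at the address. */
    if (page == PAGE_NONE)
        return acc == ACC_CAP ? FC_CapInvalidAddr : FC_DataInvalidAddr;

    /* 2. Type errors take precedence over access violations. */
    if (acc == ACC_CAP && page == PAGE_DATA)
        return FC_CapTypeError;
    if (acc == ACC_DATA && page == PAGE_CAP)
        return FC_DataTypeError;

    /* 3. Page type matches the access, but a write went along a
     *    read-only mapping. */
    if (is_write && mapping_read_only)
        return acc == ACC_CAP ? FC_CapAccess : FC_DataAccess;

    return FC_NoFault;
}
```

Checking the type before the write permission is what makes a data write to a read-only capability page report `FC_DataTypeError`.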

2.2 Implementation

Capability pages require that we extend the existing ``type tag'' for disk pages from {page, log frame, node frame, alloc pot} to {data page, capability page, log frame, node frame, alloc pot}, and keep a type tag on pages in memory so that we can keep track of what they hold. The necessary tag fields are already present; tracking the tag has no marginal data overhead.

Capability manipulations are performed by four kernel-emulated instructions:

copycap %cr1,%cr2

Copies a capability from capability register %cr2 to capability register %cr1. Has no effect if %cr1 is capability register zero.

Because this is an indivisible instruction generating no exceptions, it can be special-case coded in the kernel.

xchgcap %cr1,%cr2

Exchanges the values of capability registers %cr1 and %cr2. If either register is capability register zero, that register's value remains unchanged.

Because this is an indivisible instruction generating no exceptions, it can be special-case coded in the kernel.
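Since copycap and xchgcap touch only the capability register file, their emulation involves no address translation and cannot fault, which is why they are cheap to special-case. The sketch below assumes a hypothetical 16-entry register file and an opaque 16-byte `Capability`. For xchgcap it reads the text literally: register zero is never overwritten, but the other register still receives register zero's value.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CAP_REGS 16   /* hypothetical register-file size */

/* Hypothetical 16-byte capability; contents are opaque here. */
typedef struct { uint32_t w[4]; } Capability;

typedef struct { Capability cr[NUM_CAP_REGS]; } CapRegs;

/* copycap %dst,%src: no effect when dst is register zero. */
static void emul_copycap(CapRegs *r, unsigned dst, unsigned src)
{
    assert(dst < NUM_CAP_REGS && src < NUM_CAP_REGS);
    if (dst != 0)
        r->cr[dst] = r->cr[src];
}

/* xchgcap %a,%b: register zero keeps its value; the other register
 * is still loaded from it (literal reading of the text). */
static void emul_xchgcap(CapRegs *r, unsigned a, unsigned b)
{
    assert(a < NUM_CAP_REGS && b < NUM_CAP_REGS);
    Capability old_a = r->cr[a];
    if (a != 0)
        r->cr[a] = r->cr[b];
    if (b != 0)
        r->cr[b] = old_a;
}
```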

cload %cr1, address

Loads the capability at address into capability register %cr1.

Exceptions:

FC_CapInvalidAddr

address has no mapping in the domain's address space.

FC_CapTypeError

address names a data page in the domain's address space.

cstore %cr1, address

Stores the capability in capability register %cr1 at address.

Exceptions:

FC_CapInvalidAddr

address has no mapping in the domain's address space.

FC_CapTypeError

address names a data page in the domain's address space.

FC_CapAccess

address names a read-only capability page in the domain's address space.

It may prove useful in the future to introduce a related pseudo instruction cprobe address, which initiates a page fault but does not block the calling process.

It may also prove useful in the future to extend the current IPC invocation instructions with an instruction that permits the invoked capability to reside in memory.

3. Other Proposals

Mark Miller has proposed an alternative to the above instructions based on a memory-memory architecture rather than a register-memory architecture.

In this proposal, a data load from a capability page returns the referenced address. A data store of a value to a capability page interprets the value as a source capability address, and copies the capability at that address to the referenced address.
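To make the memory-memory semantics concrete, the sketch below models a single capability page at a hypothetical base address. Under this proposal, a capability assignment `x = y` compiles to an ordinary data load of `y` (which yields `y`'s address) followed by an ordinary data store of that value to `x` (which copies the capability). The names are hypothetical, the model restricts both source and target to one page for brevity, and the `assert` calls stand in for faults, which is exactly where the fault-reporting concern arises.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u              /* assumption: 4 KiB pages */
#define NUM_SLOTS (PAGE_SIZE / 16u)

typedef struct { uint32_t w[4]; } Capability;

/* Toy model: one capability page mapped at cap_base. */
typedef struct {
    uintptr_t cap_base;
    Capability slots[NUM_SLOTS];
} CapPage;

static int in_page(const CapPage *p, uintptr_t a)
{
    return a >= p->cap_base && a < p->cap_base + PAGE_SIZE;
}

/* Data load from a capability page: yields the referenced address. */
static uintptr_t miller_load(const CapPage *p, uintptr_t addr)
{
    assert(in_page(p, addr));
    return addr;
}

/* Data store of 'value' to a capability page: 'value' is taken as the
 * address of a source capability, which is copied to 'addr'. */
static void miller_store(CapPage *p, uintptr_t addr, uintptr_t value)
{
    assert(in_page(p, addr) && in_page(p, value));
    p->slots[(addr - p->cap_base) / 16u] = p->slots[(value - p->cap_base) / 16u];
}
```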

This proposal has the advantage that it allows capability references to be treated as first-class data structures by the compiler with approximately the right semantics.

I have decided not to pursue this approach for now for two reasons:

  • I am not convinced that the memory-memory architecture is what we want.

  • Without careful thought, I am reluctant to do this by overloading the data instructions. I am mildly concerned about the loss of fault reporting.

I'm recording the proposal here because it is interesting, and because it might significantly simplify the implementation of the capability cache logic.

4. Issues in Implementation

The following issues have arisen in the course of implementing this change.

There is a difficulty with allowing prepared keys in capability pages. The nature of the problem is that the page may be involved in I/O, and the I/O might therefore cause the prepared form of the key to be written to disk. There are two ways to solve this:

  1. Do not allow prepared keys within capability pages

  2. Deprepare all keys in a capability page before performing I/O or page eviction. Before preparing any key in a capability page, check for the I/O case (in which case the page is guaranteed to be dirty) and treat the page as requiring copy on write if a key must be prepared within it.
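The second option can be sketched as two hooks: one run before I/O or eviction that returns every slot to its unprepared form, and one run when a key must be prepared that copies the page first if I/O is in progress. In this sketch `prepared` stands in for the in-memory form that must never reach disk, the malloc'd copy stands in for the kernel's copy-on-write frame, and all names are hypothetical. The caller is assumed to switch its mapping to the returned page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define NUM_SLOTS 256u   /* 4 KiB / 16-byte capabilities (assumption) */

/* Hypothetical capability: 'prepared' marks the in-memory form. */
typedef struct {
    bool prepared;
    unsigned long oid;   /* object identifier in the unprepared form */
} Capability;

typedef struct CapPage {
    Capability slots[NUM_SLOTS];
    bool io_in_progress; /* page is being written out */
    bool dirty;
} CapPage;

/* Return every slot to its on-disk form. */
static void deprepare_all(CapPage *p)
{
    for (unsigned i = 0; i < NUM_SLOTS; i++)
        p->slots[i].prepared = false;
}

/* Before I/O or eviction: nothing prepared may be written to disk. */
static void begin_writeout(CapPage *p)
{
    deprepare_all(p);
    p->io_in_progress = true;
}

/* Prepare slot i. A page under I/O is dirty, so it is copied first and
 * the key is prepared in the copy. Returns the page now to be used. */
static CapPage *prepare_slot(CapPage *p, unsigned i)
{
    assert(i < NUM_SLOTS);
    if (p->io_in_progress) {
        assert(p->dirty);
        CapPage *copy = malloc(sizeof *copy);
        if (copy == NULL)
            return NULL;
        memcpy(copy, p, sizeof *copy);
        copy->io_in_progress = false;
        p = copy;
    }
    p->slots[i].prepared = true;
    return p;
}
```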

The latter solution becomes utterly necessary if we allow the invoked key to be specified by a memory address.

An ancillary performance issue also arises: unlike the keys in nodes, the keys in capability pages must be forcibly deprepared at every checkpoint, because there is no capability page equivalent of node pots.


Copyright 1998, 2001 by Jonathan Shapiro. All rights reserved. For terms of redistribution, see the GNU General Public License