[cap-talk] Another "core" principle - virtualizing memory
Jonathan S. Shapiro
shap at eros-os.com
Tue Jan 2 10:31:15 CST 2007
Jed wrote the following privately, but I want to respond publicly:
> ... you seem to feel that an object-capability
> operating system can't be built with effective virtual memory
> support that also has fully virtualizable (wrappable) objects.
> That is, where all (and I assume there are relatively few) the
> kernel supported objects can be simulated by the extension
In case anybody else believes this, this is NOT what I have said. What I
have said is that pulling off virtualization correctly for memory
objects is much trickier than the DSM-style technique of playing with
the live mappings to ensure mutual exclusion.
So back to that:
On Tue, 2007-01-02 at 00:16 -0800, Jed Donnelley wrote:
> >You have reduced the problem to a previously unsolved problem, which is
> >the construction of "files, segments, or whatever". As I stated in my
> >earlier mail, each of these is an example of an address space. So your
> >answer amounts to "you build an address space by mapping address
> >spaces".
> No. Just because a data segment has an address space (0...n)
> doesn't mean that it can't map into a separate process address space.
[Late edit: check the OH WAIT at the bottom of the following list; we
may have another simple disconnect.]
Before going any further, let me state three claims. If you disagree
with these then we have a much more fundamental disagreement and we may
want to set the mapping discussion aside temporarily.
Claim 1: A robust kernel must operate from fixed resources,
notwithstanding the fact that it may make startup-time decisions about
how to partition the available memory into individual resource types
(what David Wagner and I have referred to elsewhere as type-specific
heaps, and what David Hopwood has described as preallocated vectors of
objects).
It follows that if a process performs a durable dynamic allocation
within the kernel, that allocation must be accounted somewhere, and it
must be subject to limits (resource quotas). The key issue is to
determine where the quota checks should be implemented.
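The fixed-resource discipline and quota check of Claim 1 can be
sketched as follows (a toy model in Python; every class and field name
here is illustrative, not EROS or KeyKOS API):

```python
# Sketch of Claim 1: memory is partitioned at startup into type-specific
# heaps, and every durable allocation is accounted against a quota.

class QuotaExceeded(Exception):
    pass

class Process:
    def __init__(self, name, quota_limit):
        self.name = name
        self.quota_limit = quota_limit
        self.quota_used = 0            # the allocation is accounted somewhere

class Mapping:                         # one hypothetical kernel object type
    pass

class TypeHeap:
    """Fixed-size heap for one kernel object type, carved out at boot;
    there is no dynamic growth after startup."""
    def __init__(self, obj_type, capacity):
        self.obj_type = obj_type
        self.free = capacity           # slots remaining in this heap

    def allocate(self, process):
        if self.free == 0:
            raise QuotaExceeded("heap for %s exhausted" % self.obj_type)
        if process.quota_used >= process.quota_limit:
            raise QuotaExceeded("%s over quota" % process.name)
        self.free -= 1
        process.quota_used += 1        # quota check lives at the allocator
        return self.obj_type()

# Startup-time partitioning decision, then a charged allocation.
mapping_heap = TypeHeap(Mapping, capacity=2)
alice = Process("alice", quota_limit=1)
m = mapping_heap.allocate(alice)       # succeeds, charged to alice
```

The interesting design point is exactly where the two checks sit: the
heap bound is a startup-time partitioning decision, while the quota
check is the per-process accounting that Claim 1 says must exist
somewhere.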
Claim 2: If the system in question uses explicit persistence, it is
feasible to place this allocation and quota checking function in the
kernel. If the system uses implicit persistence (as in KeyKOS), then
*all* dynamically allocated kernel state must be non-definitive, in the
sense that it is a cache of some definitive state which has its real
home in some object that can be written to disk.
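Claim 2's cache discipline can be illustrated with a toy model (the
names are invented for illustration): every dynamically allocated
kernel entry is reconstructible from a definitive, disk-resident home,
so a checkpoint need only write the disk objects, never the cache.

```python
# Sketch of Claim 2 (implicit persistence): kernel state is a
# non-definitive cache; the definitive copy lives in a pageable object.

class DiskObject:
    """Definitive state: has a real home on durable storage."""
    def __init__(self, data):
        self.data = dict(data)

class KernelCache:
    """Non-definitive: any entry may be dropped and later rebuilt from
    its DiskObject, so dropping the cache loses nothing."""
    def __init__(self):
        self.entries = {}

    def lookup(self, obj):
        if obj not in self.entries:    # cache miss: rebuild from disk copy
            self.entries[obj] = dict(obj.data)
        return self.entries[obj]

    def write_back(self):
        for obj, cached in self.entries.items():
            obj.data.update(cached)    # flush definitive state before checkpoint
        self.entries.clear()           # cache now holds nothing definitive

bank = DiskObject({"quota": 10})
cache = KernelCache()
cache.lookup(bank)["quota"] = 7        # kernel mutates only the cached copy
cache.write_back()                     # checkpoint sees only DiskObjects
```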
Claim 3: It is unknown how to construct an explicitly persistent pure
capability system that is either consistent or secure (in the
mathematical sense), because the state recorded on the store may not
represent a consistent cut. A non-consistent cut is one that cannot be
described as a sequence of correctness-preserving operations applied to
an initially correct capability graph. That is: the state evolution
induction is lost, and in consequence the argument that the system
remains in an authorized state is lost. Because of this, the
requirements of the information flow analysis that underlies the
system's overall security foundation are not upheld in an explicitly
persistent design.
Of these, I suspect that claims (1,2) are not controversial, but claim 3
may be. If you know of another way to preserve consistent cuts, I'm
*very* interested to learn of it! I predict that any discussion of claim
3 will devolve to either (a) a very clever, previously unknown system
bootstrap approach, or (b) an argument that preserving the induction
isn't important in practice. The first is welcome (to me). The second
would be very disturbing.
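A toy rendering of the consistency condition in Claim 3 (purely
illustrative; the graph representation and operation names are mine): a
stored snapshot is a consistent cut exactly when it can be reproduced
by replaying some prefix of the correctness-preserving operations
against the initial capability graph.

```python
# Toy model: a capability graph maps each subject to the set of
# subjects it holds capabilities to. Operations are correctness-
# preserving graph edits; a snapshot that matches no prefix of the
# operation sequence breaks the state-evolution induction.

def apply_op(graph, op):
    kind, src, dst = op
    g = {k: set(v) for k, v in graph.items()}
    if kind == "grant":
        g[src].add(dst)
    elif kind == "revoke":
        g[src].discard(dst)
    return g

def is_consistent_cut(initial, ops, snapshot):
    g = {k: set(v) for k, v in initial.items()}
    if g == snapshot:
        return True
    for op in ops:                     # replay each prefix in turn
        g = apply_op(g, op)
        if g == snapshot:
            return True
    return False

initial = {"A": set(), "B": set()}
ops = [("grant", "A", "B"), ("grant", "B", "A")]
good = {"A": {"B"}, "B": set()}        # after the first op: a consistent cut
bad  = {"A": set(), "B": {"A"}}        # second op without the first: no prefix
```

The `bad` snapshot is the failure mode of the claim: each individual
edge in it was authorized at some point, but no sequence of the
operations ever produced that state, so the induction argument cannot
be carried through it.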
There is a possibility that I have been overlooking. It is possible to
imagine a system implementation where the functionality of the KeyKOS
space bank (i.e. the disk storage allocator and quota manager) is
performed in the kernel. Such a system could satisfy claim (2) and still
manage the allocations from within the kernel. I suspect that such a
system would converge rapidly on a fully monolithic kernel.
Aside: this pretty much tosses my claims about reifying mapping
structures out the window.
Jed: Are you assuming such a system?
> >So you seem to propose that there is a relatively high-level kernel
> >operation "map", which accepts as arguments:
> > a process (implicitly: the invoking process),
> > an address relative to the process's address space, and
> > an address space to be mapped at that address (a "file, segment,
> > or whatever") whose construction happens by unspecified means.
> No. I propose that there is an operation on a Process capability,
> "map", that accepts a data object (one that has read and write
> operations, and as I suggest lock operations). The operation
> specifies where the data in the data object should be mapped
> into the Process's address space. Of course a process might
> have access to its own capability, but it also may not.
This is substantially the operation that I described. The key issues on
which we still are disagreeing:
1. What is the least atomic unit of read/write: byte or page? [This
   determines whether simulation of load/store is required for
   virtualization.]
2. How is storage for the list of such mappings allocated in such a
way that it is later persistable? [the discussion raised above]
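For concreteness, here is a toy rendering of the "map" operation as
quoted above, with both open issues marked in comments. The names and
the page-granularity choice are my assumptions for the sketch, not
Jed's design:

```python
# Sketch: a Process capability with a "map" operation taking a data
# object (one that has read and write operations).
PAGE = 4096

class DataObject:
    """An object with read and write operations, per the quoted proposal."""
    def __init__(self, nbytes):
        self.bytes = bytearray(nbytes)

    def read(self, off, n):
        return bytes(self.bytes[off:off + n])

    def write(self, off, data):
        self.bytes[off:off + len(data)] = data

class ProcessCap:
    def __init__(self):
        # Issue 2: in a persistent system, where does the storage for
        # this mapping list itself live, and who is charged for it?
        self.mappings = []

    def map(self, vaddr, data_obj, obj_off=0):
        # Issue 1: this sketch fixes the atomic unit at a page, not a byte.
        if vaddr % PAGE or obj_off % PAGE:
            raise ValueError("mappings are page-granular in this sketch")
        self.mappings.append((vaddr, data_obj, obj_off))

proc = ProcessCap()
proc.map(0, DataObject(2 * PAGE))      # map two pages at address 0
```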
> >This operation is somehow entitled to charge a process for the storage
> >that is used by the low-level data structures that record the desired
> Of course, though I view the term "somehow" as meaninglessly
> pejorative in the above.
It truly wasn't intended to be. It was intended to identify where I was
confused. Please accept my apologies.
> Can you define your "accountability requirement"? In my solution
> all storage devolves to rotating storage that is accounted for
> as is any object. Real memory use for us was tied into "CPU"
> charges and depended on real memory residency * time.
This seems reasonable, but memory must be persistable if it contains
definitive state. Address space mappings are clearly definitive.
Therefore, I consider that they need to live on rotating (or at least,
durable :-) storage. It is the allocation of this persistent storage at
mapping time that ultimately concerns me.
> >Can you explain how your DSM-style solution accounts for the behavior of
> >load and store instructions, which *must* be reifiable as capability
> >invocations if the system is to remain a pure object-capability system?
> Load and store instructions act according to the processor architecture
> on real memory - or trap. I fail to understand why you seem to
> argue that each individual load and store instruction must
> act as a capability invocation. From my perspective it's perfectly
> adequate to have the invocations on the storage (rotating storage)
> objects whose data is mapped into memory appear as capability
> invocations when rotating storage is read into or written from
> memory. This happens at a larger granularity (generally at
> page faults). This approach mirrors what's actually going on.
> What's the problem?
I think that we differ on our assumptions about the atomic unit of I/O.
From a formal system safety perspective, there are no load and store
instructions and there is no hardware architecture. There are only
capability invocations. In order to argue that a system is safe, the
operations of the hardware need to be mapped onto the operations of the
capability-based system model.
I agree (and I have agreed several times) that for the vast majority of
real virtualization applications, it is sufficiently faithful to
virtualize at page granularity and ignore the detail that many
operations at the load/store granularity are being collapsed at this
level of virtualization.
What I am arguing here is that **in the limit** (which is something we
may never implement, but need to test conceptually) we really DO want to
be able to put an entry capability into a page table slot and have the
actual load and store instructions get handled one at a time through
capability invocations.
I can even give a real use case: watchpoints. On real systems, hardware
watchpoints are very limited, and a software mechanism for watchpoints
is needed.
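A sketch of the limit case (illustrative Python, not kernel code; all
names are mine): a page-table slot holds an entry capability rather
than a physical frame, so each load or store to that page traps and is
reified as one capability invocation, which is precisely the hook a
software watchpoint needs.

```python
# Sketch: an "entry capability" standing in a page-table slot. Every
# load/store to the page becomes one invocation on the handler object,
# so a debugger can log each access: a software watchpoint.
PAGE = 4096

class WatchedPage:
    """Handler object standing in for a physical frame."""
    def __init__(self):
        self.backing = bytearray(PAGE)
        self.log = []                  # watchpoint trace, one entry per access

    def invoke(self, op, offset, value=None):
        self.log.append((op, offset))  # one invocation per instruction
        if op == "load":
            return self.backing[offset]
        self.backing[offset] = value   # op == "store"

class AddressSpace:
    def __init__(self):
        self.slots = {}                # page number -> entry capability

    def load(self, addr):
        return self.slots[addr // PAGE].invoke("load", addr % PAGE)

    def store(self, addr, value):
        self.slots[addr // PAGE].invoke("store", addr % PAGE, value)

aspace = AddressSpace()
page = WatchedPage()
aspace.slots[0] = page                 # entry capability in the slot
aspace.store(16, 0x2A)                 # traps; reified as an invocation
```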
I have stated that *implementing* virtualization at the load/store
granularity is irrationally hard on a few badly conceived hardware
architectures. It is not impossible. This is a flaw in the respective
architectures, not in the notions of object-capability systems.
I'll note (with a broad smile) that you're in pretty deep kim-chee
arguing on this one, because we actually *implemented* this level of
virtualization in IRIX (me) and Solaris (Roger Faulkner) in later
refinements of /proc. We actually saw cases where people would
remote-mount the /proc file system and debug from a second machine. We
turned a little green when we learned that, but it was a surprisingly
useful thing to do. If nothing else, it's a pretty good anecdotal
confirmation that we successfully virtualized the memory interface.
Jonathan S. Shapiro, Ph.D.
The EROS Group, LLC
+1 443 927 1719 x5100