Re: Architecture of Backing Store Descriptors
Bryan Ford (baford@schirf.cs.utah.edu)
Tue, 22 Nov 94 17:07:51 MST

>I am trying to write up an architectural description of the new
>system. In it, I have arrived at a difficult design problem and would
>welcome suggestions.

First, since I haven't been on the eros mailing list very long and haven't seen what's been discussed so far, forgive me if I bring up any old beaten-to-death topics. (Jonathan, is there an archive I could get?)

I'd like to challenge a few rules and assumptions I believe are in KeyKOS that may not be completely necessary, and could possibly be relaxed somewhat to produce a more flexible system that still has all the desirable properties of KeyKOS.

First, from a recent message of yours on the mach4 list:

>As a general design principle, the kernel should not depend on a
>non-kernel task for kernel correctness.

This is generally true, but not always. The _usual_ reason for locating part of the system outside the kernel is that it isn't trusted, but that's not necessarily the only reason. Other reasons include design simplicity and debugging convenience. For example, in typical Unix systems, if the init process is buggy and dies, your whole system is hosed, so as far as systemwide correctness and trust are concerned the init process might as well be in the kernel. Why isn't it? Because it's simpler to make it an ordinary process, and since it isn't performance-critical there's no reason not to.

Maybe a minor terminology change is all that's necessary to clear this up: maybe we should be talking about "globally trusted" versus "untrusted" components - a "globally trusted" component can be placed in the kernel, but doesn't have to be, and either way may have direct access to kernel data; an untrusted component can never be placed in the kernel or have direct access to kernel data. Whether or not the supervisor mode bit is on at a particular time (kernel versus user mode) is at least partly orthogonal to the global trust issue.

>[...]
>In NewSys, the memory object description will be stored in an in-core
>"object frame" (somewhat analogous to a KeyKOS node -- only object
>frames can contain capabilities, and they cannot be mapped into user
>space).

Another KeyKOS rule, I believe, is that kernel data (stored in "nodes") is fundamentally different from user data (stored in "pages"). Kernel memory can never become user memory or vice versa; kernel memory is replicated in several places on stable storage whereas user memory isn't (or isn't always?); etc. Is this rule really necessary? It seems to me that the "node" abstraction really just represents storage used by a trusted entity but allocated by an untrusted entity; or, in other words, storage depended upon by a trusted server but used on behalf of an untrusted client. This is definitely a necessary abstraction in a system like KeyKOS, but why should it be limited to kernel data, when many servers probably need a similar abstraction?

For example, I'd like to be able to write an untrusted user-level server that doesn't trust its clients, but that holds its private data in storage "provided" in some way by those clients. Of course, the server can't just accept page keys directly from its clients, because it doesn't know the security properties of those pages. In KeyKOS, presumably, to implement this you'd have some third-party entity that the server trusts, which provides pseudo-space banks containing pseudo-pages that clients can allocate, suballocate, pass around among themselves, and eventually pass to the server; but the "real" pages backing the pseudo-pages are only ever made available to the server and not to the clients. It seems like the situation is exactly the same for the user/kernel data issue; and if that's the case, why not implement the abstraction just once, rather than twice - once for kernel data managed by user code and once for server data managed by client code?
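
To make the third-party arrangement above concrete, here is a minimal sketch in Python. It is not KeyKOS code: the class name PseudoBank, its methods, and the use of plain integers as tokens are all assumptions made for illustration (a real capability system would use unforgeable keys rather than guessable integers). The point is only that clients allocate and pass around tokens, while the real backing pages are visible to the server alone.

```python
# Hypothetical sketch; none of these names come from KeyKOS.
import itertools


class PseudoBank:
    """Trusted third party: hands out opaque tokens, keeps real pages private."""

    def __init__(self, server_id, page_size=4096):
        self._server_id = server_id      # only this server may redeem tokens
        self._page_size = page_size
        self._pages = {}                 # token -> real backing page
        self._ids = itertools.count(1)

    def allocate(self):
        """Called by a client: returns a token, never the page itself."""
        token = next(self._ids)
        self._pages[token] = bytearray(self._page_size)
        return token

    def redeem(self, caller_id, token):
        """Called by the server: exchanges a token for the real page."""
        if caller_id != self._server_id:
            raise PermissionError("only the trusted server may see real pages")
        return self._pages[token]


bank = PseudoBank(server_id="db-server")
token = bank.allocate()                  # the client pays for the storage
page = bank.redeem("db-server", token)   # the server obtains the real page
page[0] = 0x2A                           # server-private data
```

The same shape would serve for nodes if the "server" is the kernel and the "clients" are user-level processes, which is the symmetry being argued for.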

I think my point holds true for the other "special" properties of kernel data (nodes) in KeyKOS as well: that user-level code often wants the same guarantees as the globally trusted kernel code, and the support could be implemented in the same way. For example, regarding replication of data, it seems likely that some subset of the user-level programs in a particular system will often contain data that is just as mission-critical as the kernel's, and thus should be replicated in the same way; other parts of the system might be somewhere in between. For example, if kernel data is replicated 3x, you might want 3x or 2x replication for some user-level objects, and none for others. Does KeyKOS support this? (It seems like it must, at least in some form, given its system requirements.) Why not implement this variable replication support symmetrically for both kernel and user data?
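
As an illustration of what symmetric, variable replication could look like, here is a small Python sketch in which every object, kernel or user, simply names how many copies it wants. The ReplicatedStore class, the object names, and the placement rule are hypothetical; nothing here reflects how KeyKOS actually lays out stable storage.

```python
# Hypothetical sketch: each object chooses its own replication factor.
class ReplicatedStore:
    def __init__(self, num_disks):
        self.disks = [dict() for _ in range(num_disks)]

    def write(self, name, data, copies=1):
        if not 1 <= copies <= len(self.disks):
            raise ValueError("copies must be between 1 and the number of disks")
        # Place the copies on distinct disks, starting at a name-dependent offset.
        start = sum(name.encode()) % len(self.disks)
        for i in range(copies):
            self.disks[(start + i) % len(self.disks)][name] = data

    def copies_of(self, name):
        """Count how many disks currently hold a copy of the object."""
        return sum(1 for disk in self.disks if name in disk)


store = ReplicatedStore(num_disks=3)
store.write("kernel-node-17", b"capabilities", copies=3)  # kernel data, 3x
store.write("user-ledger", b"balances", copies=2)         # critical user data, 2x
store.write("scratch", b"temp", copies=1)                 # unreplicated
```

The store does not care whether the caller is the kernel or a user program; the replication factor is just a property of the object.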

Basically, it seems that getting rid of the fundamental distinction between kernel and user data could produce a much more flexible system, and possibly a simpler one as well because of fewer redundant implementations of the same abstractions.

A related assumption KeyKOS seems to make is that clients and servers are always independent from the kernel's point of view: they communicate by sending messages, and never exchange any resources unless they explicitly arrange to do so. The KeyKOS user-level process model (and, in general, the traditional model) is that servers directly "own" all the resources they use, even the resources they use on behalf of their clients. For example, if a client sends an SQL query message to a database server, it's the database server's responsibility to allocate CPU time to service the request, to allocate memory to hold the request and the stack for the server to run on while processing it, etc. This traditional model fits well with the pure message-passing model, but isn't adequate for good resource management in the presence of complex client-server relationships. The alternate, less conventional model is that of clients "supplying" the necessary resources for the servers to do their jobs: the server runs on the client's CPU time, charges the client for server memory allocated on its behalf, etc. Of course, combinations of these two models are common, for example if the server runs on the client's CPU time while servicing requests but still maintains its own private memory pool.
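
The difference between the two models can be sketched in a few lines of Python. The Budget and Server classes below are hypothetical and track only memory pages; CPU time could be accounted for in the same way. A request handled without a client budget is charged to the server (the traditional model), and one handled with a client budget is charged to the client (the alternate model).

```python
# Hypothetical sketch of server-owned versus client-supplied resources.
class Budget:
    def __init__(self, owner, pages):
        self.owner = owner
        self.pages = pages

    def charge(self, pages):
        if pages > self.pages:
            raise MemoryError(f"{self.owner} is out of pages")
        self.pages -= pages


class Server:
    def __init__(self, own_pages):
        self.own = Budget("server", own_pages)

    def handle(self, request_pages, client_budget=None):
        # Traditional model: the server pays from its own pool.
        # Alternate model: the client supplies the budget to be charged.
        payer = client_budget if client_budget is not None else self.own
        payer.charge(request_pages)
        return payer.owner


server = Server(own_pages=10)
alice = Budget("alice", pages=4)
server.handle(3)                        # charged to the server
server.handle(3, client_budget=alice)   # charged to alice
```

In the second model a greedy client can exhaust only its own budget; the server's pool, and therefore its other clients, are unaffected.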

The interesting thing about KeyKOS in this regard is that it is schizophrenic. It supports the first model for relationships among user-level programs, but the KeyKOS kernel itself, considered as a "server" whose clients are all the user-level processes in the system, is designed very strongly around the second model. The "no memory allocation in the kernel" principle is really a war cry for the second model - the kernel doesn't "own" any memory, but only uses the memory provided by its clients in the form of "nodes". So while KeyKOS preaches the traditional model, it practices the newer one in its own functioning. I think it would be much better if it could symmetrically support _both_ models, as appropriate, for both user- and kernel-level code.

So, with these things in mind, back to memory objects and paging:

>One would like to provide a mechanism whereby the memory object
>manager can provide an optional storage map to the kernel describing
>where the disk storage for the object can be found. This storage map
>would allow the kernel to service memory object I/O directly, and to
>provide ageing and pageout services for memory object data.

First, I'm not sure I understand the value of a "storage map" - is it just a performance hack to avoid having to go out of the kernel to an external pager on pagein and pageout? If so, I can see the potential win, but I'm not sure how big it'll be in the presence of well-optimized RPC, and in any case you could just load particular well-known, trusted pagers into the kernel for performance reasons. (In fact, one of these well-known, trusted pagers might be one that accepts a "storage map" from untrusted code and uses it to access the I/O devices directly - achieving the same goal without dirtying the external pager interface.)

In any case, assuming this storage map feature is desirable, there are two obvious ways it could be implemented: as a set of cached data that the kernel (or "storage map pager") can throw out at any time and re-request later; or as generic kernel data ("object frames" or "nodes"), treated just like all other kernel data in the resources-provided-by-clients model. In the latter case, a memory object manager that wants to use the storage map feature must supply the necessary kernel memory for it to do so. Even if NewSys doesn't implement persistence or paging of kernel data, if it is supposed to be a highly secure system, it still must provide a means for regulating allocation of kernel memory on behalf of clients, and presumably this would be done in much the same way as it's done in KeyKOS. Or am I missing something?
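
The two options can be contrasted with a rough Python sketch. Both classes and the ask_manager callback are assumptions for illustration only; the original proposal does not specify an interface. The first treats map entries as a cache that may be evicted and re-requested from the memory object manager; the second keeps entries in frames the manager has supplied and refuses to grow beyond them.

```python
# Hypothetical sketch of the two storage map designs.
class CachedStorageMap:
    """Option 1: entries are a cache; they may be dropped and re-requested."""

    def __init__(self, ask_manager, capacity):
        self._ask = ask_manager          # callback into the memory object manager
        self._capacity = capacity
        self._entries = {}               # page offset -> disk block

    def lookup(self, offset):
        if offset not in self._entries:
            if len(self._entries) >= self._capacity:
                # Evict the oldest-inserted entry to make room.
                self._entries.pop(next(iter(self._entries)))
            self._entries[offset] = self._ask(offset)
        return self._entries[offset]


class ProvidedStorageMap:
    """Option 2: entries live in frames the manager paid for; never evicted."""

    def __init__(self, frames_supplied):
        self._free = frames_supplied
        self._entries = {}

    def install(self, offset, block):
        if offset not in self._entries:
            if self._free == 0:
                raise MemoryError("manager must supply more object frames")
            self._free -= 1
        self._entries[offset] = block

    def lookup(self, offset):
        return self._entries[offset]


calls = []

def manager(offset):
    calls.append(offset)                 # record each upcall to the manager
    return 1000 + offset

cached = CachedStorageMap(manager, capacity=2)
for off in (0, 1, 2, 0):                 # the final lookup of 0 is a re-request
    cached.lookup(off)

provided = ProvidedStorageMap(frames_supplied=1)
provided.install(0, 500)
```

The cached variant needs no accounting but costs an extra upcall after eviction; the provided variant never calls back, at the price of regulating who pays for each frame.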

Anyway, I'll leave it there for now - I suspect some of the stuff I've said already will raise some interesting discussion. :-)

Bryan