> >As a general design principle, the kernel should not depend on a
> >non-kernel task for kernel correctness.
>
> This is generally true, but not always. The _usual_ reason for locating
> part of the system outside the kernel is because it isn't trusted, but
> that's not necessarily the only reason. Other reasons include design
> simplicity and debugging convenience.
>
>I think I may have perhaps confused an issue or two here by being
>inadequately specific. What I said was correct, but incomplete. I was
>trying to respond to a reasonable question raised by someone I
>suspected had not had occasion to consider the consequences of
>user-level drivers. This may have been a disservice.
>
>A service can be placed outside the kernel for many reasons. The
>issue is failure propagation. In the event that the service fails,
>the scope of the failure must be limited to the (transitive) clients
>of the failed service. Ideally, things fail in a way that allows the
>service to be repaired in situ.
If you s/kernel/trusted computing base/g, then I fully agree. My "Unix init process" example is indeed only valid as long as there is only one personality in the system (which is true for conventional Unix systems); but even if there are multiple personalities, there is still typically some set of globally trusted services that the entire system depends on for correctness but that don't necessarily run with the supervisor mode bit on.
>A corollary is that it is unacceptable for the kernel to block on such
>a service, though it is acceptable for the kernel to block the clients
>of the service. It is a design error of major proportions if moving a
>service outside of the kernel makes it possible for the kernel to
>arrive at a state where it is unable to make progress due to the
>failure of a user-level process.
It is a design error _iff_ moving that service outside of the kernel was intended to make it possible for that service to be untrusted - in that case the goal hasn't been achieved. But as I've already said, the reason for moving code outside of the kernel isn't always to make it untrusted; if the plan is for the code to remain globally trusted, then it's perfectly OK for the kernel to depend on it. One example would be a user-level disk device driver that is used to page kernel data: it's perfectly OK to put it outside of the kernel, as long as you recognize that it's still globally trusted. It may be useful to put that device driver outside the kernel even though it's still trusted, for example, simply because all other device drivers in the system are user-level and it's easier and cleaner that way. The L3 system exemplifies this design model.
>This implies that the kernel must not leave interrupts disabled while
>running a user task. Actually, I'd go further and suggest that it's
           ^ _untrusted_
Anyway, you get the idea.
>The storage really is qualitatively different, however -- see below.
>
> It seems to me that the "node" abstraction really just represents
> storage used by a trusted entity but allocated by an untrusted
> entity: or, in other words, storage depended upon by a trusted
> server but used on behalf of an untrusted client.
>
>In KeyKOS, for a variety of reasons, capabilities cannot be stored in
>user data. Consider, for example, the difficulty of discovering when
>the last reference to an object goes away. The issue does not arise
>in systems like Mach, where ports are not persistent.
How does KeyKOS do this? Is there a reason why traditional techniques like reference counting won't work? If I could freeze a Mach system, snapshot all of the relevant kernel virtual memory including port data structures, tasks, messages, etc., then later reload all that data back into the same kernel virtual addresses, all of the internal references, reference counts, and so on would still be valid. Of course, there is a little additional complexity in figuring out which kernel data is persistent and which isn't, and how to handle interactions between persistent and non-persistent data; but for the most part I think this is a separate and orthogonal problem.
BTW, once again, I understand this is exactly how L3 works - all kernel memory is virtual and pageable, and checkpointing simply involves writing out all the virtual memory in the system, both kernel and user, to backing store using appropriate logging techniques. Kernel data is not different from user data other than being globally trusted and inaccessible by user-level code. I had thought that this was the way KeyKOS handled persistence of nodes as well, but now I'm not sure.
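To make the freeze-and-reload argument concrete, here's a toy sketch in Python. It is not Mach or L3 code: `KernelHeap`, its slot numbers standing in for kernel virtual addresses, and the JSON "backing store" are all made up for illustration. The only point is that if every object comes back at the same address, references and reference counts remain consistent without any fix-up pass.

```python
import json


class KernelHeap:
    """Toy 'kernel virtual memory': objects live at fixed slot numbers."""

    def __init__(self):
        self.slots = {}      # address -> record (a plain dict)
        self.next_addr = 0

    def _alloc(self, kind, **fields):
        addr = self.next_addr
        self.next_addr += 1
        self.slots[addr] = {"kind": kind, **fields}
        return addr

    def new_port(self):
        return self._alloc("port", refs=0)

    def new_task(self):
        return self._alloc("task", rights=[])

    def acquire(self, task, port):
        # A reference is just the port's address stored in the task.
        self.slots[task]["rights"].append(port)
        self.slots[port]["refs"] += 1

    def release(self, task, port):
        """Drop one reference; return True if it was the last one."""
        self.slots[task]["rights"].remove(port)
        self.slots[port]["refs"] -= 1
        if self.slots[port]["refs"] == 0:
            del self.slots[port]
            return True
        return False

    def snapshot(self):
        # "Write out all the kernel memory" as one image.
        return json.dumps({"next": self.next_addr, "slots": self.slots})

    @classmethod
    def restore(cls, image):
        data = json.loads(image)
        heap = cls()
        heap.next_addr = data["next"]
        # JSON stringifies integer keys; map them back to the same addresses.
        heap.slots = {int(addr): rec for addr, rec in data["slots"].items()}
        return heap
```

Two tasks that each hold a right to one port before `snapshot()` still account for exactly two references after `restore()`, and releasing both detects the last reference as usual. Deciding which records belong in the image at all is the separate problem mentioned above.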
>Another issue is that the possibility exists for the user code to
>forge a capability if they are mapped in user space, which would make
>the system entirely untrustable.
>I suppose that, strictly speaking, the disk formatter should be viewed
>as manufacturing nodes that will be trusted by the kernel, and as such
>the formatter is part of the trusted computing base. Once created by
>the disk formatter, however, a KeyKOS node never goes away; it simply
>changes hands from one client to the next, being carefully rescinded
>at each transfer.
Exactly - a node is basically a locked black box that can be passed around arbitrarily among untrusted parties, and used in limited ways, but can only be "opened" for arbitrary examination and manipulation by an entity with the appropriate authority - i.e. the kernel in KeyKOS.
> This is definitely a necessary abstraction in a system like KeyKOS,
> but why should it be limited to kernel data, when many servers
> probably need a similar abstraction? For example, I'd like to be
> able to write an untrusted user-level server that doesn't trust its
> clients, but to hold its private data uses storage "provided" in
> some way by its clients... The server can't just accept page keys
> directly from its clients, because it doesn't know the security
> properties of those pages.
>
>It isn't limited to kernel data, though the cases are not exactly
>parallel. Your server can actually trust the client to supply
>trustable pages, because given a page key, one can rescind all other
>outstanding authorities to the page. This guarantees that the page is
>safe, provided that you were actually handed a page key. The problem
>is now reduced to verifying that you in fact have a page key, which
>can be done using a kernel primitive service (DISCRIM), or by plugging
>the key into a segment and faulting on it (KeyKOS doesn't virtualize
>pages well, which is a hole in the system design. Norm and I spent a
>while trying to resolve this, but didn't come up with any terribly
>wonderful ideas).
>
>Alternatively, the client can hand the server a key to a space bank
>from which the server can buy pages. It is possible to verify that
>the space bank is for real by checking its brand to verify that it
>was created by the space bank creator. Since the space bank creator
>is trusted, the problem is resolved.
I didn't say it couldn't be done in KeyKOS, and in fact I presented one possible way of doing it. You just presented two others, based on better knowledge of KeyKOS. However, both of these methods are based on the assumption that all useable pages are implemented by a globally trusted authority that the server can easily identify and authenticate. In a more flexible system that allows virtual memory to be backed by untrusted entities, such as Mach or NewSys, the server can't just take any page and use it, even if it's a "real" page. The server will probably trust certain external memory managers, including some that aren't part of the trusted computing base, but not others. The KeyKOS model has no support for this, and it'll be needed if any kind of accurate, reliable resource management system is going to be provided by Mach or NewSys.
I have some ideas on how to do this; one thing that's likely to make it significantly easier is migrating threads-based RPC. See my migrating threads paper (cs.utah.edu:/pub/thread-migrate.ps.Z) for a little more discussion on that matter.
> The KeyKOS user-level process model (and, in general, the
> traditional model) is that servers directly "own" all the resources
> they use, even the resources they use on behalf of their clients...
> The alternate, less conventional model is that of clients "supplying" the
> necessary resources for the servers to do their jobs: the server runs on
> the client's CPU time, charges the client for server memory allocated on
> its behalf, etc.
>
>Consider a pernicious client process that has created a meter with no
>keeper and allocated 5 ticks of runtime to that meter. The client
>calls a shared service which runs on the client's meter. The quantum
>runs out midway through the service, and the client perniciously
>refuses to refill the meter. Denial of service.
...to the client itself. Big deal. As long as the server is correctly written under the "migrating-resource" model, a malicious client won't impede the functioning of the server for other clients.
One obvious objection is that the client is occupying resources in the server - a stack, an activation (domain in KeyKOS terms), and possibly other allocated memory. But this isn't a problem if the server is fully playing in the new model, and all of those resources were "provided by" or "charged to" the client properly.
In certain servers there may be special resources that can't easily be managed in this way; for such servers a "pure" migrating-resource model is inappropriate, and a combination of the two models is needed. That's OK too. For example, the server might normally run on the CPU time of its clients, but it would also maintain a perpetual "minimum priority" or "backup meter" that applies to all its activations and ensures that everything can keep making progress. But I can't think offhand of any examples of such servers; I expect they're reasonably uncommon.
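Here's a rough sketch of that argument in Python, under loudly stated assumptions: `Meter`, `Activation`, and `run` are hypothetical names, not KeyKOS or Mach interfaces, and a "tick" is just a counter. Each activation is charged to its own client's meter, so an empty meter stalls only that activation; the optional backup meter is the hybrid case where the server pays to keep things moving.

```python
class Meter:
    """Toy CPU-time meter: a bucket of ticks."""

    def __init__(self, ticks):
        self.ticks = ticks

    def charge(self, n=1):
        """Take n ticks if available; report whether the charge succeeded."""
        if self.ticks < n:
            return False
        self.ticks -= n
        return True


class Activation:
    """One in-progress request, running on the calling client's meter."""

    def __init__(self, client, meter, work):
        self.client = client
        self.meter = meter
        self.remaining = work   # ticks of service still to perform


def run(activations, backup=None):
    """Round-robin the activations one tick at a time.

    An activation whose meter is empty stalls, unless a backup meter is
    supplied, in which case the server pays for it from the backup.
    Returns (finished clients, stalled clients).
    """
    finished, stalled = [], []
    pending = list(activations)
    while pending:
        act = pending.pop(0)
        paid = act.meter.charge() or (backup is not None and backup.charge())
        if not paid:
            stalled.append(act.client)   # only this client is affected
            continue
        act.remaining -= 1
        if act.remaining == 0:
            finished.append(act.client)
        else:
            pending.append(act)
    return finished, stalled
```

With a client that brings 5 ticks for a 10-tick request next to one that brings plenty, the first ends up in the stalled list and the second in the finished list. The sketch leaves out the memory side of the story, where the stalled activation's stack and other storage would also have to be charged to the client that abandoned it.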
> The interesting thing about KeyKOS in this regard is that it is
> schizophrenic. It supports the first model for relationships among
> user-level programs, but the KeyKOS kernel itself, considered as a "server"
> whose clients are all the user-level processes in the system, is designed
> very strongly around the second model.
>
>If you look at the kernel services carefully, you discover that the
>kernel performs certain well defined manipulations on user-provided
>resources, but does not make use of these resources itself. The
>KeyKOS kernel is pretty much stateless.
When you create a new domain or segment in KeyKOS, as I understand it, you basically give it a node and say, "here, make me a domain out of this node". You, the untrusted peon, are "providing" the kernel with the resources with which to build the requested object - namely the node. Sure, you can't actually get to the bits in kernel memory representing the node, but you are controlling and supplying the resource as a sort of currency, and the kernel resources used are "charged" to you appropriately by virtue of the node being made unavailable to you for other purposes until you destroy the domain. Thus, the KeyKOS kernel operates purely under the "client-provided-resources" model, even though for user-level programs it only directly supports the traditional model.
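As a toy illustration of the "node as currency" reading, here is a small Python sketch. It reflects my understanding as described above rather than the real KeyKOS interface: `ToyKernel`, `issue_node`, `make_domain`, and `destroy` are invented names, node ids are plain integers, and capability unforgeability is not modelled at all. What it does show is a kernel that allocates nothing of its own when asked for a domain, and a node that stays tied up until the domain is destroyed.

```python
class ToyKernel:
    """Toy kernel that builds objects only out of client-supplied nodes."""

    def __init__(self, node_count):
        # All nodes exist up front (as if created by a disk formatter).
        self.total = node_count
        self.unissued = list(range(node_count))  # nodes no client holds yet
        self.built = {}                           # node id -> kind of object

    def issue_node(self):
        """Hand a node to a client (stands in for buying from a space bank)."""
        return self.unissued.pop()

    def make_domain(self, node):
        """Build a domain out of the caller's node; no kernel-side allocation."""
        if not 0 <= node < self.total:
            raise ValueError("no such node")
        if node in self.unissued or node in self.built:
            raise ValueError("node is not available to build with")
        self.built[node] = "domain"
        return ("domain", node)

    def destroy(self, obj):
        """Tear the object down and give the node back to the client."""
        _kind, node = obj
        del self.built[node]
        return node
```

A client that holds one node can have one domain at a time: a second `make_domain` on the same node fails until `destroy` returns the node, which is the sense in which the client is "charged" for the kernel storage.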
Bryan