[EROS readers: I've responded to the KeyKOS questions here, though you're welcome to enhance and correct me. I'll respond to the NewSys questions separately. -- shap]
Jonathan, is there an archive I could get?
The archive is only about five messages, none of which are terribly relevant. What's going (for the moment) under the name of NewSys is a new effort, and you are (regrettably) suffering some of the birth pangs. I'm working up an architectural overview, which should simplify things.
My apologies for the following - the response got to be rather longer than I initially intended. Bryan has asked some good questions, and I doubt that I have answered them fully.
>As a general design principle, the kernel should not depend on a
>non-kernel task for kernel correctness.
This is generally true, but not always. The _usual_ reason for locating part of the system outside the kernel is because it isn't trusted, but that's not necessarily the only reason. Other reasons include design simplicity and debugging convenience.
I think I may have confused an issue or two here by being insufficiently specific. What I said was correct, but incomplete. I was trying to respond to a reasonable question raised by someone I suspected had not had occasion to consider the consequences of user-level drivers. This may have been a disservice.
A service can be placed outside the kernel for many reasons. The issue is failure propagation. In the event that the service fails, the scope of the failure must be limited to the (transitive) clients of the failed service. Ideally, things fail in a way that allows the service to be repaired in situ.
This implies that the kernel must not leave interrupts disabled while running a user task. Actually, I'd go further and suggest that it's almost always the best policy to have the kernel queue asynchronous events in a small buffer, as these tend to have strict performance requirements.
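To make the buffering idea concrete, here is a rough sketch in Python (not KeyKOS code). The class name, the method names, and the overflow policy (drop the newest event and count it) are all my assumptions for illustration; a real kernel would do this in its interrupt path with a fixed-size array.

```python
from collections import deque


class EventBuffer:
    """Hypothetical small, fixed-capacity buffer for asynchronous events."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = deque()
        self.dropped = 0  # events lost because the buffer was full

    def post(self, event):
        # Simulated interrupt path: never blocks, never runs user code.
        if len(self.events) >= self.capacity:
            self.dropped += 1  # assumed policy: drop the newest event
            return False
        self.events.append(event)
        return True

    def drain(self):
        # Called by the user-level driver task whenever it gets to run.
        drained = list(self.events)
        self.events.clear()
        return drained
```

The point of the sketch is only that the interrupt side does a bounded amount of work and returns, so the user-level driver can be slow, or dead, without the kernel holding interrupts off.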
In NewSys, as in KeyKOS, we make a strong distinction between the correctness of the kernel and the correctness of a rehosted operating environment. If init dies, it's pretty clear that UNIX is hosed, but the kernel itself may still be operating correctly. In the KeyKOS context, there are a number of subsystems (e.g. the KeyKOS transaction facility) that run hosted directly on top of the KeyKOS primitives. It would be very unfortunate if a wayward rehosted OS could cause such a service to fail. This would in fact constitute a serious flaw in both the correctness and the security of the system.
One way to think about this is to conceptualize KeyKOS (and NewSys) as providing a virtual machine on top of which a number of "operating environments" may be running simultaneously. In a correct kernel implementation, the failure of one of these environments must not imply the failure of others unless it is due to the failure of some service shared in common by the two environments.
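The failure-scope rule can be stated as a small graph computation. The following Python sketch is only an illustration: the function name and the service names in the example table are hypothetical, and the table maps each service to its direct clients.

```python
def affected_by_failure(clients_of, failed):
    """Return the failed service plus every transitive client of it."""
    affected = {failed}
    pending = [failed]
    while pending:
        service = pending.pop()
        for client in clients_of.get(service, ()):
            if client not in affected:
                affected.add(client)
                pending.append(client)
    return affected


# Hypothetical dependency table: service -> its direct clients.
clients_of = {
    "unix_init": ["unix_shell"],
    "disk_driver": ["unix_init", "transaction_facility"],
}
```

With this table, a failure of "unix_init" reaches only "unix_shell", while a failure of the shared "disk_driver" reaches both environments, which is exactly the shared-service exception described above.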
Given this, the situation with init is not quite parallel.
Another KeyKOS rule, I believe, is that kernel data (stored in "nodes") is fundamentally different from user data (stored in "pages"). Kernel memory can never become user memory or vice versa; kernel memory is replicated in several places on stable storage whereas user memory isn't (or isn't always?); etc.
In most KeyKOS implementations, both types of storage are replicated on disk. Nodes are in some sense more important because the failure of storage on certain nodes (e.g. a meter near the top of the tree) can mess up a large part of the system, while the failure of a page will in general destroy only a single process. That's a weak argument, but an important consideration if disk space is constrained.
Actually, nodes aren't allocated in the way that you seem to mean. They are transferred, and access to them can be rescinded.
This is definitely a necessary abstraction in a system like KeyKOS,
but why should it be limited to kernel data, when many servers
probably need a similar abstraction? For example, I'd like to be
able to write an untrusted user-level server that doesn't trust its
clients, but to hold its private data uses storage "provided" in
some way by its clients... The server can't just accept page keys
directly from its clients, because it doesn't know the security
properties of those pages.
It isn't limited to kernel data, though the cases are not exactly
parallel. Your server can actually trust the client to supply
trustable pages, because given a page key, one can rescind all other
outstanding authorities to the page. This guarantees that the page is
safe, provided that you were actually handed a page key. The problem
is now reduced to verifying that you in fact have a page key, which
can be done using a kernel primitive service (DISCRIM).
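A toy model of that two-step check, in Python, may help. None of this is the real KeyKOS interface: the classes, the discrim() stand-in, and the version-counter trick used to model rescinding are assumptions made purely for illustration.

```python
class Page:
    def __init__(self):
        self.version = 0  # bumped to invalidate all older keys


class PageKey:
    def __init__(self, page):
        self.page = page
        self.version = page.version

    def is_valid(self):
        return self.version == self.page.version


class FakePageKey:
    """Something a hostile client might pass in place of a page key."""


def discrim(key):
    # Stand-in for the kernel's DISCRIM service: classify a key.
    return "page" if isinstance(key, PageKey) else "other"


def rescind_others(key):
    # Invalidate every outstanding key to the page, then mint a fresh one.
    if not key.is_valid():
        raise ValueError("key has already been rescinded")
    key.page.version += 1
    return PageKey(key.page)


def accept_client_page(key):
    # Step 1: confirm it really is a page key. Step 2: take sole authority.
    if discrim(key) != "page":
        raise ValueError("not a page key")
    return rescind_others(key)
```

After accept_client_page() returns, any key the client kept to the same page no longer validates, so the server can put private data there.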
Alternatively, the client can hand the server a key to a space bank from which the server can buy pages. It is possible to verify that the space bank is for real by checking its brand to verify that it was created by the space bank creator. Since the space bank creator is trusted, the problem is resolved.
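The brand check can be modelled along the following lines. This is a Python sketch under stated assumptions: the class and method names are hypothetical, the 4096-byte page size is a placeholder, and Python cannot actually hide the brand the way the kernel does, so treat the private attribute as a convention only.

```python
class SpaceBank:
    def __init__(self, brand):
        self._brand = brand  # stamped on the bank by its creator
        self.pages_sold = 0

    def buy_page(self):
        self.pages_sold += 1
        return bytearray(4096)  # assumed page size, for illustration


class SpaceBankCreator:
    def __init__(self):
        self._brand = object()  # unique token only the creator should hold

    def create_bank(self):
        return SpaceBank(self._brand)

    def is_my_product(self, candidate):
        # Identity comparison: only banks stamped by this creator match.
        return getattr(candidate, "_brand", None) is self._brand


class ImpostorBank:
    """A client-written object that merely looks like a space bank."""

    def buy_page(self):
        return bytearray(4096)


def buy_page_from_client_bank(creator, bank):
    # The server trusts the creator, not the client that supplied the bank.
    if not creator.is_my_product(bank):
        raise ValueError("bank was not created by the trusted creator")
    return bank.buy_page()
```

The server never has to trust the client here: it asks the creator, which it already trusts, whether the bank is genuine, and only then buys from it.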
Actually, the space bank retains sufficient authority to misallocate pages, but it is part of the TCB, and it honors the conceptual contract of having sold you the page.
Nodes are allocated through a space bank as well, but the kernel is the only entity trusted to manipulate the data within the nodes for security reasons. The kernel is simultaneously the creator and the sole client of the actual capability data.
For example, if kernel data is replicated 3x, you might want 3x or 2x replication for some user-level objects, and none for others. Does KeyKOS support this?
Yes. Storage of nodes and pages is completely symmetrical, and can be configured to whatever degree you wish in both cases.
Shared services cannot in general trust client-provided resources unless, as in the case of page keys, they have some mechanism by which to ensure safety.
In one small respect this is not true. The kernel conceptually runs on no meter. In practice, it runs on the client meter, and takes advantage of its privileged status to "stretch" the quanta enough to complete the requested kernel operation.
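As a very rough illustration of the "stretch", consider the Python sketch below. The Meter class, the unit of cost, and the choice to report the overdraft to the caller are all my assumptions; the only point is that the kernel operation runs to completion even if the client's quantum runs out partway through.

```python
class Meter:
    """Hypothetical client meter holding the remaining quantum."""

    def __init__(self, quantum):
        self.remaining = quantum


def run_kernel_op(meter, cost):
    """Charge a kernel operation to the client's meter without aborting it.

    Returns the amount by which the quantum had to be stretched.
    """
    overdraft = max(0, cost - meter.remaining)
    meter.remaining = max(0, meter.remaining - cost)
    return overdraft
```

For example, an operation costing 5 units on a meter with 3 units left completes anyway, leaves the meter at zero, and has been stretched by 2 units.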
Jonathan