Re: Architecture of Backing Store Descriptors Jonathan Shapiro (shap@viper.cis.upenn.edu)
Wed, 23 Nov 94 13:30:22 -0500

... even if there are multiple personalities, there are still typically some set of globally trusted services that the entire system depends on for correctness, but don't necessarily run with the supervisor mode bit on.

In KeyKOS, an example would be something like a space bank, which multiple personalities depend on.

One example would be a user-level disk device driver that is used to page kernel data: it's perfectly OK to put it outside of the kernel, as long as you recognize that it's still globally trusted.

I understand what you are saying, and there is merit to it.

For the kernel to entrust privileged state to a user-mode device driver that is part of the TCB is fine. I wasn't arguing against that. In the example you give, the kernel can still make progress on other tasks while the pageout is in progress.

The question is whether it is "safe" for the kernel to *block* on such a process. In my opinion this is unsound. The opportunities for deadlock that arise when this is done are a big problem, and it is almost impossible to show that such deadlocks cannot occur. If the external driver is really simple enough that the deadlock can be confidently avoided, I submit that there was no benefit to pushing it out of the kernel, and disabling the supervisor bit is a distracting irrelevancy.

This is not an argument against user-mode device drivers. It is only an argument against blocking the kernel on one. Device driver interfaces tend to want to be asynchronous in any case, so the issue doesn't much arise if one is careful.

I'm not making the case well. Perhaps Norm will be able to state it better.

How does KeyKOS do this? Is there a reason why traditional techniques like reference counting won't work?

Yes. How do you know which random bits in a random data page are actually bits that constitute an outstanding capability? The process may allege to have released the capability, but there's no way to be sure that it really has.

KeyKOS doesn't reference count. Instead, it keeps a serial number (called an allocation count) in each object. When the object is reassigned, the allocation count is incremented. Keys also contain the allocation count, and if the counts don't match the key is invalid. This avoids the need to GC the disk. Some further tricks are used to avoid bumping the count if none of the keys to the current object generation have gone to disk.

If I could freeze a Mach system... there is a little additional complexity in figuring out which kernel data is persistent and which isn't, and how to handle interactions between persistent and non-persitent data; but for the most part I think this is a separate and orthogonal problem.

It is seperate and orthogonal, but I think you underestimate the challenge, as well as the implications for kernel heap management (holes on the heap currently go away when the kernel reboots, at least!). When you delete a port, how do you ensure that port authorities that have not been paged in yet are properly invalidated? Once the system is persistent in the fashion you suggest, there are likely to be far more processes by several orders of magnitude than you are perhaps used to thinking about.

BTW, once again, I understand this is exactly how L3 works - all kernel memory is virtual and pageable...

Can you send me a reference to some papers on L3?

Kernel data [in L3] is not different from user data other than being globally trusted and inaccessible by user-level code. I had thought that this was the way KeyKOS handled persistence of nodes as well, but now I'm not sure.

>From the standpoint of persistence you were right. The difference is Status: O

only that KeyKOS nodes are smaller than pages, primarily because most of the kernel implemented objects are much smaller than a page. Also, the virtual address space would have to support far more than 2^32 bytes to hold any sizable system.

In a ... system that allows virtual memory to be backed by untrusted entities, such as Mach or NewSys, the server can't just take any page and use it, even if it's a "real" page.

You are absolutely correct, and I had not thought this through.

>Consider a pernicious client process that has created a meter with no
>keeper and allocate 5 ticks of runtime to that meter. The client
>calls a shared service which runs on the client's meter. The quanta
>runs out midway through the service, and the client perniciously
>refuses to refill the meter. Denial of service.

...to the client itself. Big deal. As long as the server is correctly written under the "migrating-resource" model, a malicious client won't impede the functioning of the server for other clients.

I think you're mistaken. The server can no longer service anyone, because it is halted in the middle of processing the client's request, and is therefore not waiting for requests. In mach, I suppose this only ties up a single server thread, but threads are not free! Note that the dead thread doesn't have enough run time authority to clean up any temporary locks, and that no other thread in the server can easily tell the difference between a low priority client and a dead client.

I'll reread your migrating threads paper, though.

One obvious objection is that the client is occupying resources in the server - a stack, an activation (domain in KeyKOS terms), and possibly other allocated memory. But this isn't a problem if the server is fully playing in the new model, and all of those resources were "provided by" or "charged to" the client properly.

I'm very interested to see a secure model that supports this. The KeyKOS group tried to come up with one, and was unable to.

Similarly, you might object that the stopped thread may be holding locks in the server and preventing other clients' threads from making progress; but in fact this is just the classic unbounded priority inversion problem, and any traditional priority inheritance scheme will solve the problem.

What this has done is transfer the burden. The pernicious client has succeeded in imposing extra work on other clients.

... Thus, the KeyKOS kernel operates purely under the "client-provided-resuorces" model, even though for user-level programs it only directly supports the traditional model.

I must be missing something obvious. It seems to me that user-level programs in KeyKOS operate purely under "client provided resources" as well, with the arguable exception of those processes that directly own authority to hard physical resources like the disk. In what sense is this not true?

Jonathan