Architecture of Backing Store Descriptors

Jonathan Shapiro shap@viper.cis.upenn.edu
Wed, 23 Nov 94 13:30:22 -0500


   ... even if there are multiple personalities, there are
   still typically some set of globally trusted services that the
   entire system depends on for correctness, but don't necessarily run
   with the supervisor mode bit on.

In KeyKOS, an example would be something like a space bank, which
multiple personalities depend on.

   One example would be a user-level disk device driver that is used
   to page kernel data: it's perfectly OK to put it outside of the
   kernel, as long as you recognize that it's still globally trusted.

I understand what you are saying, and there is merit to it.

For the kernel to entrust privileged state to a user-mode device
driver that is part of the TCB is fine.  I wasn't arguing against
that. In the example you give, the kernel can still make progress on
other tasks while the pageout is in progress.

The question is whether it is "safe" for the kernel to *block* on such
a process. In my opinion this is unsound.  The opportunities for
deadlock that arise when this is done are a big problem, and it is
almost impossible to show that such deadlocks cannot occur.  If the
external driver is really simple enough that the deadlock can be
confidently avoided, I submit that there was no benefit to pushing it
out of the kernel, and disabling the supervisor bit is a distracting
irrelevancy.

This is not an argument against user-mode device drivers.  It is only
an argument against blocking the kernel on one.  Device driver
interfaces tend to want to be asynchronous in any case, so the issue
doesn't much arise if one is careful.

I'm not making the case well.  Perhaps Norm will be able to state it
better.

   How does KeyKOS do this?  Is there a reason why traditional techniques
   like reference counting won't work?  

Yes. How do you know which random bits in a random data page are
actually bits that constitute an outstanding capability?  The process
may allege to have released the capability, but there's no way to be
sure that it really has.

KeyKOS doesn't reference count.  Instead, it keeps a serial number
(called an allocation count) in each object.  When the object is
reassigned, the allocation count is incremented.  Keys also contain
the allocation count, and if the counts don't match the key is
invalid.  This avoids the need to GC the disk.  Some further tricks
are used to avoid bumping the count if none of the keys to the current
object generation have gone to disk.

   If I could freeze a Mach system... there is a little additional
   complexity in figuring out which kernel data is persistent and
   which isn't, and how to handle interactions between persistent and
   non-persitent data; but for the most part I think this is a
   separate and orthogonal problem.

It is seperate and orthogonal, but I think you underestimate the
challenge, as well as the implications for kernel heap management
(holes on the heap currently go away when the kernel reboots, at
least!).  When you delete a port, how do you ensure that port
authorities that have not been paged in yet are properly invalidated?
Once the system is persistent in the fashion you suggest, there are
likely to be far more processes by several orders of magnitude than
you are perhaps used to thinking about.

   BTW, once again, I understand this is exactly how L3 works - all
   kernel memory is virtual and pageable...

Can you send me a reference to some papers on L3?

   Kernel data [in L3] is not different from user data other than being
   globally trusted and inaccessible by user-level code.  I had
   thought that this was the way KeyKOS handled persistence of
   nodes as well, but now I'm not sure.

>From the standpoint of persistence you were right.  The difference is
Status: O

only that KeyKOS nodes are smaller than pages, primarily because most
of the kernel implemented objects are much smaller than a page.  Also,
the virtual address space would have to support far more than 2^32
bytes to hold any sizable system.

   In a ... system that allows virtual memory to be backed by untrusted
   entities, such as Mach or NewSys, the server can't just take any page and
   use it, even if it's a "real" page.

You are absolutely correct, and I had not thought this through.

   >Consider a pernicious client process that has created a meter with no
   >keeper and allocate 5 ticks of runtime to that meter.  The client
   >calls a shared service which runs on the client's meter.  The quanta
   >runs out midway through the service, and the client perniciously
   >refuses to refill the meter.  Denial of service.

   ...to the client itself.  Big deal.  As long as the server is correctly
   written under the "migrating-resource" model, a malicious client won't
   impede the functioning of the server for other clients.

I think you're mistaken.  The server can no longer service anyone,
because it is halted in the middle of processing the client's request,
and is therefore not waiting for requests.  In mach, I suppose this
only ties up a single server thread, but threads are not free!  Note
that the dead thread doesn't have enough run time authority to clean
up any temporary locks, and that no other thread in the server can
easily tell the difference between a low priority client and a dead
client.

I'll reread your migrating threads paper, though.

   One obvious objection is that the client is occupying resources in the
   server - a stack, an activation (domain in KeyKOS terms), and possibly
   other allocated memory.  But this isn't a problem if the server is fully
   playing in the new model, and all of those resources were "provided by" or
   "charged to" the client properly.

I'm very interested to see a secure model that supports this.  The
KeyKOS group tried to come up with one, and was unable to.

   Similarly, you might object that the stopped thread may be holding locks in
   the server and preventing other clients' threads from making progress; but
   in fact this is just the classic unbounded priority inversion problem, and
   any traditional priority inheritance scheme will solve the problem.

What this has done is transfer the burden.  The pernicious client has
succeeded in imposing extra work on other clients.

   ...  Thus, the KeyKOS kernel operates purely
   under the "client-provided-resuorces" model, even though for user-level
   programs it only directly supports the traditional model.

I must be missing something obvious.  It seems to me that user-level
programs in KeyKOS operate purely under "client provided resources" as
well, with the arguable exception of those processes that directly own
authority to hard physical resources like the disk.  In what sense is
this not true?


Jonathan