*Draft* DIMSUM architecture paper available

Jonathan Shapiro shap@viper.cis.upenn.edu
Mon, 26 Dec 94 17:17:12 -0500


This is a quick response on the easy parts of Bryan's questions.

   1. Devices: are you planning to put all device drivers in the kernel?

At a minimum, I propose to put all of the direct hardware operations
into the kernel.  In practice, I intend to put at least the bottom
half of all device drivers into the kernel.  My experience with user
level device drivers has not been terribly positive.

That said, for some devices they make sense, and the kernel would
implement a stub driver that just handles diddling device registers.

It is possible, and in some cases desirable, to virtualize the devices
in a layer above that by building appropriate user-level processes.  I
see that as an essentially orthogonal issue to user level device drivers.

   2. "All invocations of system-implemented object[s] are
   semantically atomic": 

   Can you flesh out exactly what this means a little more?  For example, does
   it mean you can't interrupt an IPC operation in the middle and leave a
   partially-copied data buffer in the receiver?  

I don't intend to let transfers be interrupted in mid-stream.  The
kernel is free to buffer, but w.r.t. any given process the transfer
happens "instantaneously."

When I wrote that sentence I had something else in mind, though.  It
means that the caller may assume that operations performed on a given
system object can be viewed as happening in some (unspecified) serial
order.
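
As a rough illustration of "some (unspecified) serial order", here is
a minimal Python sketch.  It is not DIMSUM code: the SystemObject
class, its invoke() method and the per-object lock are all
hypothetical, and a lock is only one of several ways a kernel could
provide this guarantee.

```python
import threading

class SystemObject:
    """Hypothetical system object whose invocations are serialized."""

    def __init__(self):
        self._lock = threading.Lock()   # assumption: one lock per object
        self.value = 0
        self.history = []               # the serial order actually taken

    def invoke(self, caller, delta):
        # Each invocation runs to completion while holding the lock, so
        # callers can reason as if operations happened one after another.
        with self._lock:
            before = self.value
            self.value = before + delta
            self.history.append((caller, before, self.value))
            return self.value
```

The order recorded in history is whatever order the callers happened
to win the lock in; the caller may rely only on there being one.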

   In general, I'm wary of having the kernel make widespread general
   commitments, like this, that often turn out to be unnecessary and
   may thwart optimization. 

Your point is well taken.  Basically, though, I'm trying to keep the
KeyKOS message passing system with as little change as I can manage.
It ain't broke, and I'd rather spend my energy on other problems.

   2.1.1. Can you generate a read-only segment key from a read-write segment
   key?

Yes.   I see that I missed that one.  Thank you.

   2.1.2. Page faults (or, in general, requests-for-data) aren't defined as
   "exceptions", even though they get propagated to the segment
   keeper?  

DIMSUM page faults do not get propagated to the segment keeper.  They
get propagated to the address space keeper.  Page faults, however, are
different from data frame faults.  If the page is properly mapped, but
the underlying data is not present, that turns into a data frame
fault.
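
The distinction can be sketched as a small classification function.
This is only an illustration in Python with hypothetical names
(classify_access, RECIPIENT); it records the one routing rule stated
above and deliberately leaves the recipient of a data frame fault
unspecified, since this message does not say.

```python
PAGE_FAULT = "page fault"
DATA_FRAME_FAULT = "data frame fault"

# Only the routing stated in the text is recorded here.
RECIPIENT = {PAGE_FAULT: "address space keeper"}

def classify_access(page_mapped, data_present):
    """Return the kind of fault an access raises, or None if it succeeds."""
    if not page_mapped:
        # No valid mapping for the address: an ordinary page fault.
        return PAGE_FAULT
    if not data_present:
        # Mapping is fine, but the underlying data is not there.
        return DATA_FRAME_FAULT
    return None
```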

At one point I removed the discussion of segment data frame faults
because I wasn't very happy with what I had.  It ended up in section 5
(Segment Content) and needs to be merged back in.

Also, all of the places where I said "exceptions" are placeholders.  I
need a term.  I may simply switch to keeper invocations, and discuss
the mapping of hardware exceptions to keeper invocations elsewhere.

This whole issue needs a better description.

   2.2.1. overlapping mappings:  you may be falling into one of the traps that
   I think made Mach VM enormously more complicated than it should be -
   requiring the kernel to deal with detecting mapping overlaps, resizing
   mappings, splitting mappings, recombining mappings, etc.

I may buy that, though I want to think on it.  Are you suggesting that
DIMSUM should simply prohibit overlapping mappings and require the
user to remap?

   ...have you thought about the implications of mapping a small
   segment A in the middle of a large segment B, thereby splitting the second
   mapping in two?

I don't see a problem, other than a possible need to allocate more
space in the mapping data structure.  Space allocation in general is
still an open issue, but offhand, I don't see how it's any different
in building address spaces than anywhere else.
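
To make the splitting case concrete, here is a minimal Python sketch
under some assumptions: mappings are kept as non-overlapping
(start, end, segment, offset) tuples with half-open ranges, and a new
mapping simply overrides whatever it covers.  The function name
map_over and the tuple layout are hypothetical, not the DIMSUM data
structure.

```python
def map_over(mappings, start, end, segment):
    """Map `segment` at [start, end), splitting any mapping it overlaps."""
    result = []
    for (s, e, seg, off) in mappings:
        if e <= start or s >= end:
            # No overlap: keep the existing mapping unchanged.
            result.append((s, e, seg, off))
            continue
        if s < start:
            # Piece of the old mapping that lies below the new one.
            result.append((s, start, seg, off))
        if e > end:
            # Piece above the new one; its segment offset moves forward.
            result.append((end, e, seg, off + (end - s)))
    result.append((start, end, segment, 0))
    result.sort(key=lambda m: m[0])
    return result
```

Mapping A at [40, 50) over a B that covers [0, 100) leaves three
entries where there was one, which is the extra space allocation
mentioned above.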

   2.3. Are you going to support priority inheritance in any form?  It seems
   like any kind of RT support would be difficult without it.

Did I forget to remove the open issue note?

Among the priority-based schedulers one needs to manage priority
inversion, and priority inheritance is the simplest way to do so.  The
dynamic priority of a process will be the priority of its highest
priority invoker, and invokers queue in priority order.
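
A minimal Python sketch of that rule follows.  The Process class and
its method names are hypothetical.  Two assumptions are not in the
text: a process with no queued invokers falls back to its own base
priority, and an invoker's priority is sampled once, at the time it
queues.

```python
import heapq

class Process:
    """Hypothetical process with a priority-ordered invoker queue."""

    def __init__(self, name, base_priority):
        self.name = name
        self.base_priority = base_priority
        self._waiting = []   # heap of (-priority, arrival number, invoker)
        self._arrivals = 0

    def enqueue_invoker(self, invoker):
        # Higher priority first; the arrival number breaks ties FIFO.
        entry = (-invoker.dynamic_priority(), self._arrivals, invoker)
        heapq.heappush(self._waiting, entry)
        self._arrivals += 1

    def next_invoker(self):
        """Remove and return the highest-priority waiting invoker."""
        return heapq.heappop(self._waiting)[2] if self._waiting else None

    def dynamic_priority(self):
        if not self._waiting:
            return self.base_priority      # assumption: fallback
        return -self._waiting[0][0]        # highest-priority invoker
```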

In the long term, I'm more interested in design of a good hard
contract scheduler and quality of service compositor, since I think
that's a better way to go for a lot of multimedia type apps.

   2.4. To whom is a read-only slot read-only?  Does this mechanism provide
   any security features, or is it just a convenience?

To everyone.  The holder of a key table key can change the mask at
will.

This is a small evolution of the earlier zero key register idea.  I
wanted two key tables for other reasons, and to implement the zero key
register I would have needed to tag them with different internal
types.  By adding the mask I don't need to do that.  It's not intended
as a security feature.

    What are the benefits of read-only key table slots?

At the moment, the only real purpose I see is dealing with throwing
away returned keys in messages.  You have to specify an inbound slot.
By specifying an immutable slot you can simply throw a key away.
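
As an illustration only, the mask idea might look like the following
Python sketch.  The KeyTable class, the bitmask representation and
the return value of store() are hypothetical; the text says only that
a mask exists, that the holder of a key table key can change it, and
that a key delivered to an immutable slot is thrown away.

```python
class KeyTable:
    """Hypothetical key table with a per-slot read-only mask."""

    def __init__(self, nslots=16):
        self.slots = [None] * nslots
        self.readonly_mask = 0          # bit i set => slot i is immutable

    def set_mask(self, mask):
        # Anyone holding the key table key may change the mask at will,
        # which is why this is a convenience, not a security feature.
        self.readonly_mask = mask

    def is_readonly(self, index):
        return bool((self.readonly_mask >> index) & 1)

    def store(self, index, key):
        """Deliver a key into a slot; return False if it was discarded."""
        if self.is_readonly(index):
            return False                # the inbound key is dropped
        self.slots[index] = key
        return True
```

A receiver that does not want a returned key names an immutable slot
as the inbound slot, and the key is dropped without using up a
writable slot.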

The alternative was to increase the number of registers needed to pass
message arguments, which would have pushed me over the number of
available registers on the 386.  I think that message passing
performance is still important, and that Bershad was reasoning
circularly.

   Only fairly simple services will be able to operate with
   a fixed 16- or 32-slot key space; more complicated services will probably
   need to use their directly-accessible slots as a cache for another, much
   larger key space.

When I first started looking at KeyKOS I thought that too, but the
great majority of KeyKOS processes ran quite comfortably in 16 keys.
UNIX processes are not as small in this regard, but in most cases
should run okay with 32.

   For simple services, I can't envision shared key tables being very
   useful (do you have a concrete example of how they would be?), 

As it happens, I'm not convinced that sharing key tables is a good
idea either.  On the other hand, I'm reluctant to toss the idea so
quickly just because we don't see a good use for it now.

Here is the (contrived) example.  Consider a multithreaded program
running in an emulated UNIX environment.  One key table holds the
per-process authorities, such as address space, file system key, etc.
The other is used by individual threads in order to have return slots
for any system calls that they make (which are ultimately built on
IPC).  The problem is that without separate key tables the individual
threads cannot make independent invocations, and without a shared key
table they cannot easily manage things like a common open file table.
The shared key table is read-mostly.


Sharing or not, it's convenient to have two key tables rather than one
larger one because it keeps key tables more in line with the other
system object representation sizes, which reduces memory allocation
complexity in the kernel and simultaneously reduces the number of
objects that some processes will need to allocate (many will run fine
in 16 key registers).

   ...you have a few directly accessible, manually managed "key
   registers" that are private to each thread and are typically used
   as a cache; then you have much larger "key spaces" that can be
   shared between threads.

Key registers are private to a thread in KeyKOS only by convention.
The register node is separate from the domain root and could, in
principle, be shared.  I don't know if it ever was in practice.

KeyKOS key spaces are implemented as processes, and carry a
consequential context switch overhead.  I haven't had occasion to talk
through the merits of shared libraries for this sort of thing, but the
KeyKOS system uses domains in many places where you or I might make
use of shared libraries instead.  Sometimes that has security
benefits.  Other times I'm not so sure.

   2.5.1. available/waiting/blocked/disabled states:  Are these exactly the
   same terms KeyKOS uses for the equivalent states?

KeyKOS does not have a disabled state.  The disabled state is a
placeholder until I figure out what debugging sequence points I want
to stick into the state diagram.  The basic idea is to freeze a
process without thereby causing its messages to be dropped on the
floor.

   3. 16K message data limit:  Why the increase?  

Some of the machines in the heterogeneous clusters we intend to
support will be using a 16k page size very shortly, and I wanted to
allow such machines to transfer a full page in a single message in
support of the data validation protocol.

I wanted to keep the size limited for reasons we have discussed at
length.  Finally, I wanted the size limit to be universally the same
(i.e. not dependent on the machine) so that you don't have to worry
about the architecture of the process you are sending a message to.

It almost sounds like you're advocating that I go the other way -
towards the *smallest* page size - because of the interrupt issue.
Let me give some thought to whether my assumption about a minimum
one-page message size is sound.

   7.1. sensory keys:  is this mechanism supported in any way for non-system
   objects?  For example, if a key table contains a start key, and someone
   with a sensory key to that key table tries to read that start key, does the
   kernel have any way of interacting with the owner of that start key (or
   someone else) in order to turn that start key into the appropriate
   equivalent "sensory" version of the start key?  

I forgot about that one.  What you get is a non-invocable start key.

Yes, the sensory key violates encapsulation.

   One obvious way around this would be to define one of the eight data bits
   associated with any start key as the "read-only" bit.  (Or add a ninth bit
   if there's room in the key.)  The kernel would automatically set that bit
   when a sensory key is used to retrieve a start key.  The server can then
   interpret that read-only bit in whatever way is appropriate for the type of
   object it implements.

I'll have a look at that - it's a good idea, but I'm running out of
bits.  One issue is that a sensory key is intertwined with the
security model; a sensory key can never be an outbound channel.  Your
proposed change probably breaks that, though I still think it's a
reasonable idea.

   9. First of all, I'm not sure it's necessary or useful for a
   capability-based microkernel to specify any notion of "users" or
   "authentication" beyond the basic capability mechanism, even if
   capabilities aren't persistent.  Let high-level servers and OS
   personalities sort that out.

I'm not happy with that notion either, and I'm perfectly open to
tossing it.  Can you describe how UNIX-on-Mach handles this issue?
Also, can you tell me where I can read about the Hurd approach?  I'm
trying to avoid routing all calls through a central server.


Jonathan