Re: *Draft* DIMSUM architecture paper available
Jonathan Shapiro (shap@viper.cis.upenn.edu)
Mon, 26 Dec 94 17:17:12 -0500

This is a quick response on the easy parts of Bryan's questions.

  1. Devices: are you planning to put all device drivers in the kernel?

At a minimum, I propose to put all of the direct hardware operations into the kernel. In practice, I intend to put at least the bottom half of all device drivers into the kernel. My experience with user level device drivers has not been terribly positive.

That said, for some devices they make sense, and the kernel would implement a stub driver that just handles diddling device registers.

It is possible, and in some cases desirable, to virtualize the devices in a layer above that by building appropriate user-level processes. I see that as an essentially orthogonal issue to user level device drivers.

2. "All invocations of system-implemented object[s] are semantically atomic":

Can you flesh out exactly what this means a little more? For example, does it mean you can't interrupt an IPC operation in the middle and leave a partially-copied data buffer in the receiver?

I don't intend to let transfers be interrupted in mid-stream. The kernel is free to buffer, but w.r.t. any given process the transfer happens "instantaneously."

When I wrote that sentence I had something else in mind, though. It means that the caller may assume that operations performed on a given system object can be viewed as happening in some (unspecified) serial order.

In general, I'm wary of having the kernel make widespread general commitments like this, which often turn out to be unnecessary and may thwart optimization.

Your point is well taken. Basically, though, I'm trying to keep the KeyKOS message passing system with as little change as I can manage. It ain't broke, and I'd rather spend my energy on other problems.

2.1.1. Can you generate a read-only segment key from a read-write segment key?

Yes. I see that I missed that one. Thank you.

2.1.2. Page faults (or, in general, requests-for-data) aren't defined as "exceptions", even though they get propagated to the segment keeper?

DIMSUM page faults do not get propagated to the segment keeper. They get propagated to the address space keeper. Page faults, however, are different from data frame faults. If the page is properly mapped, but the underlying data is not present, that turns into a data frame fault.
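
As a rough illustration of the distinction drawn above, here is a minimal Python sketch. The names `Fault`, `classify_access`, and `FAULT_TARGET` are hypothetical and not taken from the draft. The sketch only records the one delivery target this message actually states (page faults go to the address space keeper) and makes no claim about where a data frame fault is delivered.

```python
from enum import Enum


class Fault(Enum):
    NONE = "no fault"
    PAGE = "page fault"
    DATA_FRAME = "data frame fault"


def classify_access(page_mapped, data_present):
    """Classify a memory access by which fault, if any, it raises."""
    if not page_mapped:
        return Fault.PAGE          # no valid mapping for the page
    if not data_present:
        return Fault.DATA_FRAME    # mapped, but the underlying data is absent
    return Fault.NONE


# Only the page fault target is stated in the message above; where a data
# frame fault is delivered is deliberately left out of this table.
FAULT_TARGET = {Fault.PAGE: "address space keeper"}
```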

At one point I removed the discussion of segment data frame faults because I wasn't very happy with what I had. It ended up in section 5 (Segment Content) and needs to be merged back in.

Also, all of the places where I said "exceptions" are placeholders. I need a term. I may simply switch to keeper invocations, and discuss the mapping of hardware exceptions to keeper invocations elsewhere.

This whole issue needs a better description.

2.2.1. overlapping mappings: you may be falling into one of the traps that I think made Mach VM enormously more complicated than it should be - requiring the kernel to deal with detecting mapping overlaps, resizing mappings, splitting mappings, recombining mappings, etc.

I may buy that, though I want to think on it. Are you suggesting that DIMSUM should simply prohibit overlapping mappings and require the user to remap?

...have you thought about the implications of mapping a small segment A in the middle of a large segment B, thereby splitting the second mapping in two?

I don't see a problem, other than a possible need to allocate more space in the mapping data structure. Space allocation in general is still an open issue, but offhand, I don't see how it's any different in building address spaces than anywhere else.
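
To make the splitting case concrete, here is a minimal Python sketch of one way a mapping structure could handle it. It is an illustration only, not the DIMSUM design: `map_segment` is a hypothetical name, mappings are modeled as simple half-open `(start, end, segment)` tuples, and offsets within a segment are ignored. The point it shows is that one entry for B becomes two, so the structure grows by the extra entries.

```python
def map_segment(mappings, start, end, segment):
    """Install [start, end) -> segment, trimming or splitting what it overlaps.

    `mappings` is a list of non-overlapping (start, end, segment) tuples.
    Returns a new list sorted by start address.
    """
    if start >= end:
        raise ValueError("empty or inverted range")
    result = []
    for m_start, m_end, m_seg in mappings:
        if m_end <= start or m_start >= end:
            result.append((m_start, m_end, m_seg))   # no overlap: keep as is
            continue
        if m_start < start:
            result.append((m_start, start, m_seg))   # piece left of the new mapping
        if m_end > end:
            result.append((end, m_end, m_seg))       # piece right of the new mapping
    result.append((start, end, segment))
    return sorted(result)


# Small segment A mapped into the middle of large segment B.
before = [(0, 100, "B")]
after = map_segment(before, 40, 50, "A")
```

Here `after` holds three entries where `before` held one, which is the extra allocation mentioned above.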

2.3. Are you going to support priority inheritance in any form? It seems like any kind of RT support would be difficult without it.

Did I forget to remove the open issue note?

Among the priority-based schedules one needs to manage priority inversion, and priority inheritance is the simplest way to do so. The dynamic priority of a process will be the priority of its highest-priority invoker, and invokers queue in priority order.
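
The inheritance rule just stated can be sketched in a few lines of Python. This is a toy model under stated assumptions, not kernel code: the `Process` class and its methods are hypothetical names, larger numbers mean higher priority, ties are served in arrival order, and a process with no waiting invokers (or only lower-priority ones) is assumed to run at its own base priority.

```python
import heapq
import itertools


class Process:
    """Toy process whose dynamic priority is inherited from queued invokers."""

    def __init__(self, name, base_priority):
        self.name = name
        self.base_priority = base_priority
        self._waiting = []                 # heap of (-priority, arrival, invoker)
        self._arrival = itertools.count()  # tie-breaker: first come, first served

    def enqueue_invoker(self, invoker):
        # Invokers queue in priority order. The priority is sampled at enqueue
        # time; this sketch does not re-sort if it changes later.
        heapq.heappush(
            self._waiting,
            (-invoker.dynamic_priority(), next(self._arrival), invoker),
        )

    def dequeue_invoker(self):
        # Serve the highest-priority waiting invoker.
        return heapq.heappop(self._waiting)[2]

    def dynamic_priority(self):
        # Assumption: never run below the process's own base priority.
        if not self._waiting:
            return self.base_priority
        return max(self.base_priority, -self._waiting[0][0])


server = Process("server", 1)
low = Process("low", 2)
high = Process("high", 9)
server.enqueue_invoker(low)
server.enqueue_invoker(high)
```

With both invokers queued, `server.dynamic_priority()` is that of `high`, and `high` is the first invoker dequeued.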

In the long term, I'm more interested in design of a good hard contract scheduler and quality of service compositor, since I think that's a better way to go for a lot of multimedia type apps.

2.4. To whom is a read-only slot read-only? Does this mechanism provide any security features, or is it just a convenience?

To everyone. The holder of a key table key can change the mask at will.

This is a small evolution of the earlier zero key register idea. I wanted two key tables for other reasons, and to implement the zero key register I would have needed to tag them with different internal types. By adding the mask I don't need to do that. It's not intended as a security feature.

What are the benefits of read-only key table slots?

At the moment, the only real purpose I see is dealing with throwing away returned keys in messages. You have to specify an inbound slot. By specifying an immutable slot you can simply throw a key away.
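
A minimal Python sketch of that use follows. The `KeyTable` class, the bit-per-slot mask layout, and `receive_message` are hypothetical illustrations and not the draft's interface. The assumption shown is that a store into a slot marked immutable is simply discarded.

```python
class KeyTable:
    """Toy key table: a fixed number of slots plus a read-only mask."""

    def __init__(self, size=16):
        self.slots = [None] * size
        self.read_only_mask = 0   # assumed layout: bit i set => slot i immutable

    def set_mask(self, mask):
        # Any holder of the key table key may change the mask at will,
        # which is why this is a convenience and not a security feature.
        self.read_only_mask = mask

    def store(self, index, key):
        # A store into an immutable slot is discarded.
        if self.read_only_mask & (1 << index):
            return False
        self.slots[index] = key
        return True


def receive_message(table, inbound_slot, returned_key):
    """Deliver a returned key into the slot the receiver named."""
    return table.store(inbound_slot, returned_key)


table = KeyTable()
table.set_mask(0b1)                                # make slot 0 immutable
dropped = receive_message(table, 0, "unwanted-key")
kept = receive_message(table, 3, "wanted-key")
```

Naming slot 0 as the inbound slot throws the returned key away, while slot 3 receives its key normally.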

The alternative was to increase the number of registers needed to pass message arguments, which would have pushed me over the number of available registers on the 386. I think that message passing performance is still important, and that Bershad was reasoning circularly.

Only fairly simple services will be able to operate with a fixed 16- or 32-slot key space; more complicated services will probably need to use their directly-accessible slots as a cache for another, much larger key space.

When I first started looking at KeyKOS I thought that too, but the great majority of KeyKOS processes ran quite comfortably in 16 keys. UNIX processes are not as small in this regard, but in most cases should run okay with 32.

For simple services, I can't envision shared key tables being very useful (do you have a concrete example of how they would be?),

As it happens, I'm not convinced that sharing key tables is a good idea either. On the other hand, I'm reluctant to toss the idea so quickly just because we don't see a good use for it now.

Here is the (contrived) example. Consider a multithreaded program running in an emulated UNIX environment. One key table holds the per-process authorities, such as address space, file system key, etc. The other is used by individual threads in order to have return slots for any system calls that they make (which are ultimately built on IPC). The problem is that without separate key tables the individual threads cannot make independent invocations, and without a shared key table they cannot easily manage things like a common open file table. The shared key table is read-mostly.
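
The contrived example can be modeled in a few lines of Python. Everything here is a hypothetical illustration: the `Thread` class, the slot numbers, and the string "keys" are made up, and the `service` callable stands in for an IPC invocation.

```python
class Thread:
    """Toy thread with one shared key table and one private key table."""

    def __init__(self, shared_table, private_size=16):
        self.shared = shared_table            # same list object for every thread
        self.private = [None] * private_size  # per-thread return slots

    def invoke(self, shared_slot, return_slot, service):
        # The authority comes from the shared (read-mostly) table; the reply
        # lands in this thread's private table, so threads do not collide.
        key = self.shared[shared_slot]
        self.private[return_slot] = service(key)
        return self.private[return_slot]


# Per-process authorities held once and shared by all threads.
shared = ["address-space-key", "file-system-key"]
t1 = Thread(shared)
t2 = Thread(shared)
r1 = t1.invoke(1, 0, lambda key: "t1 reply via " + key)
r2 = t2.invoke(1, 0, lambda key: "t2 reply via " + key)
```

Both threads use return slot 0 without overwriting each other, while a change to `shared` is visible to both.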

Sharing or not, it's convenient to have two key tables rather than one larger one because it keeps key tables more in line with the other system object representation sizes, which reduces memory allocation complexity in the kernel and simultaneously reduces the number of objects that some processes will need to allocate (many will run fine in 16 key registers).

...you have a few directly accessible, manually managed "key registers" that are private to each thread and are typically used as a cache; then you have much larger "key spaces" that can be shared between threads.

Key registers are private to a thread in KeyKOS only by convention. The register node is separate from the domain root and could, in principle, be shared. I don't know if it ever was in practice.

KeyKOS key spaces are implemented as processes, and carry a consequential context switch overhead. I haven't had occasion to talk through the merits of shared libraries for this sort of thing, but the KeyKOS system uses domains in many places where you or I might make use of shared libraries instead. Sometimes that has security benefits. Other times I'm not so sure.

2.5.1. available/waiting/blocked/disabled states: Are these exactly the same terms KeyKOS uses for the equivalent states?

KeyKOS does not have a disabled state. The disabled state is a placeholder until I figure out what debugging sequence points I want to stick into the state diagram. The basic idea is to freeze a process without thereby causing its messages to be dropped on the floor.

3. 16K message data limit: Why the increase?

Some of the machines in the heterogeneous clusters we intend to support will be using a 16k page size very shortly, and I wanted to allow such machines to transfer a full page in a single message in support of the data validation protocol.

I wanted to keep the size limited for reasons we have discussed at length. Finally, I wanted the size limit to be universally the same (i.e. not dependent on the machine) so that you don't have to worry about the architecture of the process you are sending a message to.

It almost sounds like you're advocating that I go the other way - towards the *smallest* page size - because of the interrupt issue. Let me give some thought to whether my assumption about a minimum one-page message size is sound.

7.1. sensory keys: is this mechanism supported in any way for non-system objects? For example, if a key table contains a start key, and someone with a sensory key to that key table tries to read that start key, does the kernel have any way of interacting with the owner of that start key (or someone else) in order to turn that start key into the appropriate equivalent "sensory" version of the start key?

I forgot about that one. What you get is a non-invocable start key.

Yes, the sensory key violates encapsulation.

One obvious way around this would be to define one of the eight data bits associated with any start key as the "read-only" bit. (Or add a ninth bit if there's room in the key.) The kernel would automatically set that bit when a sensory key is used to retrieve a start key. The server can then interpret that read-only bit in whatever way is appropriate for the type of object it implements.

I'll have a look at that - it's a good idea, but I'm running out of bits. One issue is that a sensory key is intertwined with the security model; a sensory key can never be an outbound channel. Your proposed change probably breaks that, though I still think it's a reasonable idea.

9. First of all, I'm not sure it's necessary or useful for a capability-based microkernel to specify any notion of "users" or "authentication" beyond the basic capability mechanism, even if capabilities aren't persistent. Let high-level servers and OS personalities sort that out.

I'm not happy with that notion either, and I'm perfectly open to tossing it. Can you describe how UNIX-on-Mach handles this issue? Also, can you tell me where I can read about the Hurd approach? I'm trying to avoid routing all calls through a central server.

Jonathan