Re: *Draft* DIMSUM architecture paper available
Jonathan Shapiro (shap@viper.cis.upenn.edu)
Sat, 14 Jan 95 21:32:35 -0500

I'm finally getting around to responding to Bill Frantz's list of questions. Apologies that this took so long to do. I have interleaved responses with Bill's questions, so it should be possible to skip large chunks easily. I've also tried to keep the individual answers short.

Term rotation alert: I intend to adopt the term 'domain' for DIMSUM processes. Their role is almost exactly parallel to that of domains in KeyKOS. Not all of my writing includes the switch yet. I thought I would mention this to avoid potential confusion.

  1. Does DIMSUM distinguish between processors which share main memory, or are they treated as a single resource?

Good question. I can make a case for going either way, depending on the use of the machine. The intent is that the DIMSUM architecture should be able to support both, though I haven't given this any serious thought.

2.1 What coherency model is implemented? SPARC, for example, has three.

Short version: DIMSUM will only allow one machine to have write authority on an object's state at any given instant. There is no lazy propagation of changes in the current model. This may be a flaw, and needs more attention.
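A minimal sketch of the single-writer rule described above, for illustration only. The class and method names (`SingleWriterObject`, `acquire_write`, `release_write`) and the machine identifiers are hypothetical and do not come from the DIMSUM architecture document; the sketch shows only that a second machine is refused write authority while another machine holds it.

```python
class SingleWriterObject:
    """Tracks which machine, if any, holds write authority on an object.

    Hypothetical illustration; not the DIMSUM interface.
    """

    def __init__(self, name):
        self.name = name
        self.writer = None  # id of the machine holding write authority, or None

    def acquire_write(self, machine):
        # Grant write authority only if no other machine currently holds it.
        if self.writer is not None and self.writer != machine:
            return False
        self.writer = machine
        return True

    def release_write(self, machine):
        # Only the current holder can give up write authority.
        if self.writer != machine:
            raise ValueError(f"{machine} does not hold write authority")
        self.writer = None


obj = SingleWriterObject("segment-42")
granted_a = obj.acquire_write("machine-a")        # first writer is accepted
granted_b = obj.acquire_write("machine-b")        # refused while machine-a holds authority
obj.release_write("machine-a")
granted_b_retry = obj.acquire_write("machine-b")  # accepted after the release
```

Because there is no lazy propagation in the current model, the sketch has no notion of deferred or buffered updates; it models only who may write at a given instant.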

2.1 I would have understood the mapping discussion better on the first pass if the paper made it clear that DIMSUM supports mapping a window on a segment, and not a segment itself. (A difference from Multics.)

Thank you. The critique is a good one, and I'll try to remember to reflect it in the next edition of the architecture document.

2.1 If several processes share an address space, then the issue of who gets the address fault becomes important. My gut feel is that the address space keeper should get it, with a way of passing it to/invoking the process keeper if the address space keeper is unwilling to fix the problem.

We need to distinguish several kinds of faults:

  1. Faults that the OS can repair transparently are not reported to any keeper, e.g. mapping faults for data that is in core.
  2. Address space faults go to the address space keeper. These include invalid accesses and accesses that violate the access rights of the process on the object. Such faults may need to be back-propagated to the domain, as they can be in KeyKOS.
  3. Segment content read/write faults are reported to the segment keeper, and are not forwardable to the domain in the current model. The lack of forwarding may be a flaw, and deserves some attention.
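The three cases above can be sketched as a small routing function. This is a hedged illustration only: the `Fault` enum, its member names, and the string labels for the keepers are invented here and are not part of DIMSUM or KeyKOS.

```python
from enum import Enum, auto


class Fault(Enum):
    # Hypothetical fault kinds, named after the three cases in the list.
    MAPPING_IN_CORE = auto()   # transparently repairable by the OS
    INVALID_ACCESS = auto()    # address space fault
    ACCESS_RIGHTS = auto()     # address space fault
    SEGMENT_CONTENT = auto()   # segment content read/write fault


ADDRESS_SPACE_FAULTS = (Fault.INVALID_ACCESS, Fault.ACCESS_RIGHTS)


def route_fault(fault):
    """Return who is told about the fault, or None if the OS repairs it silently."""
    if fault is Fault.MAPPING_IN_CORE:
        return None
    if fault in ADDRESS_SPACE_FAULTS:
        return "address_space_keeper"
    if fault is Fault.SEGMENT_CONTENT:
        return "segment_keeper"
    raise ValueError(f"unknown fault: {fault}")


def may_forward_to_domain(fault):
    """Only address space faults can be back-propagated to the domain."""
    return fault in ADDRESS_SPACE_FAULTS


routes = {fault.name: route_fault(fault) for fault in Fault}
```

The `may_forward_to_domain` helper reflects the current model, in which segment content faults stop at the segment keeper.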

2.2 Does it make sense for threads from multiple architectures to run within a single address space? I would think that in all cases, the machine languages would be different, and so would require separate address spaces, if only for the interpreter segment.

I see nothing in this that has to do with multiple architectures. In general, it may be useful for two threads in the same address space to run out of distinct code segments. Once the capability exists, there is no reason to think that these code segments must contain instructions for the same architecture.

Unless I'm missing something important, this seems completely orthogonal to the issue of what address space those segments are mapped into.

2.2.2 Can the authority of the address space key be subsetted? Can a user build a synthetic address space key that acts as an address space key in a process, but still allows the user's program to interpret all the key invocations? Allowing synthetic objects was an early KeyKOS design goal.

The authority of an address space key cannot be subsetted in the sense that you mean.

In KeyKOS, segments, meters, and pages can be synthetically emulated only in a restricted sense. It is possible (even easy) for a process to expose the synthesis by plugging the synthetic versions into contexts where the kernel is not prepared to invoke a domain.

Based on this, I concluded that I didn't want to spend a lot of energy on synthetic memory objects. I concede that this may be a bad choice from some perspectives, but it does simplify some coherency issues.

2.2.1 Footnote 8 It can trap to the process keeper with an invalid fault. In general, can the segment extension fault be used to implement an "allocate on write" policy?

Yes. That is the intent. By mapping a window on a segment, the process is asserting that the addresses spanned by the window are in principle to be considered valid. The OS uses this information to determine if a segment extension should be attempted.
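A rough sketch of how a window could drive an allocate-on-write decision. Everything here is an assumption made for illustration: the `Segment` and `Window` types, the byte-granular extension, and the three result strings are not taken from the architecture document, and a real system would presumably extend in page-sized units.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    length: int  # bytes of the segment that currently exist


@dataclass
class Window:
    segment: Segment
    offset: int  # where the window starts within the segment
    size: int    # bytes spanned by the window


def handle_write(window, addr_in_window):
    """Classify a write at an offset within the window.

    Returns "ok" if the segment already covers the address, "extended" if the
    segment had to grow, or "invalid" if the address lies outside the window.
    """
    if not 0 <= addr_in_window < window.size:
        return "invalid"  # the window never asserted this address as valid
    seg_addr = window.offset + addr_in_window
    if seg_addr < window.segment.length:
        return "ok"
    # Inside the window but past the end of the segment: attempt an extension.
    window.segment.length = seg_addr + 1
    return "extended"


seg = Segment(length=4096)
win = Window(segment=seg, offset=0, size=8192)
first = handle_write(win, 100)    # already backed by the segment
second = handle_write(win, 5000)  # inside the window, beyond the segment
third = handle_write(win, 9000)   # outside the window
```

The point of the sketch is the ordering of the checks: the window decides whether an address is valid in principle, and only then is a segment extension considered.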

2.3 What is the contract with a real time process that doesn't want to use the CPU? Can it accuse DIMSUM of failing to meet its part of the contract?

I do not understand. A real time process that does not use the CPU executes no instructions no matter how many cycles you give it.

2.3 N.B. KeyKOS only used Schedule groups for cleanly stopping a group of processes.

Perhaps I have misunderstood. In talking to Norm about various ideas, the meter hierarchy came up again and again as a way of supporting user-level policies about computon allocation. This seems to me to be essentially orthogonal to the notion of a group of collaborating processes.

2.3.2 Perhaps an exception after a certain amount of computation would be useful. Like running out your meter in KeyKOS.

I agree that this is a good idea for the general purpose scheduling category.
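As a loose illustration of an exception after a certain amount of computation, here is a budget counter that raises once its allowance is used up. The names `Meter`, `MeterExhausted`, and `run` are hypothetical, the unit of cost is arbitrary, and the sketch does not model the KeyKOS meter hierarchy or say who would receive the exception in DIMSUM.

```python
class MeterExhausted(Exception):
    """Raised when a computation tries to spend more than its budget."""


class Meter:
    def __init__(self, units):
        self.remaining = units  # computation budget still available

    def charge(self, units):
        # Refuse the charge and signal exhaustion if the budget is too small.
        if units > self.remaining:
            self.remaining = 0
            raise MeterExhausted("computation budget used up")
        self.remaining -= units


def run(meter, step_costs):
    """Run steps until done or until the meter runs out.

    Returns (number of completed steps, outcome string).
    """
    completed = 0
    try:
        for cost in step_costs:
            meter.charge(cost)
            completed += 1
    except MeterExhausted:
        return completed, "exception delivered"
    return completed, "finished"


short_budget = run(Meter(10), [4, 4, 4])
ample_budget = run(Meter(12), [4, 4, 4])
```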

2.4.2 Key Table Exceptions. Addressing a key table where your process has no key table key needs to send an exception to someone. Perhaps the process keeper is the logical candidate.

Good catch. I'll add it.

This is a domain exception, not a key table exception.

2.5 Footnote 9 - KeyKOS defined two address space keys for the 88000. I'm not sure the kernel actually worked because universally people would store the same key in both slots. I am not sure that first class support for Harvard architecture is worth it. UNIX certainly won't use it.

Norm and I had a long argument about this, Norm taking the view that the OS should not deprive the domain code of hardware features. He also conceded that the feature seemed like a bad idea. I "settled" the debate by writing the EROS domain code to support only a single address space key. I plan the same for DIMSUM.

2.5.1 2nd paragraph after the bullets says "A virtual processor in the halted state..." Is this the same as the disabled state? Just how are process traps handled? KeyKOS gets machine dependent in this area, in an attempt to provide as much debugger support as the underlying architecture will allow.

This is an area that I will need to flesh out during the implementation. Based on my experiences building multi-architecture debuggers, my bias is that debugging events are much less machine dependent than people want to make them. I'd like to make the interface, and in particular the "events of interest", as machine independent as possible. The state involved is, of course, machine dependent.

3. Send Keys and Receive Keys seems to imply that only the high bit of the 5 bit index resides in the message control word.

Correct. 4 x 5 = 20 bits for send keys + 20 bits for receive keys. 40 bits don't fit in a single 32-bit register.
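The arithmetic behind that answer, written out as a short check. The constant names are invented for the example; the only figures used are the ones stated above (four send keys, four receive keys, 5-bit indices, a 32-bit register), plus the high-bit-only reading from the question.

```python
KEY_SLOTS = 4        # send keys, and likewise receive keys
INDEX_BITS = 5       # bits in one key index
REGISTER_BITS = 32   # width of the message control word

send_bits = KEY_SLOTS * INDEX_BITS       # bits for full send key indices
receive_bits = KEY_SLOTS * INDEX_BITS    # bits for full receive key indices
total_bits = send_bits + receive_bits    # bits if every index were stored whole
fits_in_register = total_bits <= REGISTER_BITS

# If only the high bit of each index lives in the control word,
# the eight indices need just one bit apiece there.
high_bits_only = 2 * KEY_SLOTS * 1
```

Full indices would overflow the register by `total_bits - REGISTER_BITS` bits, which is why only part of each index can be carried in the control word.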

5. Where does the recursion of having to call a segment keeper to resolve missing segment data end? Who resolves the segment faults of the segment keeper?

Same as in KeyKOS.

6. What are "core resources"?

A bad idea that I should assiduously avoid.

7.2 Void Key - What happens when you halt? How do you test for a void key? The KeyKOS DK0 approach didn't always trap you. If you accepted the return code, then you could test for void as part of your normal outcome testing.

DK0 was a sound idea, but I think it would have been better to have an invalid bit in the key in support of post-facto diagnostics, and not to have overloaded the notion of an invalid key with the notion of a data key. Thus void key. It needs to be specified, and I'm prepared to back off if it turns out to be a bad idea.

What happens when a process halts needs to be looked at. A temporary answer is that a suitable message is sent to the keeper.

7.2 Zero Segment - What are backing heaps?

The architecture document does not yet include copy on write segments, because I haven't got them thought through yet. This reference slipped through the edit pass.

9. Is there a reference for the Xanadu information architecture?

Regrettably not. There is a document entitled "The Xanadu Information Architecture" that was written at Xanadu prior to release 0.3i. Unfortunately, Xanadu did not do a good job of keeping its house in order, and the master files for this document were lost. I have a fairly late paper copy.

9. The way I read this, each server that wishes to authenticate its clients has a segment key to the "official" user description segment, and can map it and see all the user authentication data. This problem is a specific case of a more general view that servers are more trusted than clients.

9. What does it mean that user objects are immutable? The underlying segment certainly isn't. How do you throw a user off the system?

The whole authentication issue has come under fire from lots of places, and I concede that the mechanism is half baked and should be withdrawn.

I also found some low level English type errors:

   2.2.3 Second paragraph has some English problems.
   2.3 Second paragraph has Realt-time.
   3. First paragraph says "Operations sucha s..."
   7.1 first paragraph has "shedule"
   -----------------------------------------------------------------
   Bill Frantz                   Periwinkle  --  Computer Consulting
   (408)356-8506                 16345 Englewood Ave.
   frantz@netcom.com             Los Gatos, CA 95032, USA