archival test Jonathan S. Shapiro ((no email))
Mon, 13 May 1996 13:22:26 -0400

hi there

From shap Wed Jun 5 10:21:45 1996
Return-Path: shap
Received: (from shap@localhost) by eros.cis.upenn.edu (8.7.4/8.7.3) id KAA27198; Wed, 5 Jun 1996 10:21:44 -0400
Date: Wed, 5 Jun 1996 10:21:44 -0400
Message-Id: <199606051421.KAA27198@eros.cis.upenn.edu>
From: "Jonathan S. Shapiro" <shap@eros.cis.upenn.edu>
To: eros-arch
Subject: Yuck - background keys

Another fine detail about red segments just swatted me in the face:

I can see no reason in principle why the "background segment" of a red segment should not be a background window key. Since this key does not fall within the initial slots, it would be interpreted against the background segment of some enclosing black segment. I can't see what this would be good for, but it seems an inconsistency in the model for this to be unsupported.

This is recursive, and implies that the segment traversal must preserve *all* background (and probably keeper) keys traversed during the segment walk. Since the need to do so in turn implies that one could not invert the producer relationship to short circuit the walk, I assume that KeyKOS did not support this.

Reactions?

Jonathan

From shap@columbia.syncomas.com Wed Jun 26 01:06:32 1996
Return-Path: shap@columbia.syncomas.com
Received: from columbia.syncomas.com (TS7-52.UPENN.EDU [128.91.201.38]) by eros.cis.upenn.edu (8.7.4/8.7.3) with ESMTP id BAA18387 for <eros-arch@eros.cis.upenn.edu>; Wed, 26 Jun 1996 01:03:51 -0400
Received: (from shap@localhost) by columbia.syncomas.com (8.7.4/8.7.3) id BAA01310; Wed, 26 Jun 1996 01:03:13 -0400
Date: Wed, 26 Jun 1996 01:03:13 -0400
Message-Id: <199606260503.BAA01310@columbia.syncomas.com>
From: "Jonathan S. Shapiro" <shap@AURORA.CIS.UPENN.EDU>
To: eros-arch@eros.cis.upenn.edu
Subject: EROS update

Well, it's been a while since I've sent anything to y'all. I've been buried in the code for quite a while, most recently working on the IPC path. Here's an update, and an encapsulation

First, after careful tuning, the EROS IPC now takes 8.3 usecs (one-way) for a null RPC on a 120 MHz Pentium. A null RPC transmits an order/return code, no keys (or rather, only the resume key) and no data buffer. The IPC path is coded mostly in C++, and executes (on average)

While this is a substantial improvement from the 39 usecs where things started, a back-of-the-envelope analysis suggests that something down closer to 0.5 usecs ought to be possible on this platform if coded in assembler. I'm going to tackle that, but first some post-mortem on what it took to get to where I am.

My starting point acknowledged that this code path was important. It was therefore not too bad to begin with, relative to a number of other implementations (QNX, Mach).

A couple of days' work got me down into the 14 usec range simply by cleaning up some bugs and reorganizing my code to improve cache locality. Just FYI, this is better than anything other than L3/L4.

From there to 10 usecs was achieved by further tuning and data structure rearrangement. 8.3 usecs was obtained by further tuning.

So what are the remaining bottlenecks?

  1. The current path does some work earlier than it has to, e.g. key register decode.
  2. The validity testing before copying the data is unnecessarily expensive.
  3. C++ (and C) are not well-suited to an architecture this brain damaged.
  4. My much-touted object table was a bad idea.

The remainder of this note is an explanation of why the object table is a bad idea.

ABOUT THE OBJECT TABLE

If you like, consider this a bit of "humble pie," though I grant that in my case the notion may defy credulity.

The object table was based on an assumption: that key copy occurred in every IPC. I now think this is false. In practice, resume key generation is common, but key copy is not a high frequency case. Less relevant, but worth mentioning: no widely-used benchmark will ever test the key copy logic in computing RPC performance.

Several observations follow from the fact that key copy is a low frequency case:

o The linked list update never happens, so its cache locality doesn't matter. This was the key argument for the object table.

o The object table induces an extra level of indirection, which is costly.

o The object table decomposes a simple test -- key preparation -- into multiple constituents that must be tested individually.

These issues add up. I'm about to revise the kernel to go back to the doubly linked list mechanism, and I'll let you know how that pans out. The results will be reported quantitatively.

I'll do the basic KeyKOS mechanism first, but I then plan to go one step further. Start and resume keys have a leveragable property: you can't make progress invoking them unless the domain has been successfully cached in a context cache (KeyKOS term: DIB).

I talked to Norm, who agrees that this seems plausible. Given this, I propose to make two changes to the KeyKOS design:

  1. Start keys and resume keys will be on a linked list that is rooted in the context cache entry (DIB).
  2. Start and resume keys will be on *separate* lists.

The first change eliminates the need to consult the domain root to find the context cache/DIB. The second improves cache locality by taking advantage of the fact that there is (statistically) only one outstanding resume key at a time. The right and left siblings of this key are therefore identical, and the context cache needs to be in the data cache anyway.

Under this design change, if a key is successfully prepared as a start or resume key, this implies that the domain has been successfully cached into a DIB/context cache. Since neither a start nor resume key can proceed without this, this doesn't seem to impose a restriction in practice.
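The two proposed changes can be illustrated with a minimal sketch. It is written in Python for brevity rather than the kernel's C++, and every name in it (`Key`, `ContextEntry`, `prepare`, `unprepare`, `count`) is a hypothetical stand-in, not an EROS or KeyKOS identifier. It assumes each context cache entry holds two sentinel list heads, one for start keys and one for resume keys.

```python
class Key:
    """A prepared start or resume key: one node in a circular doubly linked list."""

    def __init__(self, kind):
        self.kind = kind          # "start" or "resume" ("head" for a list sentinel)
        self.context = None       # set while the key is prepared
        self.prev = self          # an unlinked key points at itself
        self.next = self


class ContextEntry:
    """Hypothetical context cache entry (KeyKOS: DIB) that roots both key lists."""

    def __init__(self, name):
        self.name = name
        self.start_keys = Key("head")    # sentinel for the start-key list
        self.resume_keys = Key("head")   # sentinel for the resume-key list

    def _head(self, kind):
        return self.start_keys if kind == "start" else self.resume_keys

    def prepare(self, key):
        """Link the key right behind the list head; no domain root lookup needed."""
        head = self._head(key.kind)
        key.context = self
        key.prev, key.next = head, head.next
        head.next.prev = key
        head.next = key

    def unprepare(self, key):
        """Unlink the key from whichever list it is on."""
        key.prev.next = key.next
        key.next.prev = key.prev
        key.prev = key.next = key
        key.context = None

    def count(self, kind):
        """Number of prepared keys of the given kind."""
        head, n = self._head(kind), 0
        node = head.next
        while node is not head:
            n += 1
            node = node.next
        return n


ctx = ContextEntry("server domain")
resume = Key("resume")
ctx.prepare(resume)
# With one outstanding resume key, both neighbours are the list head,
# which lives inside the context cache entry itself.
both_neighbours_are_head = (resume.prev is ctx.resume_keys
                            and resume.next is ctx.resume_keys)
```

In this sketch a lone resume key touches only itself and the context entry when it is linked or unlinked, which is the cache locality argument made above.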

Jonathan

From frantz@netcom.com Wed Jun 26 04:03:18 1996
Return-Path: frantz@netcom.com
Received: from netcom7.netcom.com (netcom7.netcom.com [192.100.81.115]) by eros.cis.upenn.edu (8.7.4/8.7.3) with SMTP id EAA18879 for <eros-arch@eros.cis.upenn.edu>; Wed, 26 Jun 1996 04:03:17 -0400
Received: from [207.92.177.115] (sjx-ca71-51.ix.netcom.com [207.92.177.115]) by netcom7.netcom.com (8.6.13/Netcom) id BAA19462; Wed, 26 Jun 1996 01:02:28 -0700
Message-Id: <199606260802.BAA19462@netcom7.netcom.com>
X-Sender: frantz@netcom7.netcom.com
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Wed, 26 Jun 1996 01:05:04 -0700
To: "Jonathan S. Shapiro" <shap@aurora.cis.upenn.edu>, eros-arch@eros.cis.upenn.edu
From: frantz@netcom.com (Bill Frantz)
Subject: Re: EROS update

At 1:03 AM 6/26/96 -0400, Jonathan S. Shapiro wrote:
>... Less
>relevant, but worth mentioning: no widely-used benchmark will ever
>test the key copy logic in computing RPC performance.

I hope someday this assertion will prove false.

>Under this design change, if a key is successfully prepared as a start
>or resume key, this implies that the domain has been successfully
>cached into a DIB/context cache. Since neither a start nor resume key
>can proceed without this, this doesn't seem to impose a restriction in
>practice.

Early versions of KeyKOS had Domain key operations which un-prepared the domain. They were removed because they are a bad idea. This implementation makes them an even worse idea.


Bill Frantz       | The Internet may fairly be | Periwinkle -- Consulting
(408)356-8506     | regarded as a never-ending | 16345 Englewood Ave.
frantz@netcom.com | worldwide conversation.    | Los Gatos, CA 95032, USA



From shap@eros.cis.upenn.edu Wed Jun 26 14:48:43 1996
Return-Path: shap@eros.cis.upenn.edu
Received: from eros.cis.upenn.edu (localhost [127.0.0.1]) by eros.cis.upenn.edu (8.7.4/8.7.3) with ESMTP id OAA20453; Wed, 26 Jun 1996 14:48:43 -0400
Message-Id: <199606261848.OAA20453@eros.cis.upenn.edu>
To: frantz@netcom.com (Bill Frantz)
cc: eros-arch
Subject: Re: EROS update
In-reply-to: Your message of "Wed, 26 Jun 1996 01:05:04 PDT." <199606260802.BAA19462@netcom7.netcom.com>
Date: Wed, 26 Jun 1996 14:48:42 -0400
From: "Jonathan S. Shapiro" <shap@eros.cis.upenn.edu>

In message <199606260802.BAA19462@netcom7.netcom.com>, Bill Frantz writes:

> Early versions of KeyKOS had Domain key operations which un-prepared the
> domain. They were removed because they are a bad idea. This
> implementation makes them an even worse idea.

In the EROS implementation, there are some operations that cause *portions* of the context structure to be flushed back to the domain. Changing the address space pointer, for example, unloads the mapping table pointer.

No current domain key operation unprepares the domain unless the domain has become malformed.

Jonathan

From shap@columbia.syncomas.com Thu Jun 27 03:19:19 1996
Return-Path: shap@columbia.syncomas.com
Received: from columbia.syncomas.com (TS6-25.UPENN.EDU [128.91.200.93]) by eros.cis.upenn.edu (8.7.4/8.7.3) with ESMTP id DAA14533 for <eros-arch@eros.cis.upenn.edu>; Thu, 27 Jun 1996 03:16:40 -0400
Received: (from shap@localhost) by columbia.syncomas.com (8.7.4/8.7.3) id DAA29743; Thu, 27 Jun 1996 03:15:54 -0400
Date: Thu, 27 Jun 1996 03:15:54 -0400
Message-Id: <199606270715.DAA29743@columbia.syncomas.com>
From: "Jonathan S. Shapiro" <shap@AURORA.CIS.UPENN.EDU>
To: eros-arch@eros.cis.upenn.edu
Subject: OT v/s doubly linked list

In message <199606261148.AA16185@POST.TANDEM.COM>, LANDAU_CHARLES@Tandem.COM writes:
> I'm pleased that the doubly-linked list mechanism seems to be the best.
> I think it's also the simplest.

Well, the numeric results were pretty impressive:

Using the OT-based design:

Image code size (bytes): 85751

                               S only  S+U
      Cycles                   894.5   998.65   8.322 usecs
      Instrs Executed          410.5   417
      CPI                      2.18    2.39
      D cache miss (R+W)       1.01    1.01
      I cache miss             0.028   0.031
      D TLB miss               7       7.5
      I TLB miss               3       4
      D cache R+W              277.85  293.36
      I cache R                138.71  143.21
      V-pipe instrs            110.62  112.12
      W buffer full stall cy   0       0
      Mem read stall cy        158.70  176.21
      Branches (T+NT)          84.6    86.6
      BTB hits                 6       6
      Taken branches           48      50
      D cache Reads            168.71  178.22


Using the linked-list design:

Image code size (bytes): 80983

                               S only  S+U
      Cycles                   759.6   886.8    7.39 usecs
      Instrs Executed          348     354.5
      CPI                      2.18    2.5
      D cache miss (R+W)       1.000   1.002
      I cache miss             0.026   0.031
      D TLB miss               5       6.5
      I TLB miss               2       3
      D cache R+W              241.8   257.3
      I cache R                122.6   127
      V-pipe instrs            91.6    93.0
      W buffer full stall cy   0       0
      Mem read stall cy        114.13  154.66
      Branches (T+NT)          76.61   78.61
      BTB hits                 6       6
      Taken branches           44.06   46.06
      D cache Reads            144.69  154.20

The payoff was very nearly double what I expected, which is clearly reflected in the numbers. One thing to note is that the branch target buffer on the processor just isn't helping much. Given this, the reduction in taken branches has greater importance than it initially appears.

The reduction in loads is pretty substantial. Just goes to show what a register-poor machine can cost you in register spills!
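For readers who want to check the two tables against each other, here is a small sketch that converts cycles to microseconds and computes the relative reductions. It assumes the 120 MHz clock mentioned in the earlier "EROS update" message; the helper names are invented for illustration, and only the S+U column is used.

```python
CLOCK_MHZ = 120.0  # assumption: the 120 MHz Pentium from the earlier message


def cycles_to_usec(cycles, mhz=CLOCK_MHZ):
    """Convert a cycle count to microseconds at the given clock rate."""
    return cycles / mhz


def reduction_pct(before, after):
    """Relative reduction from before to after, in percent."""
    return 100.0 * (before - after) / before


# S+U column of the two tables above.
object_table = {"cycles": 998.65, "instrs": 417.0,
                "d_cache_reads": 178.22, "taken_branches": 50.0}
linked_list = {"cycles": 886.8, "instrs": 354.5,
               "d_cache_reads": 154.20, "taken_branches": 46.06}

for name in object_table:
    pct = reduction_pct(object_table[name], linked_list[name])
    print(f"{name:15s} {object_table[name]:8.2f} -> {linked_list[name]:8.2f}"
          f"  ({pct:4.1f}% lower)")

print(f"OT design:          {cycles_to_usec(object_table['cycles']):.3f} usecs")
print(f"linked-list design: {cycles_to_usec(linked_list['cycles']):.3f} usecs")
```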

Jonathan

From shap@columbia.syncomas.com Sat Jun 29 02:49:35 1996
Return-Path: shap@columbia.syncomas.com
Received: from columbia.syncomas.com (TS9-46.UPENN.EDU [128.91.202.1]) by eros.cis.upenn.edu (8.7.4/8.7.3) with ESMTP id CAA26809 for <eros-arch@eros.cis.upenn.edu>; Sat, 29 Jun 1996 02:46:55 -0400
Received: (from shap@localhost) by columbia.syncomas.com (8.7.4/8.7.3) id CAA19551; Sat, 29 Jun 1996 02:45:57 -0400
Date: Sat, 29 Jun 1996 02:45:57 -0400
Message-Id: <199606290645.CAA19551@columbia.syncomas.com>
From: "Jonathan S. Shapiro" <shap@AURORA.CIS.UPENN.EDU>
To: eros-arch@eros.cis.upenn.edu
Subject: Results of hand-coded EROS IPC path

Well, I think we are pretty much at the end of the road for the IPC work for now. This note is to report on the results.

I finally caved in and hand-coded an IPC path for the null IPC in assembler. The results were simultaneously disappointing and educational.

First, the results of running the hand coded path and the best C++ path:

                               S only  S+U
      Cycles                   701.51  828.67   6.90 usecs
      Instrs Executed          305     311.5
      CPI                      2.3     2.66
      D cache miss (R+W)       1.005   1.005
      I cache miss             0.025   0.028
      D TLB miss               5       6.5
      I TLB miss               2       3
      D cache R+W              227.31  224.82
      I cache R                96.16   100.66
      V-pipe instrs            75.09   76.59
      W buffer full stall cy   0       0
      Mem read stall cy        112.63  153.15
      Branches (T+NT)          60.10   62.11
      BTB hits                 0.01    0.01
      Taken branches           32.06   34.06
      D cache Reads            134.18  143.69

	*** Hand-coded path, which falls back on C++ path: ***

                               S only  S+U
      Cycles                   401.58  520.71   (4.34 usec)
      Instrs Executed          123.80  130.30
      CPI                      3.24    4.00
      D cache miss (R+W)       1.002   1.002
      I cache miss             0.011   0.013
      D TLB miss               4       5.5
      I TLB miss               2       3
      D cache R+W              92.20   104.71
      I cache R                40.61   45.11
      V-pipe instrs            20      21.5
      W buffer full stall cy   0       0
      Mem read stall cy        81.58   117.08
      Branches (T+NT)          30.07   32.07
      BTB hits                 0.006   0.006
      Taken branches           13.54   15.54
      D cache Reads            49.12   55.62

Let's call it 521 cycles. Of these, many can be accounted for outright from a study of the Pentium manual:

	170   are due to I+D TLB miss processing
	 12   are due to address generation interlocks
	 45+  are due to branch prediction errors
	 48   are due to INT instruction
	 27   are due to IRET instruction
       ----
	302

Leaving 219 cycles. Approximately 15 of these are lost to address generation interlock stalls, so call it 204 cycles.

If we discard the INT and IRET instructions, we're left with 128 instructions, giving an ideal CPI (after taking into account the above adjustments) of about 1.6, which is pretty typical for this processor.
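The accounting above can be replayed as a short calculation. This is only a sketch of the arithmetic in this note: the "45+" branch misprediction figure is treated as exactly 45, and the variable names are chosen for illustration.

```python
total_cycles = 521  # hand-coded path, S+U, rounded as in the note

# Cycles attributed directly from the Pentium manual.
accounted = {
    "I+D TLB miss processing": 170,
    "address generation interlocks": 12,
    "branch prediction errors": 45,   # listed as "45+", taken as a lower bound
    "INT instruction": 48,
    "IRET instruction": 27,
}

accounted_total = sum(accounted.values())
remaining = total_cycles - accounted_total

agen_stall_cycles = 15               # further AGEN stalls mentioned in the text
adjusted = remaining - agen_stall_cycles

instructions = 128                   # instruction count excluding INT and IRET
ideal_cpi = adjusted / instructions

print(f"accounted for: {accounted_total} cycles")
print(f"remaining:     {remaining} cycles")
print(f"adjusted:      {adjusted} cycles")
print(f"ideal CPI:     {ideal_cpi:.2f}")
```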

After a careful look at the data path, I can almost certainly reduce it by 20 instructions by eliminating one or two things. None of these impact the big hardware cost, so they might reasonably be expected to save us 31 cycles, bringing the IPC time down to 4 usecs.

Of the remaining 108 instructions, 35 or so are in some way associated with key processing. In a "keyless" system, one might therefore expect to whittle this down to something on the order of 3.61 usecs. Note that of that 3.61 usecs, only a small fraction is subject to user control.
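The two projections can be checked the same way. The sketch below assumes the 120 MHz clock from the earlier message and charges each removed instruction at the ideal CPI of about 1.6; the note itself rounds the first saving to 31 cycles, which is the figure used here.

```python
CLOCK_MHZ = 120.0        # assumption: 120 MHz Pentium
IDEAL_CPI = 1.6          # approximate ideal CPI derived in the note

current_cycles = 521

# Step 1: drop about 20 instructions (the note quotes a 31 cycle saving).
trim_saving_cycles = 31
after_trim_cycles = current_cycles - trim_saving_cycles
after_trim_usec = after_trim_cycles / CLOCK_MHZ

# Step 2: a "keyless" system would also drop the ~35 key-related instructions.
key_instructions = 35
keyless_cycles = after_trim_cycles - key_instructions * IDEAL_CPI
keyless_usec = keyless_cycles / CLOCK_MHZ

print(f"after trimming: {after_trim_cycles} cycles, {after_trim_usec:.2f} usecs")
print(f"keyless:        {keyless_cycles:.0f} cycles, {keyless_usec:.2f} usecs")
```

Both results land within rounding of the 4 usec and 3.61 usec figures quoted above.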

Many of the simplifications of the assembler path should work equally well in C++ code, and I shall attempt to build a C++ fast path by translating the assembler back to C++ to see what happens.

Some conclusions from all this:

+ register-poor architectures suck, though if we set aside the issue of AGEN interlocks I'm not really running short of registers in this path.

+ I'm pleasantly surprised at how close the full-featured path gets to the hand-optimized ideal.

+ I truly hate debugging assembly code

+ The Pentium was never designed for serious protection.

+ Context switch times aren't scaling, which is unfortunate.

Jonathan

From shap@columbia.syncomas.com Sun Jun 30 18:05:05 1996
Return-Path: shap@columbia.syncomas.com
Received: from columbia.syncomas.com (TS6-30.UPENN.EDU [128.91.200.98]) by eros.cis.upenn.edu (8.7.4/8.7.3) with ESMTP id SAA32078 for <eros-arch@eros.cis.upenn.edu>; Sun, 30 Jun 1996 18:02:25 -0400
Received: (from shap@localhost) by columbia.syncomas.com (8.7.4/8.7.3) id SAA05023; Sun, 30 Jun 1996 18:01:12 -0400
Date: Sun, 30 Jun 1996 18:01:12 -0400
Message-Id: <199606302201.SAA05023@columbia.syncomas.com>
From: "Jonathan S. Shapiro" <shap@AURORA.CIS.UPENN.EDU>
To: eros-arch@eros.cis.upenn.edu
Subject: holy shit (another IPC reduction)

Just to see what would happen, I tried re-arranging things so that the kernel would map its code and data using large pages. This has two impacts:

  1. Tree walk search time is reduced for kernel D-TLB misses
  2. Large mappings are kept in a separate TLB, which eliminates the possibility of conflict misses in the D cache.

In addition, I removed some code that was rendered dead by the doubly linked key change. The removed code is nowhere near the IPC path, and was not executed in any of the previously reported numbers (either before or after the revision).

The effects were rather more dramatic than I had expected:

                          BEFORE          AFTER
                          S only  S+U     S only  S+U
 usecs                    3.35    4.34    2.42    3.20
 Cycles                   401.58  520.71  290.39  384.49
 Instrs Executed          123.80  130.30  121.75  128.25
 CPI                      3.24    4.0     2.385   2.99
 D cache miss (R+W)       1.002   1.002   0.002   0.002
 I cache miss             0.011   0.013   0.012   0.013
 D TLB miss               4       5.5     1.000   1.500
 I TLB miss               2       3       2       3
 D cache R+W              92.20   104.71  90.17   102.67
 I cache R                40.61   45.11   39.09   43.59
 V-pipe instrs            20      21.5    20      21.5
 W buffer full stall cy   0       0       0       0
 Mem read stall cy        81.58   117.08  17.52   26.03
 Branches (T+NT)          30.07   32.07   30.056  32.056
 BTB hits                 0.006   0.006   0.005   0.005
 Taken branches           13.54   15.54   13.53   15.53
 D cache Reads            49.12   55.62   49.11   55.71
 AGEN Interlocks          11.5    12      11.5    12

Notes:
1. I cache misses are not reduced because this machine doesn't implement large pages in the I cache -- it expands the large page mapping into a suitable number of small page mappings. Using large page mappings, however, does cut the reload time in half by eliminating some extra checks in the microcode and an extra memory fetch into the lower-level page table.

2. This is still the hand-coded path.

This is a 26% reduction. As before, I think I can shave another 0.3 usecs or so off this. We shall see in a little bit.
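A quick check of the 26% figure, using the S+U columns of the BEFORE/AFTER table. The sketch also compares the drop in memory read stall cycles with the total cycles saved; whether those stall cycles overlap other categories is not stated in the note, so that share is indicative only.

```python
before = {"usecs": 4.34, "cycles": 520.71, "mem_read_stall": 117.08}
after = {"usecs": 3.20, "cycles": 384.49, "mem_read_stall": 26.03}


def pct_reduction(b, a):
    """Relative reduction from b to a, in percent."""
    return 100.0 * (b - a) / b


time_reduction = pct_reduction(before["usecs"], after["usecs"])
cycles_saved = before["cycles"] - after["cycles"]
stall_saved = before["mem_read_stall"] - after["mem_read_stall"]
stall_share = 100.0 * stall_saved / cycles_saved

print(f"time reduction:      {time_reduction:.1f}%")
print(f"cycles saved:        {cycles_saved:.2f}")
print(f"read stall cy saved: {stall_saved:.2f} ({stall_share:.0f}% of the saving)")
```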

This is only the second time that I have seen quantitative benefits from the inclusion of large pages in an architecture, and the impact is more dramatic than in the previous reports (MIPS). I'm now a believer in large pages, especially if they eliminate competition for the user data TLB entries.

Direct translation is difficult, but the same path on an equivalently clocked PPC *ought* to run at around 0.83 usec. I arrive at this guesstimate by the following calculation:

       140 cycles due to lack of TLB flush
        69 cycles difference in HW privilege transition cost
        80 cycles improved architectural parallelism
       ---
       289

       And 384-289 ~= 100 (0.83 usec)
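The same estimate as a few lines of arithmetic. The note rounds the remainder up to about 100 cycles; the sketch shows both the exact difference and the rounded figure, assuming a 120 MHz clock for the "equivalently clocked" PPC.

```python
CLOCK_MHZ = 120.0  # assumption: same clock as the Pentium measurements

pentium_cycles = 384  # hand-coded path with large pages, S+U, rounded

ppc_savings = {
    "no TLB flush": 140,
    "cheaper HW privilege transition": 69,
    "better architectural parallelism": 80,
}

total_saving = sum(ppc_savings.values())
exact_remainder = pentium_cycles - total_saving
rounded_remainder = 100  # the note rounds the remainder up to ~100 cycles

print(f"total saving: {total_saving} cycles")
print(f"remainder:    {exact_remainder} cycles "
      f"({exact_remainder / CLOCK_MHZ:.2f} usec)")
print(f"rounded:      {rounded_remainder} cycles "
      f"({rounded_remainder / CLOCK_MHZ:.2f} usec)")
```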

While the PPC has more registers to save, the difference in costs is wiped out by the combination of PPC parallelism and the expense of segment register reload on the x86 family.

Also, note that by using large pages we are getting essentially comparable benefit to Jochen's small address spaces. By eliminating the TLB flush, we might reasonably expect to get down into the 2 usec range for that case.

Jonathan

From frantz@netcom.com Sun Jun 30 18:39:16 1996
Return-Path: frantz@netcom.com
Received: from netcom7.netcom.com (netcom7.netcom.com [192.100.81.115]) by eros.cis.upenn.edu (8.7.4/8.7.3) with SMTP id SAA32180 for <eros-arch@eros.cis.upenn.edu>; Sun, 30 Jun 1996 18:39:16 -0400
Received: from [204.31.235.167] (sjx-ca30-07.ix.netcom.com [204.31.235.167]) by netcom7.netcom.com (8.6.13/Netcom) id PAA21799; Sun, 30 Jun 1996 15:37:56 -0700
Message-Id: <199606302237.PAA21799@netcom7.netcom.com>
X-Sender: frantz@netcom7.netcom.com
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Sun, 30 Jun 1996 15:40:33 -0700
To: "Jonathan S. Shapiro" <shap@aurora.cis.upenn.edu>, eros-arch@eros.cis.upenn.edu
From: frantz@netcom.com (Bill Frantz)
Subject: Re: holy shit (another IPC reduction)

At 6:01 PM 6/30/96 -0400, Jonathan S. Shapiro wrote:
>This is only the second time that I have seen quantitative benefits
>from the inclusion of large pages in an architecture, and the impact
>is more dramatic than in the previous reports (MIPS). I'm now a
>believer in large pages, especially if they eliminate competition for
>the user data TLB entries.

Note that the 370 version of KeyKOS ran the kernel without translation. (Why translate when you don't page?) Making this easy was one of the 370's nice features. (The 370 could automatically change from mapped to unmapped as part of taking an interrupt/trap.)


Bill Frantz       | The Internet may fairly be | Periwinkle -- Consulting
(408)356-8506     | regarded as a never-ending | 16345 Englewood Ave.
frantz@netcom.com | worldwide conversation.    | Los Gatos, CA 95032, USA



From shap@eros.cis.upenn.edu Sun Jun 30 21:10:54 1996
Return-Path: shap@eros.cis.upenn.edu
Received: from eros.cis.upenn.edu (localhost [127.0.0.1]) by eros.cis.upenn.edu (8.7.4/8.7.3) with ESMTP id VAA00632; Sun, 30 Jun 1996 21:10:54 -0400
Message-Id: <199607010110.VAA00632@eros.cis.upenn.edu>
To: frantz@netcom.com (Bill Frantz)
cc: eros-arch
Subject: Re: holy shit (another IPC reduction)
In-reply-to: Your message of "Sun, 30 Jun 1996 15:40:33 PDT." <199606302237.PAA21799@netcom7.netcom.com>
Date: Sun, 30 Jun 1996 21:10:54 -0400
From: "Jonathan S. Shapiro" <shap@eros.cis.upenn.edu>

In message <199606302237.PAA21799@netcom7.netcom.com>, Bill Frantz writes:
> Note that the 370 version of KeyKOS ran the kernel without translation.

On the Pentium, I believe this would work out to be a wash -- the expense of switching back to unmapped mode alone equals the expense of doing the translation in software.

Jonathan

From shap@columbia.syncomas.com Sat Jul 13 14:53:38 1996
Return-Path: shap@columbia.syncomas.com
Received: from columbia.syncomas.com (TS10-32.UPENN.EDU [128.91.202.53]) by eros.cis.upenn.edu (8.7.4/8.7.3) with ESMTP id OAA32377 for <eros-arch@eros.cis.upenn.edu>; Sat, 13 Jul 1996 14:51:00 -0400
Received: (from shap@localhost) by columbia.syncomas.com (8.7.4/8.7.3) id OAA01040; Sat, 13 Jul 1996 14:49:50 -0400
Date: Sat, 13 Jul 1996 14:49:50 -0400
Message-Id: <199607131849.OAA01040@columbia.syncomas.com>
From: "Jonathan S. Shapiro" <shap@AURORA.CIS.UPENN.EDU>
To: eros-arch@eros.cis.upenn.edu
Subject: What is correct behavior?

Currently, an EROS domain that returns to a kernel key has its process demolished. This is what I understood to be the behavior under KeyKOS.

I'm beginning to question whether I understood correctly. My question is what should happen if the returning domain supplies a gate key in slot 3. I can see two stories for what should happen:

  1. A kernel key is different from a gate key; the presence of a gate key in slot 3 should not matter when invoking a kernel key.
  2. The fact that it's a kernel key should in principle be invisible to the caller. Kernel keys conceptually assume that they are invoked by a CALL operation and return by (conceptually) performing a RETURN invocation on whatever key is in slot 4.

My naive reaction is that (2) is somehow the semantically "cleaner" answer. Which does KeyKOS do?

If we adopt (1), there are a few uglinesses in the gate key path, but nothing truly major.

If we adopt (2), there is a curious "cascade" effect if we generalize the third slot key type:

  1. domain places some *kernel* key in slot 3, forks first kernel key.
  2. kernel then returns to second kernel key, passing it the response from the key that was actually invoked.

My inclination is to disallow this, but doing so breaks the front-ending illusion.

Come to think of it, option (1) breaks the front-ending illusion too, as the front-ending domain cannot tell if the invocation was a return or not, and performs a return to slot 4.

What to do?

Jonathan

From frantz@netcom.com Mon Jul 15 02:35:54 1996
Return-Path: frantz@netcom.com
Received: from netcom8.netcom.com (netcom8.netcom.com [192.100.81.117]) by eros.cis.upenn.edu (8.7.4/8.7.3) with SMTP id CAA05289 for <eros-arch@eros.cis.upenn.edu>; Mon, 15 Jul 1996 02:35:53 -0400
Received: from [199.35.223.193] (sjx-ca19-01.ix.netcom.com [199.35.223.193]) by netcom8.netcom.com (8.6.13/Netcom) id XAA04541; Sun, 14 Jul 1996 23:34:29 -0700
Message-Id: <199607150634.XAA04541@netcom8.netcom.com>
X-Sender: frantz@netcom8.netcom.com
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Sun, 14 Jul 1996 23:37:10 -0700
To: "Jonathan S. Shapiro" <shap@aurora.cis.upenn.edu>, eros-arch@eros.cis.upenn.edu
From: frantz@netcom.com (Bill Frantz)
Subject: Re: What is correct behavior?

At 2:49 PM 7/13/96 -0400, Jonathan S. Shapiro wrote:
>Currently, an EROS domain that returns to a kernel key has its
>process demolished. This is what I understood to be the behavior
>under KeyKOS.
>
>I'm beginning to question whether I understood correctly. My question
>is what should happen if the returning domain supplies a gate key in
>slot 3. I can see two stories for what should happen:
>
>1. A kernel key is different from a gate key; the presence of a gate key
> in slot 3 should not matter when invoking a kernel key.
>
>2. The fact that it's a kernel key should in principle be invisible to
> the caller. Kernel keys conceptually assume that they are invoked
> by a CALL operation and return by (conceptually) performing a
> RETURN invocation on whatever key is in slot 4.
>
>My naive reaction is that (2) is somehow semantically more "clean"
>answer. Which does KeyKOS do?

KeyKOS adopts the view that all kernel implemented services (and any extra-kernel services whose domains must remain prompt) return via the Returner. The Returner ensures that the key is a resume key, otherwise it destroys the "process" or "thread of execution". The normal invocation for the Returner is:

Return (K0, K1, K2, ResumeKey) data, return code, etc.

A domain can also call the Returner (Which is why the alleged key type of the returner is KT). Calling the Returner can be used for copying keys between the calling domain's key slots.
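As a rough model of the rule described above, the sketch below checks that the fourth key is a resume key and otherwise destroys the thread of execution. It is an illustration in Python, not KeyKOS code: the `Key` class, the `inbox` list, and the `ThreadDestroyed` exception are all invented for the example, and the key-copying use of a called Returner is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Key:
    kind: str                                  # e.g. "resume", "start", "kernel"
    inbox: list = field(default_factory=list)  # hypothetical delivery target


class ThreadDestroyed(Exception):
    """Stands in for destroying the "process" or "thread of execution"."""


def returner(k0, k1, k2, fourth_key, data=b"", return_code=0):
    """Deliver a reply through fourth_key, but only if it is a resume key."""
    if fourth_key.kind != "resume":
        raise ThreadDestroyed("fourth key is not a resume key")
    fourth_key.inbox.append((return_code, data, (k0, k1, k2)))
    return True


caller = Key("resume")
returner(None, None, None, caller, data=b"result", return_code=0)
print(caller.inbox)
```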


Bill Frantz       | The Internet may fairly be | Periwinkle -- Consulting
(408)356-8506     | regarded as a never-ending | 16345 Englewood Ave.
frantz@netcom.com | worldwide conversation.    | Los Gatos, CA 95032, USA



From shap Mon Jul 29 15:56:48 1996
Return-Path: shap
Received: (from shap@localhost) by eros.cis.upenn.edu (8.7.4/8.7.3) id PAA25509; Mon, 29 Jul 1996 15:56:47 -0400
Date: Mon, 29 Jul 1996 15:56:47 -0400
Message-Id: <199607291956.PAA25509@eros.cis.upenn.edu>
From: "Jonathan S. Shapiro" <shap@eros.cis.upenn.edu>
To: eros-arch
Subject: Contemplating a change...

Some things have been going on here that are leaning me towards a change in the EROS architecture, which I would like reactions to:

  1. The transparent persistence in EROS makes real-time support hard, and increases the kernel size quite a bit.
  2. There has been some interest in using EROS as a substrate for the soft-switch work here at Penn.
  3. I'm leaning towards IPC as my thesis topic, and persistence is not necessary to the thesis.

I'm therefore contemplating removing persistence from EROS, and rearchitecting away from the notion of "nodes". In the absence of persistence, it makes some sense to ask if pages and nodes cannot be the same size, and continue to use some smaller data structure for structuring address spaces.

Under the revision, the "context" structure (KeyKOS: DIB) reverts to a more conventional "process" or "domain" structure, having no backing nodes. Domains continue to have a fixed number of key registers, but can also have an optional "key space" (analogous to address space). Key space is made up of physical pages. Care is taken *by the space bank* to ensure that a physical page frame is not allocated as both a data page and a key page simultaneously.

I'm still noodling on how address spaces ought to be structured under this design, but I'm leaning towards something in the style of L4.

As with the current system, this design would not permit overallocation of resources.

Anybody screaming yet?

Jonathan

From LANDAU_CHARLES@tandem.com Mon Jul 29 16:27:16 1996
Return-Path: LANDAU_CHARLES@tandem.com
Received: from suntan.tandem.com (suntan.tandem.com [192.216.221.8]) by eros.cis.upenn.edu (8.7.4/8.7.3) with SMTP id QAA25614 for <eros-arch@eros.cis.upenn.edu>; Mon, 29 Jul 1996 16:27:15 -0400
From: LANDAU_CHARLES@tandem.com
Received: from localhost by suntan.tandem.com (8.6.12/suntan5.960119) for <eros-arch@eros.cis.upenn.edu> id NAA23621; Mon, 29 Jul 1996 13:25:14 -0700
Received: by localhost (4.13/4.5) id AA825; 29 Jul 96 13:27:01 -0700
Date: 29 Jul 96 13:14:00 -0700
Message-Id: <199607291327.AA825@localhost>
To: shap@eros.cis.upenn.edu
Cc: eros-arch@eros.cis.upenn.edu
Subject: Re: persistence

I'm not sure removing transparent persistence is viable unless you also throw away security (and therefore much reliability). If the system is to have anything persistent at all (e.g. "files"), you have to architect and build a mechanism to restore access to the persistent stuff after a restart. To do this in a reasonably secure way is almost as much work as transparent persistence. If the system doesn't have persistent files, what good is it?

I don't know much about real-time support, but I think it's genuinely hard, and requires giving up much of what makes operating systems nice, such as virtual memory. Real-time systems and general-purpose systems shouldn't be confused. I don't think one system can serve both purposes.

From frantz@netcom.com Mon Jul 29 16:45:34 1996
Return-Path: frantz@netcom.com
Received: from netcom8.netcom.com (netcom8.netcom.com [192.100.81.117]) by eros.cis.upenn.edu (8.7.4/8.7.3) with SMTP id QAA25671; Mon, 29 Jul 1996 16:45:33 -0400
Received: from [199.182.128.163] (sjx-ca13-03.ix.netcom.com [199.182.128.163]) by netcom8.netcom.com (8.6.13/Netcom) id NAA00905; Mon, 29 Jul 1996 13:45:24 -0700
Message-Id: <199607292045.NAA00905@netcom8.netcom.com>
X-Sender: frantz@netcom8.netcom.com
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Mon, 29 Jul 1996 13:48:10 -0700
To: "Jonathan S. Shapiro" <shap@eros.cis.upenn.edu>, eros-arch@eros.cis.upenn.edu
From: frantz@netcom.com (Bill Frantz)
Subject: Re: Contemplating a change...

At 3:56 PM 7/29/96 -0400, Jonathan S. Shapiro wrote:
>Some things have been going on here that are leaning me towards a
>change in the EROS architecture, which I would like reactions to:
>
> 1. The transparent persistence in EROS makes real-time support hard,
> and increases the kernel size quite a bit.
>
> 2. There has been some interest in using EROS as a substrate for the
> soft-switch work here at Penn.
>
> 3. I'm leaning towards IPC as my thesis topic, and persistence is
> not necessary to the thesis.
>
>I'm therefore contemplating removing persistence from EROS...

The principal reason for persistence in KeyKOS is the secure restart problem. Persistence is an elegant way of avoiding/solving the problem. Let me describe some nonpersistent versions of the ideas:

(1) Communication protocol converter. Since it starts up from the same state every time, it does not need persistence. It can simply "big bang" each time. (N.B. All circuits are flushed when the thing goes down.) (This version exists and is still running today.)

(2) General purpose computing platform. Here we have a dilemma. Do we try to save the relationship of keys as it evolves thru running the system or do we always start from the same key relations (as in (1) above). If we do the latter, then keys only represent transient relationships. If we do the former, then we need a scheme to ensure that the key relationships which obtain after a restart do not introduce any security holes. (i.e. The accesses permitted are strictly less than or equal to those which existed when the system went down.) What is worse, we want to ensure that any programs which are run after restart know what the state of the keys is, so they do not introduce confused deputy problems.
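The restart condition in (2), that accesses after restart are no greater than those held when the system went down, can be phrased as a subset check. The sketch below is only an illustration of that condition: the holder names and the (object, right) tuples are invented, and a real system would be comparing key relationships rather than Python sets.

```python
def restart_is_safe(before, after):
    """True if no holder has an access after restart that it lacked before."""
    return all(rights <= before.get(holder, set())
               for holder, rights in after.items())


# Hypothetical access state when the system went down.
before = {
    "editor": {("file1", "read"), ("file1", "write")},
    "printer": {("spool", "read")},
}

# A restart that only drops accesses is acceptable ...
after_ok = {"editor": {("file1", "read")}}

# ... but one that hands a holder something new is a security hole.
after_bad = {"printer": {("spool", "read"), ("file1", "read")}}

print(restart_is_safe(before, after_ok))
print(restart_is_safe(before, after_bad))
```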

Now, I suppose we could design a system where key storage is split into persistent and nonpersistent, following the model of a file system where disk blocks are persistent, but memory cache blocks are not. Then the only problem we have is ensuring the correctness and consistency of the disk structures.
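
The restart condition in (2), that the accesses permitted after a restart are no greater than those held when the system went down, can be phrased as a containment check. The following is only a sketch under an assumed representation: each key is flattened to a hypothetical (holder, object, rights) record, and none of these names come from KeyKOS or EROS.

```cpp
#include <algorithm>
#include <cstdint>
#include <set>
#include <tuple>

// Hypothetical flattened view of one key: who holds it, what it names,
// and which rights it conveys (a bitmask).
struct KeyRecord {
    std::uint32_t holder;
    std::uint32_t object;
    std::uint32_t rights;
    bool operator<(const KeyRecord& o) const {
        return std::tie(holder, object, rights) <
               std::tie(o.holder, o.object, o.rights);
    }
};

using KeySet = std::set<KeyRecord>;

// True if every key present after restart is covered by a key that the same
// holder had on the same object before the crash, with no additional rights.
bool restartIsSafe(const KeySet& before, const KeySet& after) {
    return std::all_of(after.begin(), after.end(), [&](const KeyRecord& a) {
        return std::any_of(before.begin(), before.end(), [&](const KeyRecord& b) {
            return a.holder == b.holder && a.object == b.object &&
                   (a.rights & ~b.rights) == 0;
        });
    });
}
```

Dropping keys or weakening rights passes the check; a key that appears with a new holder or with extra rights does not.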


Bill Frantz       | Cave ab homine unius lebri | Periwinkle -- Consulting
(408)356-8506     |  [Beware the man of one    | 16345 Englewood Ave.
frantz@netcom.com |   book]  - Anonymous Latin | Los Gatos, CA 95032, USA



From shap Wed Jul 31 10:53:31 1996
Return-Path: shap
Received: (from shap@localhost) by eros.cis.upenn.edu (8.7.4/8.7.3) id KAA00527; Wed, 31 Jul 1996 10:53:30 -0400 Date: Wed, 31 Jul 1996 10:53:30 -0400
Message-Id: <199607311453.KAA00527@eros.cis.upenn.edu> From: "Jonathan S. Shapiro" <shap@eros.cis.upenn.edu> To: eros-arch
Subject: About that change

So I spent some time thinking about what the change would entail. I think it's worth thinking about further, but not right now. I can get my thesis completed faster by proceeding forward from where I already am.

Jonathan

From shap Wed Jul 31 12:09:39 1996
Return-Path: shap
Received: (from shap@localhost) by eros.cis.upenn.edu (8.7.4/8.7.3) id MAA00779; Wed, 31 Jul 1996 12:09:39 -0400
Date: Wed, 31 Jul 1996 12:09:39 -0400
Message-Id: <199607311609.MAA00779@eros.cis.upenn.edu>
From: "Jonathan S. Shapiro" <shap@eros.cis.upenn.edu>
To: mach4-users@cs.utah.edu, eros-arch@eros.cis.upenn.edu
Subject: Complexity isolation/smarter compilers

[ For the benefit of the eros-arch list, the background was a discussion on surprising changes in IPC performance under L3/L4 between Pentium and i486 that proved to be due to changes in cache organization. Jochen Liedtke had to reorganize both the code and the critical data structures to compensate. ]

[ Jay wrote: ]

That this kind [cache reference] of tuning is required suggests to me one of two things:

  1. Any software system that is this fragile and dependent on this level of platform knowledge, will lose to a design that is substantially less so.
  2. Or, systems this sensitive need to be written in a language and compiled by a compiler more powerful than normal C and normal C compilers. Let the machine do it, not people.

I've let this percolate in my head a bit, and I think these comments can be refined a little.

Most systems, monolithic or microkernel, have portions that are critically sensitive to machine architecture (e.g. the bcopy implementation). The question, then, is not "how dependent is the software on implementation artifacts", but rather "how well isolated are the implementation dependencies, and how many are there?"

The defining characteristic of microkernel design is not size, but critical dependency isolation and reduction. One reason (of many) that operating system performance has not kept pace with processor performance is that monoliths have poorly-isolated implementation dependencies. Even the conservative OS implementors agree that microkernels are easier to maintain.

The point is simply that there is a continuum to look at here.

In reference to smarter compilers, I confess that I am skeptical about the benefits for IPC but intrigued by the benefits for microkernels overall. Here are some thoughts derived from the EROS IPC implementation process concerning where our performance came from. All of these changes are good things to do independent of cache design and TLB particulars, but on some cache/TLB designs they are more important than on others. Most of them are *not* source-language sensitive. I apologize for the length.

  1. Align the IPC path to start at a page boundary and gather all of the supporting code into one page (or as few as possible).
  2. Specify the common case message payloads in registers.
  3. Reblock the code to minimize the number of taken branches. This simultaneously optimizes I-cache footprint and I-cache hit rate.

While we did this by hand for the EROS path, it has been shown (MIPS, HP) that this can be done by the code generator given the dynamic branch taken likelihood for each branch.

In the absence of a smart code generator, one can hand-write the C code (or whatever) to simply have the smart layout to begin with.

  4. Design the portion of the process structure that is touched by the IPC path with care to cluster fields according to reference patterns in the critical path.

We approximated this by hand. Some promising work in automated *data* reorganization was done in the PL.8 compiler, but I don't know to what degree this was pursued elsewhere.

  5. Gather all the stuff you are going to need to touch into a single structure. Pointer indirection hurts a lot. In UNIX terms, everything the IPC touches (aside from payload) should live in the moral equivalent of the 'proc' structure.

  6. Do all your setup before you move the data, since the block move must be assumed to trash the cache.

I think it's fair to suggest that (1) and (2) are obvious to everyone at this point. (3) will be obvious to anyone who has dealt with optimization and code generation. (6) is obvious once it is pointed out, but is mostly a matter of doing things in the right place in the code path. The real payoff was (4) and (5).
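
Points (4) and (5) amount to a layout discipline, which can be sketched as follows. The field names, the split into hot and cold parts, and the 32-byte line size (the Pentium's) are assumptions for illustration; this is not the actual EROS process structure.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCacheLine = 32;  // assumed cache line size

// Everything the IPC fast path reads or writes, grouped in the order it is
// touched so the path pulls in as few cache lines as possible.
struct alignas(kCacheLine) IpcHotState {
    std::uint32_t runState;      // is the receiver ready to accept?
    std::uint32_t orderCode;     // register-carried message payload
    std::uint32_t dataWords[4];
    std::uint32_t sendLen;       // string descriptor
    std::uint32_t rcvLimit;
};

// Fields the fast path never looks at are kept out of the hot line.
struct ColdState {
    std::uint64_t cpuTimeUsed;
    std::uint32_t faultCode;
    std::uint32_t faultInfo;
};

// The hot part is embedded directly (no pointer to chase) and placed first.
struct Process {
    IpcHotState ipc;
    ColdState   cold;
};

static_assert(sizeof(IpcHotState) == kCacheLine,
              "hot state should fill exactly one cache line");
static_assert(offsetof(Process, ipc) == 0,
              "hot state should start the process structure");
static_assert(alignof(Process) == kCacheLine,
              "process structures should start on a line boundary");
```

The static assertions turn the layout intent into something the compiler checks whenever a field is added or moved.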

We found that after building an IPC this way, the extended basic blocks were completely data-dependent, that essentially all branches were data dependent, and that the majority of branches are not taken.

We could do a little better given another register or two to work with, but this result should hold up across machines. On a different architecture, the message specification would live in different registers, and therefore will reside at a different location in the process structure, but *the data reference pattern doesn't change* unless the semantics of the invocation changes.

In practice, the location of the process structure in the cache is not under OS designer control. You can worry the alignment issue, but it is very unlikely that the process structure and the IPC code will block out nicely in a unified cache.

This all adds up to a curious thing: in the special case of a completely data-dependent path that does no redundant data accesses (in practice ours is forced on the Pentium to do 3 or 4 due to register starvation), the optimal code and data layouts must begin at proper cache line alignment boundaries, but otherwise are independent of the cache structure provided that the cache has at least two sets. Further, failure to align them correctly in such a cache causes at most 3 marginal cache line references, which is a tiny effect.
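
The alignment argument can be checked with a little arithmetic: a contiguous region that starts off a line boundary overlaps at most one more cache line than the same region would if aligned. A small sketch, with a hypothetical helper name and the line size passed as a parameter:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of cache lines overlapped by the byte range [addr, addr + size).
std::size_t linesTouched(std::uintptr_t addr, std::size_t size,
                         std::size_t lineSize) {
    assert(lineSize != 0 && (lineSize & (lineSize - 1)) == 0);  // power of two
    if (size == 0) return 0;
    std::uintptr_t first = addr / lineSize;
    std::uintptr_t last  = (addr + size - 1) / lineSize;
    return static_cast<std::size_t>(last - first + 1);
}
```

With 32-byte lines, a 96-byte region occupies three lines when aligned and four when it starts four bytes in, so each misaligned region costs at most one extra line reference.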

We did a lot of tuning to the C path before we bit the bullet and built the assembler path -- mostly because I'm stubborn. In the end, the assembly code wins for 3 reasons, which I expect are independent of source language:

  1. There are no live registers at the start of the code sequence, which violates the assumptions of the compiler.
  2. There are no live registers at the end of the code sequence, which violates the assumptions of the compiler.
  3. We have ignored the register conventions because this machine has so few registers. We only just squeak by on the Pentium, but on a machine with only 2 or 3 more registers we could honor the calling convention at minimal cost.

The register starvation issue is really deadly.

So here's a radical thought:

The advantage to a really tiny system like L4 is that it's small enough to compile as a single unit. One could, in principle, collect statistics on the operation of the entire kernel, feed the results back into the compiler, and arrive at a kernel that was optimally restructured for the application at hand. The key to this is automated data restructuring.

Jonathan

From shap Wed Jul 31 15:48:15 1996
Return-Path: shap
Received: (from shap@localhost) by eros.cis.upenn.edu (8.7.4/8.7.3) id PAA09729; Wed, 31 Jul 1996 15:48:14 -0400 Date: Wed, 31 Jul 1996 15:48:14 -0400
Message-Id: <199607311948.PAA09729@eros.cis.upenn.edu> From: "Jonathan S. Shapiro" <shap@eros.cis.upenn.edu> To: eros-arch
Subject: resolution of architecture change

[Jay: could you send me a one-liner confirming that you got this]

Okay. I think I've decided what to do about persistence for now.

The current state of the EROS kernel contains most of the goop to read and write objects to the disk, but does not contain a checkpointer or a migrator. My intended path had been:

  1. Write the user level code to format the disk
  2. Implement checkpointer and migrator.

The second isn't hard, but doing a good job on the first involves building quite a lot of user-level code. Again, nothing intrinsically difficult, but a large space of stuff to have to do.

From a research perspective, the IPC performance stuff and the escrow agent stuff are both demonstrable without having support for persistence, and at $10-$12 a megabyte the right answer is just to buy a boatload of memory if we need to. Even without persistence, we have something that is potentially very close to real time and embeddable, making it useful to the soft switch work here.

Rather than roll things back and lose the work that has already been done towards persistence, I am going to restructure the kernel to more cleanly separate the persistence support from the node and page logic.

In the "light" kernel, the bootstrap code will load a kernel and a system image into memory and transfer to the kernel (I'll likely use GRUB for this). On startup, the "light" kernel will perform device driver initialization (drivers are entitled to reserve fixed memory at startup), calculate how to divvy up the remaining memory for various kernel uses, and copy the image to the proper locations before initializing those regions.

The major difference between this approach and the ramdisk approach is that the ramdisk continued to chew up memory after its content was loaded.

Jonathan

From shap Wed Jul 31 15:53:07 1996
Return-Path: shap
Received: (from shap@localhost) by eros.cis.upenn.edu (8.7.4/8.7.3) id PAA09756; Wed, 31 Jul 1996 15:53:07 -0400 Date: Wed, 31 Jul 1996 15:53:07 -0400
Message-Id: <199607311953.PAA09756@eros.cis.upenn.edu> From: "Jonathan S. Shapiro" <shap@eros.cis.upenn.edu> To: eros-arch
Subject: addendum to last

I've just purchased a dual processor SMP machine, and we have a 4 processor machine in the lab. One thing I plan to do in the light version is look at how to make things SMP safe.

Jonathan

From LANDAU_CHARLES@tandem.com Mon Aug 5 19:26:42 1996
Return-Path: LANDAU_CHARLES@tandem.com
Received: from suntan.tandem.com (suntan.tandem.com [192.216.221.8]) by eros.cis.upenn.edu (8.7.4/8.7.3) with SMTP id TAA00523; Mon, 5 Aug 1996 19:26:41 -0400
From: LANDAU_CHARLES@tandem.com
Received: from localhost by suntan.tandem.com (8.6.12/suntan5.960119) id QAA26850; Mon, 5 Aug 1996 16:26:18 -0700
Received: by localhost (4.13/4.5) id AA3897; 5 Aug 96 16:28:16 -0700
Date: 5 Aug 96 16:26:00 -0700
Message-Id: <199608051628.AA3897@localhost>
To: shap@eros.cis.upenn.edu
Cc: eros-arch@eros.cis.upenn.edu
Subject: key type issues

I'm thinking again about doing capabilities at Tandem, and I'm wondering what is the best way to handle the key type.

I can think of three different ways to keep the key type information. I'll illustrate with C++ (pseudo-)code.

  1. The type is in the key.

class key {
  unsigned type : aFewBits;
  object * obj;
  ...
};

// To invoke a key:
switch (key->type) {
    case domain: {
      domain * thisDomain = (domain *)(key->obj);
      switch (orderCode) {
        ...
      }
    }
    break;

    case start: {
      domain * thisDomain = (domain *)(key->obj);
      ...
    }
    break;

    case resume: {
      domain * thisDomain = (domain *)(key->obj);
      ...
    }
    break;

    case domainTool: {
      // key->obj is not used.
      switch (orderCode) {
        ...
      }
    }
    break;
}

  2. The type of object is in the object, and a subtype is in the key.

union objectSpecificInfo {
  struct domainInfo {
    enum {startKey, resumeKey, domainKey} domainKeyType;
    char dataByte;
  };
  ...
};

class key {
  object * obj;
  objectSpecificInfo info;
  ...
};

class object {
  virtual void invoke(params) = 0;
};

class domain : public object {
  virtual void invoke(params);
};

class domainTool : public object {
  virtual void invoke(params);
};

domainTool theGlobalDomainTool; // domain tool keys point at this

void domain::invoke(params)
{
  switch (key->info.domainKeyType) {
    case startKey:
      ...
      break;

    case resumeKey:
      ...
      break;

    case domainKey:
      ...
      break;
  }
}

void domainTool::invoke(params)
{
  switch (orderCode) {
    ...
  }
}

// To invoke a key:
key->obj->invoke(params);

  3. The type is in the object.

class key {
  object * obj;
  ...
};

class object {
  virtual void invoke(params) = 0;
};

class startObject : public object {
  virtual void invoke(params);
};

class resumeObject : public object {
  virtual void invoke(params);
};

class domainObject : public object {
  virtual void invoke(params);
};

class domain {
  startObject startObj;
  resumeObject resumeObj;
  domainObject domainObj;
  ... // registers, etc.
};

void domainObject::invoke(params)
{
  // Step back from the embedded member to the enclosing domain. The
  // subtraction must be done in bytes, hence the cast to char *.
  domain * thisDomain = (domain *)((char *)this - offsetof(domain, domainObj));
  switch (orderCode) {
    ... domain operations
  }
}

void domainTool::invoke(params)
{
  switch (orderCode) {
    ...
  }
}

// To invoke a key:
key->obj->invoke(params);

Option (1) seems easiest for translating between the on-disk and in-core versions of a key. The other options seem more object-oriented.

I was wondering if you had explored these options to see which performs best.

From LANDAU_CHARLES@tandem.com Tue Aug 6 15:55:22 1996
Return-Path: LANDAU_CHARLES@tandem.com
Received: from suntan.tandem.com (suntan.tandem.com [192.216.221.8]) by eros.cis.upenn.edu (8.7.4/8.7.3) with SMTP id PAA03281; Tue, 6 Aug 1996 15:55:20 -0400
From: LANDAU_CHARLES@tandem.com
Received: from localhost by suntan.tandem.com (8.6.12/suntan5.960119) id MAA23506; Tue, 6 Aug 1996 12:54:52 -0700
Received: by localhost (4.13/4.5) id AA25401; 6 Aug 96 12:56:49 -0700
Date: 6 Aug 96 12:56:00 -0700
Message-Id: <199608061256.AA25401@localhost>
To: shap@eros.cis.upenn.edu
Cc: eros-arch@eros.cis.upenn.edu
Subject: key type issues addendum

In my previous message I specified the use of virtual functions.

Of course, you wouldn't actually use the virtual function call mechanism, since with the advent of multiple inheritance that has gotten slower than what is needed here. You would just store a pointer to the function in the object, or if more than a few functions are needed, a pointer to a table of pointers to functions (like a stripped vft).
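
A minimal sketch of the function-table variant described above, applied to option (2)/(3) style dispatch. The names `ObjectOps`, `Message`, `invokeKey`, and the toy behavior of the two invoke functions are hypothetical, chosen only to make the example self-contained.

```cpp
#include <cstdint>

struct Object;
struct Message { std::uint32_t orderCode; };  // hypothetical invocation parameters

// Hand-rolled stand-in for a virtual function table ("stripped vft").
struct ObjectOps {
    int (*invoke)(Object* self, const Message& msg);
    // further entries would be added here if more operations were needed
};

// Common object header: one table pointer per object, nothing per key.
struct Object {
    const ObjectOps* ops;
};

struct Key {
    Object* obj;
};

int domainInvoke(Object* self, const Message& msg);
const ObjectOps domainOps = { domainInvoke };

struct Domain {
    Object hdr{&domainOps};        // header is the first member
    std::uint32_t lastOrder = 0;   // stand-in for registers, etc.
};

int domainInvoke(Object* self, const Message& msg) {
    // hdr is the first member of a standard-layout struct, so a pointer to
    // it can be converted back to a pointer to the enclosing Domain.
    Domain* d = reinterpret_cast<Domain*>(self);
    d->lastOrder = msg.orderCode;
    return 0;
}

int domainToolInvoke(Object*, const Message& msg) {
    return msg.orderCode == 0 ? 1 : -1;   // toy behavior
}
const ObjectOps domainToolOps = { domainToolInvoke };
Object theGlobalDomainTool{&domainToolOps};  // domain tool keys point at this

// To invoke a key: one indirect call, no switch on key type.
inline int invokeKey(const Key& k, const Message& msg) {
    return k.obj->ops->invoke(k.obj, msg);
}
```

Whether this is actually faster than a compiler-generated virtual call depends on the compiler and the class hierarchy; the sketch only shows the mechanics.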

From shap@columbia.syncomas.com Wed Aug 7 12:50:46 1996 Return-Path: shap@columbia.syncomas.com
Received: from columbia.syncomas.com (TS6-31.UPENN.EDU [128.91.200.99]) by eros.cis.upenn.edu (8.7.4/8.7.3) with ESMTP id MAA06286 for <eros-arch@eros.cis.upenn.edu>; Wed, 7 Aug 1996 12:48:05 -0400 Received: (from shap@localhost) by columbia.syncomas.com (8.7.4/8.7.3) id MAA25154; Wed, 7 Aug 1996 12:48:32 -0400 Date: Wed, 7 Aug 1996 12:48:32 -0400
Message-Id: <199608071648.MAA25154@columbia.syncomas.com> From: "Jonathan S. Shapiro" <shap@AURORA.CIS.UPENN.EDU> To: eros-arch@eros.cis.upenn.edu
Subject: Capabilities/Tandem

>I'm thinking again about doing capabilities at Tandem, and I'm wondering
>what is the best way to handle the key type.
>
>I can think of three different ways to keep the key type information.
>I'll illustrate with C++ (pseudo-)code.

The following is predicated on the assumption that you are implementing in C++.

  1. You probably don't want to use virtual functions. The virtual mechanism requires that each key carry a 32-bit key type (the vfn tbl ptr), and that each key *type* have a virtual table. Whether this is worth it depends on the nature of the capabilities you are implementing. The following, to my mind, would dictate use of virtual functions:

    o If users need to extend the capability types
    o If you anticipate a large number of capability types
    o If capability types do not have a lot of common subcode.

The problem is that derived classes make things that ought to be simple -- like key copy -- a pain in the butt.

In EROS, key types can be grouped into a small number of categories:

	node keys
	page keys
	misc keys
	device keys

Of course there are more, but all of the node key types have a lot of common code (for example fault-in). One problem with the virtual approach is that there is easy way to say "what key type is the current capability" without invoking a function.

In most cases, virtual functions cannot be inlined, no matter how trivial they may be.

2. You probably want to avoid bitfields. Most compilers generate truly cruddy code for them; you will get better code by building a macro set. In particular, most compilers, when comparing a field to a constant, will extract the field rather than shift the constant and mask. For single bit tests (e.g. prepared bit) this is unfortunate.

In the EROS code base, careful placement of fields combined with masking sometimes allows a collection of tests to be accomplished in a single comparison. The algebraic optimization provided by most compilers won't discover these optimizations.

Because of this, I am slowly migrating the EROS code base away from the use of bitfields in favor of hand-coded access macros.

3. If all objects can reasonably be made to share a common header (for pages this is difficult), then type-in-object offers some advantages. It lowers the relative overhead of virtual function usage, and simplifies key copy. Devices and miscellaneous keys, IMHO, are unlikely to fall out nicely.

For EROS, what I did was define a KeyBits structure that describes the fields of a key, and derive both DiskKey and Key from that. KeyBits has a number of inline members that are used commonly by DiskKey and Key. The real reason for KeyBits had to do with assignment operators. I wanted the DiskKey class to be public so it could be included by my image construction tools, which meant that it should not depend on Key.hxx (which in turn depends on lots of other things). The common KeyBits class allowed me to pull this off by making things like 'ObjectHeader *' be opaque pointer types.

The relevant chunk of KeyBits follows:

#define KHAZARD_WRITE     0x40u
#define KHAZARD_READ      0x80u
#define KHAZARD_READWRITE 0x80u

class ObjectTable;

#define UNPREPARED_BIT    0x20u
#define KHAZARD_BITS      0xC0u
#define KEYTYPE_BITS      0x1Fu

#define UNPREPARED(x) ((x) | UNPREPARED_BIT)
#define PREPARED(x)   ((x) & ~UNPREPARED_BIT)

struct KeyBits {
#ifdef BITFIELD_PACK_LOW
  Byte ktByte;
  Byte subType;
  HalfWord keyData;
#else
#error "verify bitfield layout"
#endif
  // .... rest of definition is a big anonymous union of structures
  // with ugly layout hacks....

};
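
The single-comparison trick mentioned earlier can be sketched with the masks above. The key type code `KT_NODE` and both function names are made up for illustration; the assumption is that the hazard bits, the unprepared bit, and the key type all share `ktByte`, as the mask values suggest.

```cpp
#include <cstdint>

typedef std::uint8_t Byte;

#define KHAZARD_BITS   0xC0u
#define UNPREPARED_BIT 0x20u
#define KEYTYPE_BITS   0x1Fu

#define KT_NODE 0x03u   // hypothetical key type code

// Straightforward version: three separate field tests.
inline bool fastPathOkSlow(Byte kt) {
    bool prepared   = (kt & UNPREPARED_BIT) == 0;
    bool unhazarded = (kt & KHAZARD_BITS) == 0;
    bool isNode     = (kt & KEYTYPE_BITS) == KT_NODE;
    return prepared && unhazarded && isNode;
}

// Because all three fields share one byte, and the "good" value of the two
// flag fields is zero, the conjunction collapses to one comparison.
inline bool fastPathOk(Byte kt) {
    return kt == KT_NODE;
}
```

The two functions agree for every possible byte value, but a compiler working from bitfield declarations is unlikely to find the second form on its own.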

One other thought, though:

Microsoft has put a lot of work into the ActiveX (previously OCX, previously OLE Controls) technology. An ActiveX control handle is in essence a capability. Among other things it hides whether the control is in-process or remote. They've given a good bit of thought to how to "genericize" the interface. For all of its problems, this architecture has proven surprisingly flexible, and deserves some consideration.

Jonathan

From shap@eros.cis.upenn.edu Thu Aug 8 16:36:53 1996 Return-Path: shap@eros.cis.upenn.edu
Received: from eros.cis.upenn.edu (localhost [127.0.0.1]) by eros.cis.upenn.edu (8.7.4/8.7.3) with ESMTP id QAA10018 for <eros-arch@eros.cis.upenn.edu>; Thu, 8 Aug 1996 16:36:53 -0400 Message-Id: <199608082036.QAA10018@eros.cis.upenn.edu> To: eros-arch@eros.cis.upenn.edu
Subject: Re: Capabilities/Tandem
In-reply-to: Your message of "07 Aug 1996 11:33:00 PDT." <199608071135.AA3897@POST.TANDEM.COM>
Date: Thu, 08 Aug 1996 16:36:53 -0400
From: "Jonathan S. Shapiro" <shap@eros.cis.upenn.edu>

In message <199608071135.AA3897@POST.TANDEM.COM>, LANDAU_CHARLES@Tandem.COM writes:
> Thanks. This is good stuff.
>
> >The problem is that derived classes make things that ought to be
> >simple -- like key copy -- a pain in the butt.

> I don't understand this. If objects have derived classes, how does that
> affect key copy?

The problem is that usually you have to do some stuff before you can copy safely (e.g. hazard clearance) in a way that is sensitive to key type. Once you have lots of types, you need lots of assignment operators (one per type). You'll soon find that you are using a common base class to subvert this, at which point what was the point of derived key types?

> >If ..., then type-in-object offers some advantages.
> > It lowers the relative overhead of virtual function useage, and
> > simplifies key copy.
>
> I'm lost here too. Relative to what? I can sort-of see that it
> simplifies key copy because the key type does not need to be copied.

Lower overhead because a virtual table pointer per object is smaller space overhead than a virtual table pointer per key.

Simpler copy because implicit in the notion that the dispatch magic is in the object is that it *isn't* in the key; therefore you don't need as many key types. Reducing key types simplifies things like hazard clearance.

>
> >One problem with the virtual
> >approach is that there is easy way to say "what key type is the
> >current capability" without invoking a function.
>
> I think you omitted "no".

Oops! Yes, definitely.

> When you talk about macros for accessing bitfields, I assume inline
> functions would do just as well.

Yes, they would. On the other hand, in C++ the style is to avoid macros, so making it very explicit where the fields are touched by using macros in all caps is sometimes good. Also, macros can sometimes do things that inline funs cannot because the namespace rules for anonymous unions do not interact well with public/private/protected.

> Is your superfast IPC code at a point where you would share it with me?

Hmm. In principle you're welcome to it. There are a couple of bugs in the current implementation that I'd like to iron out before it really gets distributed. If your need is urgent, let me know and I'll clean it up soon. I was planning to de-diskify EROS and port to another machine first.

> Where can I learn more about ActiveX?

Best place to start is with the OLE SDK documentation. Since there is currently a beta of a new SDK out, you might be able to suck it down from www.microsoft.com via a web browser.

Jonathan

From shap Mon Sep 9 16:24:55 1996
Return-Path: shap
Received: (from shap@localhost) by eros.cis.upenn.edu (8.7.4/8.7.3) id QAA10494; Mon, 9 Sep 1996 16:24:55 -0400 Date: Mon, 9 Sep 1996 16:24:55 -0400
Message-Id: <199609092024.QAA10494@eros.cis.upenn.edu> From: "Jonathan S. Shapiro" <shap@eros.cis.upenn.edu> To: eros-arch
Subject: with some trepidation...

I've been looking at the issues in getting X11 running under EROS. It's a bleeping mess. The essential problem is that for reasons of old hardware restrictions the video buffer cannot simply be memory mapped - many video cards implement a 128k wide window onto the video memory at physaddr 0xA0000. The window is shifted by writing to card control registers that are mapped to well-known I/O ports.

The problem is that for X11 (or any graphics interface) to work, EROS must support user-level application direct access to ports. While we could do a callable capability for this, the overhead would be very visible in scrolling operations. This means that we want to introduce I/O port address spaces.

The cheap quick hack would be to allow all user-level drivers access to all I/O ports. The problem is that this essentially gives drivers access to all of physical memory (some cards have DMA, all DMA on this machine is to physical addresses). Note that restricting the port space doesn't help - the driver for any card with physdma must necessarily be inside the trust boundary.

Previously, we have said that only the kernel could see the key bits. How bad is it to allow driver code to potentially see them as well, and declare that drivers are part of the TCB? That is, I don't think I fully appreciate the consequences for the security arguments. It seems to me (naively) that in-kernel drivers can do anything and therefore must be trusted, and removing them from the kernel alters nothing w.r.t. trust issues.

The *right* thing to do is implement a central, trusted agent responsible for startup-time autoconfig and construction of I/O address spaces, and make I/O space be non-writable to the driver (the I/O space is an enable bitmask) and rescinded on restart. The only purpose of doing the easy thing now is to reduce the immediate workload. This allows us to move *some* drivers outside the TCB, which is a good thing.
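
A minimal sketch of an I/O space as an enable bitmask, assuming a hypothetical `IoSpace` class that only the trusted configuration agent may modify. The driver would only ever be checked against it, never allowed to write it.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kPorts = 65536;  // size of the x86 I/O port space

class IoSpace {
public:
    // Called only by the trusted agent at autoconfiguration time.
    void enable(std::uint16_t firstPort, std::size_t count) {
        assert(static_cast<std::size_t>(firstPort) + count <= kPorts);
        for (std::size_t i = 0; i < count; ++i) enabled_.set(firstPort + i);
    }

    // Rescind every grant, as would happen on restart.
    void rescindAll() { enabled_.reset(); }

    // An access of `width` bytes is allowed only if every port it covers
    // has been enabled.
    bool allows(std::uint16_t port, std::size_t width) const {
        if (static_cast<std::size_t>(port) + width > kPorts) return false;
        for (std::size_t i = 0; i < width; ++i) {
            if (!enabled_.test(port + i)) return false;
        }
        return true;
    }

private:
    std::bitset<kPorts> enabled_;  // one enable bit per port, all clear initially
};
```

For example, an agent could call `enable(0x3C0, 0x20)` to grant a display driver the 32 ports starting at 0x3C0; a two-byte access at 0x3DF would then be refused because it spills into a port that was never enabled.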

We already need (for some work at Penn) support for memory-resident objects that are outside the persistence contract. If allowing user-level access to I/O ports is acceptable in principle, then *all* drivers (including disk drivers) can be handled at user level. This would be nice, because it makes them easier to debug, which would increase the speed with which we can bring up drivers for other platforms. It would also simplify the scheduler and thread support.

It would, however, have the effect of exposing the bits of the keys to a restricted class of domains.

Also, there is a question about how access to the object cache should be handled for the ageing daemon and drivers on machines that lack atomic bit set/reset operations. My initial thought was just to memory map the object cache into the pageout daemon.

Jonathan

From shap@halifax.syncomas.com Tue Sep 10 21:19:05 1996 Return-Path: shap@halifax.syncomas.com
Received: from halifax.syncomas.com (TS10-38.UPENN.EDU [128.91.202.59]) by eros.cis.upenn.edu (8.7.4/8.7.3) with ESMTP id VAA14758 for <eros-arch@eros.cis.upenn.edu>; Tue, 10 Sep 1996 21:19:03 -0400 Received: (from shap@localhost) by halifax.syncomas.com (8.7.6/8.7.3) id VAA00289; Tue, 10 Sep 1996 21:20:53 -0400 Date: Tue, 10 Sep 1996 21:20:53 -0400
Message-Id: <199609110120.VAA00289@halifax.syncomas.com> From: "Jonathan S. Shapiro" <shap@eros.cis.upenn.edu> To: eros-arch@eros.cis.upenn.edu
Subject: Resolution: X11, I/O ports, and such

Turns out that there is a significant problem with allowing user-level apps direct access to port space.

The problem is that the I/O port space is really the registers of some piece of hardware. Many commands (e.g. area fill) given by the graphics system are multibyte or multiword sequences written in some well-defined sequence. Because of this, one cannot safely multiplex the display -- if the multibyte operations happen to get interrupted in the middle, the hardware state machine is in an unknown state.

For smart devices where low latency is required, you can't multiplex without the connivance of the application.

Rather than try to address this with a carefully thought-out solution, I'm going to propose that we do something else entirely:

  1. (short term) donate the console to X11 and don't do any console session management at all.
  2. (longer term) stick a bolt-on onto the X11 server by which we can request that the currently active server voluntarily relinquish the display at a well-defined place.

Also, for the moment, I'm going to declare that our windowing systems will use the video system as a relatively dumb frame buffer whose video mode is known to (and can be restored by) the kernel.

Mike Laskin will be doing the X11 port.

The I/O state machine is also a problem for other drivers. Many devices in the PC world transfer data via IO ports, and very few of the associated hardware designs can simultaneously be written to and read from (i.e. the hardware is single threaded). The problem, then, is enforcement of mutual exclusion.

When I first thought about this, I figured that one could do some sort of upcall hack that waited for an interrupt while leaving the driver available for downward traffic.

The problem is that interrupt lines are not exclusively allocated, and we do not wish to lose them. This in turn means that the kernel must at a minimum contain a sufficient filter to determine which device produced the interrupt. Since such an interrupt classifier is pretty close to the entire low-level driver... Also, on the x86, user-level classification would be too slow.

Somebody should look at strategies for interrupt classification someday.

Sigh. So much for that idea.

Jonathan

From shap@halifax.syncomas.com Tue Sep 10 22:17:08 1996 Return-Path: shap@halifax.syncomas.com
Received: from halifax.syncomas.com (TS10-27.UPENN.EDU [128.91.202.48]) by eros.cis.upenn.edu (8.7.4/8.7.3) with ESMTP id WAA14984 for <eros-arch@eros.cis.upenn.edu>; Tue, 10 Sep 1996 22:17:06 -0400 Received: (from shap@localhost) by halifax.syncomas.com (8.7.6/8.7.3) id WAA00441; Tue, 10 Sep 1996 22:18:56 -0400 Date: Tue, 10 Sep 1996 22:18:56 -0400
Message-Id: <199609110218.WAA00441@halifax.syncomas.com> From: "Jonathan S. Shapiro" <shap@eros.cis.upenn.edu> To: eros-arch@eros.cis.upenn.edu
Subject: Further on X11

I cannot remember if I said this in the last mail.

For now, in the interest of simplicity, we should just do the dumb frame buffer version of X11, which eliminates the state machine problems in the short term.

shap

From shap@halifax.syncomas.com Fri Sep 13 15:27:35 1996 Return-Path: shap@halifax.syncomas.com
Received: from halifax.syncomas.com (TS3-40.UPENN.EDU [128.91.200.169]) by eros.cis.upenn.edu (8.7.4/8.7.3) with ESMTP id PAA04481 for <eros-arch@eros.cis.upenn.edu>; Fri, 13 Sep 1996 15:27:33 -0400 Received: (from shap@localhost) by halifax.syncomas.com (8.7.6/8.7.3) id PAA01224; Fri, 13 Sep 1996 15:29:08 -0400 Date: Fri, 13 Sep 1996 15:29:08 -0400
Message-Id: <199609131929.PAA01224@halifax.syncomas.com> From: "Jonathan S. Shapiro" <shap@eros.cis.upenn.edu> To: eros-arch@eros.cis.upenn.edu
Subject: Notes on checkpoint

Some rough cut notes on the circular log checkpoint strategy can be found at http://www.cis.upenn.edu/~eros, via the design notes page. Anybody who feels like commenting is welcome.

From shap Wed Sep 18 10:23:37 1996
Return-Path: shap
Received: (from shap@localhost) by eros.cis.upenn.edu (8.7.4/8.7.3) id KAA12036; Wed, 18 Sep 1996 10:23:35 -0400 Date: Wed, 18 Sep 1996 10:23:35 -0400
Message-Id: <199609181423.KAA12036@eros.cis.upenn.edu> From: "Jonathan S. Shapiro" <shap@eros.cis.upenn.edu> To: eros-arch
Subject: Duplexing problem

I just stumbled into an "oops", and I don't see a graceful solution. Normally, a multiple failure mode like this wouldn't bother me much, but there is a difference between a multiple mode failure that stops your machine from working and a multiple mode failure that scrags your machine forever.

The failure in question concerns duplex recovery.

Problem:

One drive fails recoverably (e.g. SCSI cable becomes disconnected), with the consequence that its ranges are not updated by the checkpoint mechanism. Its ranges are duplexed, so everything continues to run fine.

In reconnecting the failed drive, we manage to knock out the cable from some drive containing one of the duplexes.

On restart, the system can tell that the impacted range (the one on the drive that is working again) is out of date because its associated last checkpoint sequence number is too low.

If the duplex copy is present, and has a proper checkpoint sequence number, we know what happened and can simply bring the stale copy up to date without further concern.

In the absence of the duplex copy (the up to date one), it is not clear how to distinguish between the following scenarios:

+ the range was disconnected due to failure and needs to be brought up to date.

+ the range was deliberately disconnected in an orderly fashion (e.g. due to removable media) and is in fact up to date, the checkpoint sequence number discrepancy notwithstanding.

We cannot just go ahead and accept the suspicious duplex. Its state may not be consistent with the state of the rest of the system, and once we compute using it we can't recover from the *good* copy anymore.

I can see two possible solutions:

  1. Add a bit to the range header indicating that it was dismounted due to an orderly dismount.
  2. Don't mount suspicious ranges and leave it to user level software.

The problem with (1) is what happens if the *removable* range is duplexed. Now you need all duplexes mounted in order to do an orderly dismount on such a range.
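
The restart-time decision described above can be sketched as follows. This is only an illustration under stated assumptions: RangeHeader, its fields, MountDecision and Classify are hypothetical names, not taken from the EROS sources, and the sketch folds both proposed solutions into one function. It does not address the objection to (1): setting the orderly-dismount mark on a duplexed removable range still needs every duplex mounted.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-range header; field names are illustrative only.
struct RangeHeader {
  std::uint64_t lastCkptSeq;  // sequence number of the last checkpoint written here
  bool orderlyDismount;       // solution (1): set when the range was dismounted cleanly
};

enum class MountDecision { Mount, ResyncFromDuplex, Suspicious };

// Decide what to do with a range at restart. `duplex` is the header of the
// other copy, or nullptr if that copy is not currently attached.
inline MountDecision Classify(const RangeHeader& r,
                              std::uint64_t systemCkptSeq,
                              const RangeHeader* duplex) {
  assert(r.lastCkptSeq <= systemCkptSeq);  // a range cannot be ahead of the system
  if (r.lastCkptSeq == systemCkptSeq) return MountDecision::Mount;

  // Stale sequence number: an up-to-date duplex settles the question.
  if (duplex != nullptr && duplex->lastCkptSeq == systemCkptSeq)
    return MountDecision::ResyncFromDuplex;

  // No good duplex: only an orderly-dismount mark lets us trust the range.
  if (r.orderlyDismount) return MountDecision::Mount;

  // Solution (2): do not mount; leave the decision to user-level software.
  return MountDecision::Suspicious;
}
```
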

Suggestions?

Jonathan

From frantz@netcom.com Thu Sep 19 00:46:13 1996
Return-Path: frantz@netcom.com
Received: from netcom8.netcom.com (netcom8.netcom.com [192.100.81.117]) by eros.cis.upenn.edu (8.7.4/8.7.3) with SMTP id AAA09862; Thu, 19 Sep 1996 00:46:13 -0400
Received: from [204.31.236.194] (sjx-ca37-02.ix.netcom.com [204.31.236.194]) by netcom8.netcom.com (8.6.13/Netcom) id VAA04576; Wed, 18 Sep 1996 21:46:20 -0700
Message-Id: <199609190446.VAA04576@netcom8.netcom.com>
X-Sender: frantz@netcom8.netcom.com
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Wed, 18 Sep 1996 21:49:28 -0700
To: "Jonathan S. Shapiro" <shap@eros.cis.upenn.edu>, eros-arch@eros.cis.upenn.edu
From: frantz@netcom.com (Bill Frantz)
Subject: Re: Duplexing problem

Consider storing in the checkpoint the current version ID of every "permanently mounted" range. Then have a user-level mount manager which stores similar information for all the removable media.
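
One way to picture this suggestion, as a rough sketch only: the checkpoint carries a table from range to version ID, and at restart each permanently mounted range is compared against it. CheckpointRecord, RecordRange, CheckRange and the integer ID types are hypothetical; the message does not say how version IDs are represented. Ranges missing from the table are assumed to belong to the user-level mount manager.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using RangeId = std::uint32_t;    // hypothetical identifier for a range
using VersionId = std::uint64_t;  // hypothetical version ID representation

// Hypothetical table written as part of each checkpoint.
struct CheckpointRecord {
  std::map<RangeId, VersionId> permanentRanges;
};

enum class RangeStatus { Current, Stale, NotPermanent };

// Record the current version of a permanently mounted range in the checkpoint.
inline void RecordRange(CheckpointRecord& ckpt, RangeId id, VersionId v) {
  const bool inserted = ckpt.permanentRanges.emplace(id, v).second;
  assert(inserted && "each range is recorded once per checkpoint");
  (void)inserted;
}

// At restart, compare what a range claims against what the checkpoint recorded.
inline RangeStatus CheckRange(const CheckpointRecord& ckpt, RangeId id,
                              VersionId onDisk) {
  const auto it = ckpt.permanentRanges.find(id);
  if (it == ckpt.permanentRanges.end())
    return RangeStatus::NotPermanent;  // removable: ask the mount manager
  return onDisk == it->second ? RangeStatus::Current : RangeStatus::Stale;
}
```
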

Aren't CDROMs wonderful :-).


Bill Frantz       | "Cave softly, cave safely, | Periwinkle -- Consulting
(408)356-8506     | and cave with duct tape."  | 16345 Englewood Ave.
frantz@netcom.com |           - Marianne Russo | Los Gatos, CA 95032, USA



From shap Wed Oct 2 17:31:21 1996
Return-Path: shap
Received: (from shap@localhost) by eros.cis.upenn.edu (8.7.4/8.7.3) id RAA27529; Wed, 2 Oct 1996 17:31:15 -0400 Date: Wed, 2 Oct 1996 17:31:15 -0400
Message-Id: <199610022131.RAA27529@eros.cis.upenn.edu> From: "Jonathan S. Shapiro" <shap@eros.cis.upenn.edu> To: eros-arch
Subject: Gate keys, contexts, and gotchas

A month or two ago I made a design change to gate keys: prepared gate keys point to the context structure rather than the domain root. A small (solvable) gotcha has now emerged that I want to describe so it will be included in the archive. The problem arises in keeper invocation.

One of the conditions under which a keeper is invoked is if a domain is malformed. In the current EROS design, if an attempt is made to prepare a node as a domain root, the logic goes like this:

	attempt to prepare domain
	  -- may not alter fault code if one already exists.  If
	     domain has OK fault code, may set fault code to
	     FC_Malformed.

	if domain is faulted, and has a keeper key, invoke the keeper.

Note that the keeper must in some cases receive a restart key to the faulted domain, and that this restart key when prepared must point to the domain's context structure.
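
The preparation rule above might look roughly like this in code. It is a sketch under assumptions: only FC_Malformed comes from the text; FC_NoFault, FC_Other, the Domain fields and the invocation counter are placeholders, and the real structural checks and keeper invocation are reduced to a flag and a counter.

```cpp
#include <cassert>

// FC_Malformed is named in the text; the other values are placeholders.
enum FaultCode { FC_NoFault = 0, FC_Malformed = 1, FC_Other = 2 };

struct Domain {
  FaultCode faultCode = FC_NoFault;
  bool wellFormed = true;     // stands in for the real structural checks
  bool hasKeeperKey = false;
  int keeperInvocations = 0;  // counts keeper invocations, for illustration
};

inline void PrepareDomain(Domain& d) {
  // An existing fault code is never overwritten; only an OK fault code
  // may be replaced by FC_Malformed.
  if (!d.wellFormed && d.faultCode == FC_NoFault)
    d.faultCode = FC_Malformed;

  // After preparation, a malformed domain always carries some fault code.
  assert(d.wellFormed || d.faultCode != FC_NoFault);

  // A faulted domain with a keeper key has its keeper invoked.
  if (d.faultCode != FC_NoFault && d.hasKeeperKey)
    ++d.keeperInvocations;    // placeholder for the real keeper invocation
}
```
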

Implications:

  1. No matter how malformed the domain is, it must be possible to fabricate a well-defined context structure for it. If this cannot be done the restart key cannot be prepared, and the malformed domain cannot be restarted.

A context associated with a malformed domain may respond to certain requests (e.g. read/write registers in an annex node) with a failed result.

  2. A context is now always constructable given a domain root, which means that more operations can be designed to use the context structure.

A complexity is that the keeper invocation needs to pass certain information to the keeper, and this information does not originate in the alleged caller's register set. Indeed, if the domain is sufficiently malformed the necessary REGISTERS may not exist within the domain (unlikely, but I can conceive of machines where the registers are large enough that the registers describing the invocation will not fit within the domain root).

In the interest of common code, then, the kernel implementation of the INVOKE() operation is NOT of the form

invoke: context X context => {}

but rather of the form

invoke: invocation X context => {}

Hopefully, things can be designed in such a way that the invocation structure is usually an overlay on the source context structure, eliminating the need to copy the values.
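
A minimal sketch of the "invocation X context" shape, assuming a drastically simplified Context with a single order-code register. Invocation, FromContext and the field names are hypothetical. The point is only that the invocation refers to its parameters rather than owning them, so it can overlay the sender's context in the common case and point at kernel-held values during keeper invocation.

```cpp
#include <cassert>
#include <cstdint>

// Drastically simplified context; the field layout is hypothetical.
struct Context {
  std::uint32_t orderCodeReg = 0;       // sender side: register holding the order code
  std::uint32_t receivedOrderCode = 0;  // recipient side: what was delivered
};

// Invocation parameters, decoupled from any particular register set.
struct Invocation {
  const std::uint32_t* orderCode;  // points into a Context or at kernel-held data
};

// Common case: overlay the invocation on the sender's context; nothing is copied.
inline Invocation FromContext(const Context& sender) {
  return Invocation{&sender.orderCodeReg};
}

// invoke: invocation X context => {}
inline void Invoke(const Invocation& inv, Context& target) {
  assert(inv.orderCode != nullptr);
  target.receivedOrderCode = *inv.orderCode;
}
```

For a keeper invocation the kernel would build the Invocation from its own data, for example `Invocation{&kernelFaultCode}`, without touching the alleged caller's registers.
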

I'll need to hie myself off and reimplement the invocation logic to do this.

Jonathan

From shap@halifax.syncomas.com Fri Oct 4 17:00:12 1996 Return-Path: shap@halifax.syncomas.com
Received: from halifax.syncomas.com (TS5-5.UPENN.EDU [128.91.200.69]) by eros.cis.upenn.edu (8.7.4/8.7.3) with ESMTP id RAA08279 for <eros-arch@eros.cis.upenn.edu>; Fri, 4 Oct 1996 17:00:09 -0400 Received: (from shap@localhost) by halifax.syncomas.com (8.7.6/8.7.3) id RAA27439; Fri, 4 Oct 1996 17:01:56 -0400 Date: Fri, 4 Oct 1996 17:01:56 -0400
Message-Id: <199610042101.RAA27439@halifax.syncomas.com> From: "Jonathan S. Shapiro" <shap@eros.cis.upenn.edu> To: eros-arch@eros.cis.upenn.edu
Subject: More implications of gate keys

Following up on my previous mail...

Now that gate keys point to contexts when prepared, we need to ensure that it is always possible to build a context given only a domain root. Failure to have suitable annex nodes or suitable keys in the right slots must not prevent context fabrication. Instead, they become another type of hazard that can render a context non-runnable.

This isn't as bad as it first seemed to me. In essence, I'm proposing that the logic previously embodied in the algorithm for preparing a node as a domain root should now be embodied in preparing a context to run. This relocation has the following advantages:

+ The number of places that a domain can be marked as faulted is now reduced to context (KeyKOS: DIB) preparation and the trap handlers (which in turn set it by calling context logic).

+ There is now only one place (the context hazard bitset) that needs to be checked to see if a domain is runnable. This place is already short-circuited.

+ It is slightly easier to terminate the recursion that arises when a keeper proves to be malformed.

+ Keeper invocation in general is slightly simplified.

Most of the operations previously done on domain root migrate to be done on the context structure -- where possible on the machine-independent portion of this structure. Once again, the centralization of the hazard mask renders things a bit easier.
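
As an illustration of the centralized hazard mask, here is a sketch in which context fabrication always succeeds and each missing piece simply becomes a hazard bit. ArchContext and IsRunnable appear in this message; the hazard names, DomainRoot fields and BuildContext are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical hazard bits; the real set is not given in this thread.
enum Hazard : std::uint32_t {
  HZ_NoAnnex    = 1u << 0,  // a required annex node is missing
  HZ_BadKeySlot = 1u << 1,  // a required slot holds the wrong kind of key
  HZ_Faulted    = 1u << 2   // set by context preparation or a trap handler
};

class ArchContext {
 public:
  void SetHazard(Hazard h)   { hazards_ |= h; }
  void ClearHazard(Hazard h) { hazards_ &= ~static_cast<std::uint32_t>(h); }
  // The single place to check whether the domain can run.
  bool IsRunnable() const    { return hazards_ == 0; }

 private:
  std::uint32_t hazards_ = 0;
};

// Stand-in for the checks made while examining a domain root.
struct DomainRoot {
  bool hasAnnex;
  bool keysOk;
};

// Fabrication never fails; problems are recorded as hazards instead.
inline ArchContext BuildContext(const DomainRoot& root) {
  ArchContext c;
  if (!root.hasAnnex) c.SetHazard(HZ_NoAnnex);
  if (!root.keysOk)   c.SetHazard(HZ_BadKeySlot);
  assert(c.IsRunnable() == (root.hasAnnex && root.keysOk));
  return c;
}
```
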

Where previously one would call:

	domainKey.Prepare();
	domainKey.pNode->PrepareAsDomain();
	ArchContext *ctxt = domainKey.pNode->GetDomainContext();
	if ( ctxt->IsRunnable() )
	  ...

Now one calls:

	domainKey.Prepare();
	if ( domainKey.pContext->IsRunnable() )
	  ...

It is already the case that there is machine-independent logic associated with the context code, so moving the logic of domain preparation into the context code does not increase machine specific code. Actually, the domain structure always was more machine specific than we wanted to admit (e.g. sparc requires more annexes).

For a node to have NtDomainRoot as its prepType now implies solely that a context structure might exist for this node. It no longer means that the domain is well-formed. If the domain is NOT well formed, the context will not be runnable, which was a check we already needed to make in any case.

If a node is prepared as prepType==NtAnnex or prepType == NtKeyRegs, it implies that

+ there exists a node whose prepType==NtDomainRoot that currently has a context structure
+ that domain root names this node as a general keys node or a registers annex node.

It does *not* imply that the domain is well-formed.

Basically, the real simplification is that context preparation already has a lot of logic enabling it to fail selectively. Duplicating that logic elsewhere seems unnecessary.

Jonathan

From norm@MediaCity.com Sat Oct 5 20:06:31 1996 Return-Path: norm@MediaCity.com
Received: from MediaCity.com (easy1.mediacity.com [205.216.172.10]) by eros.cis.upenn.edu (8.7.4/8.7.3) with SMTP id UAA12186; Sat, 5 Oct 1996 20:06:30 -0400 Received: (from norm@localhost) by MediaCity.com (8.6.11/8.6.9) id PAA05808; Sat, 5 Oct 1996 15:37:56 -0700 Message-Id: <ae7c6a62000210041b46@DialupEudora> Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii" Date: Sat, 5 Oct 1996 15:41:59 -0700
To: "Jonathan S. Shapiro" <shap@eros.cis.upenn.edu>, eros-arch@eros.cis.upenn.edu
From: norm@netcom.com (Norman Hardy)
Subject: Floating point state divorced from Domains

I see no problems immediately with the proposal to produce a context structure for a malformed domain. Indeed it moves towards a quirky idea that I like.

I realize that some of the advantages claimed below can be had with a less radical proposal but I suspect that the less radical proposals provide fewer features and do not require less code.

Introduce a new kernel object called a floating register set. It would be composed of nodes much like a domain, however many necessary to hold floating registers. This makes sense on machines where it is possible to run user code while denying access to the floating point hardware registers. Domains that need to execute floating point code would be equipped with a floating register set. Indeed domains could share floating register sets because this is easier to allow than to prevent and it is even occasionally useful (see below). In this design there is no such thing as a domain malformed for lack of floating registers. For many floating point domains, for most time slices of that domain, the domain invokes no floating point commands and the floating point state need not be loaded. Indeed the floating state of the domain may well be on disk. There is a new kernel structure that holds floating register values suitable for quick transfer between the structure and the real hardware. There is a global variable pointing to the structure that owns the state currently held by the real hardware. If a new domain requires use of the real floating point hardware and that hardware currently holds someone's state, then the old hardware state can be quickly returned to its owner. More likely the state's owner will be the next domain to execute floating point commands and voila, compare the pointers and enable the floating hardware and you are off. The heavy floating point domains can call other domains with no floating point context switching. A domain keeper attending to floating point faults can share the floating point state to more quickly access the troublesome state, again with no saving and restoring cost.

Indeed it might be possible to introduce the idea of a floating point keeper that attends to floating point corner conditions. I wish we had done this on the 88K where rare floating point situations caused traps which were handled by kernel code. On the 88K there was privileged floating point state where the floating point state was delivered to the code that handled the rare floating point cases. Newer CPUs handle more or all of the difficult cases (denormalized numbers) automatically.

Note that in this scheme the floating register set is an architected thing while the "new kernel structure that holds floating register values" is an internal optimization.
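
The owner-pointer comparison described above could be sketched like this. It is an assumption-laden illustration: FloatState, FpUnit and AcquireFp are hypothetical names, the hardware registers are modeled as a plain array, and the "global variable" becomes a field so the example stays self-contained.

```cpp
#include <array>
#include <cassert>

// Internal kernel structure holding one floating register set's values.
struct FloatState {
  std::array<double, 8> regs{};
};

// Models the floating point hardware plus the owner pointer.
struct FpUnit {
  std::array<double, 8> hw{};   // stands in for the hardware registers
  FloatState* owner = nullptr;  // whose state the hardware currently holds
  int transfers = 0;            // number of state switches performed
};

// Called when a domain first executes a floating point instruction in a
// time slice. Domains sharing one FloatState never cause a switch.
inline void AcquireFp(FpUnit& fpu, FloatState& wanted) {
  if (fpu.owner != &wanted) {
    if (fpu.owner != nullptr)
      fpu.owner->regs = fpu.hw;  // return the old state to its owner
    fpu.hw = wanted.regs;        // load the new owner's state
    fpu.owner = &wanted;
    ++fpu.transfers;
  }
  // Pointers now match: enabling the hardware is all that remains.
  assert(fpu.owner == &wanted);
}
```
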

From shap@halifax.syncomas.com Sat Oct 5 22:24:13 1996 Return-Path: shap@halifax.syncomas.com
Received: from halifax.syncomas.com (TS6-21.UPENN.EDU [128.91.200.89]) by eros.cis.upenn.edu (8.7.4/8.7.3) with ESMTP id WAA12587 for <eros-arch@eros.cis.upenn.edu>; Sat, 5 Oct 1996 22:24:10 -0400 Received: (from shap@localhost) by halifax.syncomas.com (8.7.6/8.7.3) id WAA31828; Sat, 5 Oct 1996 22:25:38 -0400 Date: Sat, 5 Oct 1996 22:25:38 -0400
Message-Id: <199610060225.WAA31828@halifax.syncomas.com> From: "Jonathan S. Shapiro" <shap@eros.cis.upenn.edu> To: eros-arch@eros.cis.upenn.edu
Subject: Floating point state divorced from Domains
In-reply-to: Your message of "Sat, 05 Oct 1996 15:41:59 PDT." <ae7c6a62000210041b46@DialupEudora>

Norm and I have talked about some aspects of this before. On some processors, I think it's quite a good idea.

On the x86, the entire floating point state is 108 bytes. It's not clear if the overhead of a separate structure is warranted. One advantage to a separate structure is the ability to leave floating point state cached in the FP coprocessor rather than in the Context (DIB) structure.

This raises issues on an MP machine: scheduling a process to run may require flushing the state off of some other processor. Most x86 OS's have taken the view that you always save the FP state when descheduling a process; you just don't reload it unnecessarily. The problem is that interprocessor signalling is pretty primitive. Manageable, but primitive.

A couple of other points:

> Note that in this scheme the floating register set is an architected
> thing while the "new kernel structure that holds floating register
> values" is an internal optimization.

Even if this is an architected feature, it remains architecture specific. It need not be done on processors where the FP state is small.

> It might be possible to introduce the idea of a floating point
> keeper that attends to floating point corner conditions.

This is worth exploring, but I think there are two distinct issues to consider:

+ Floating point corner conditions that are implementation deficiencies which the hardware should, in principle, have handled. These are, in effect, emulated instructions.

+ Floating point corner conditions that have useful application-specific behavior.

Emulated instructions can be handled by the OS or by a dedicated, architecture-specific domain known to the OS. Application specific cases can be handled by the domain keeper, or by a common service domain invoked by the domain keeper. Unless there is compelling reason for a separate keeper I am reluctant to implement one.

Jonathan

From shap@halifax.syncomas.com Sun Oct 6 02:19:43 1996 Return-Path: shap@halifax.syncomas.com
Received: from halifax.syncomas.com (TS7-47.UPENN.EDU [128.91.201.33]) by eros.cis.upenn.edu (8.7.4/8.7.3) with ESMTP id CAA18924 for <eros-arch@eros.cis.upenn.edu>; Sun, 6 Oct 1996 02:19:41 -0400 Received: (from shap@localhost) by halifax.syncomas.com (8.7.6/8.7.3) id CAA16298; Sun, 6 Oct 1996 02:21:05 -0400 Date: Sun, 6 Oct 1996 02:21:05 -0400
Message-Id: <199610060621.CAA16298@halifax.syncomas.com> From: "Jonathan S. Shapiro" <shap@eros.cis.upenn.edu> To: eros-arch@eros.cis.upenn.edu
Subject: and yet another nasty context bit

It is not sufficient that a context be constructable from a domain root. A context must be constructable from a MALFORMED domain root. In particular, it must be constructable from a domain root that does not have a number key in the trap code slot and therefore cannot legally carry a fault code. Note that FC_MalformedDomain is itself a fault code, introducing a quandary of where it can legitimately be stored.

Resolution: Such a domain by definition has FC_MalformedDomain as its fault code. This code need not be stored back to the domain root, as it is trivially rediscoverable (the absence of the number key is a sufficient condition).
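
The resolution amounts to deriving the fault code on read instead of storing it. A sketch, assuming a toy DomainRoot in which the trap code slot either holds a number key or does not; FC_MalformedDomain is from the text, everything else (FC_NoFault, field and function names, the numeric values) is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

enum FaultCode : std::uint32_t { FC_NoFault = 0, FC_MalformedDomain = 1 };

// Toy domain root: only the trap code slot is modeled.
struct DomainRoot {
  bool trapSlotHoldsNumberKey = false;
  std::uint32_t trapCode = 0;  // meaningful only when the slot holds a number key
};

// Reading: a missing number key *is* the malformed fault, so nothing
// ever needs to be written back to discover it again.
inline std::uint32_t GetFaultCode(const DomainRoot& root) {
  if (!root.trapSlotHoldsNumberKey) return FC_MalformedDomain;
  return root.trapCode;
}

// Writing: fails when there is no legal place to store the code.
inline bool SetFaultCode(DomainRoot& root, std::uint32_t code) {
  if (!root.trapSlotHoldsNumberKey) return false;
  root.trapCode = code;
  assert(GetFaultCode(root) == code);
  return true;
}
```
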

This in turn raises a subsidiary nit. If a domain that is faulted on a non-restartable instruction becomes unrunnable due to becoming malformed (e.g. address space removed) or especially by virtue of lacking a number key in the trap code slot, the information concerning the in-progress execution fault is lost and the execution state of the domain's virtual processor becomes undefined. For restartable instructions this is not a problem.

Principle: a domain must in principle be runnable in order for it to rationally be undergoing an execution fault. If it ceases in principle to be runnable by virtue of misapplication of the domain key, then its execution fault state becomes undefined.

In summary, if you want your domains to behave as though they were whole, don't break them. Note that a domain whose fault code is FC_MalformedDomain cannot properly have any other fault code under anything approximating normal circumstances (e.g. from an exception).

Jonathan

From shap Sun Oct 6 02:25:20 1996
Return-Path: shap
Received: (from shap@localhost) by eros.cis.upenn.edu (8.7.4/8.7.3) id CAA18970 for eros-arch; Sun, 6 Oct 1996 02:25:19 -0400 Date: Sun, 6 Oct 1996 02:25:19 -0400
From: "Jonathan S. Shapiro" <shap>
Message-Id: <199610060625.CAA18970@eros.cis.upenn.edu> To: eros-arch
Subject: testing - please ignore

I'm trying to figure out why messages are getting duplicated

From shap Sun Oct 6 02:31:45 1996
Return-Path: shap
Received: (from shap@localhost) by eros.cis.upenn.edu (8.7.4/8.7.3) id CAA19040 for eros-arch; Sun, 6 Oct 1996 02:31:44 -0400 Date: Sun, 6 Oct 1996 02:31:44 -0400
From: "Jonathan S. Shapiro" <shap>
Message-Id: <199610060631.CAA19040@eros.cis.upenn.edu> To: eros-arch
Subject: one more test

From shap Mon Oct 7 09:56:53 1996
Return-Path: shap
Received: (from shap@localhost) by eros.cis.upenn.edu (8.7.4/8.7.3) id JAA18170; Mon, 7 Oct 1996 09:56:52 -0400 Date: Mon, 7 Oct 1996 09:56:52 -0400
Message-Id: <199610071356.JAA18170@eros.cis.upenn.edu> From: "Jonathan S. Shapiro" <shap@eros.cis.upenn.edu> To: eros-arch
Subject: keykos behavior question

If domain A performs a FORK on a kernel-implemented key, passing as input in the 4th key slot a gate key, what happens?

  1. If the gate key is a return key
  2. If the gate key is a start key

I should think that the kernel should be viewed logically as performing a return to the key in slot 4, regardless of how that key was generated.

Is this in fact what KeyKOS did?

I'll check the Gnosis manual, but I find that the horses' mouths are easier to comprehend... :-)

Jonathan

From LANDAU_CHARLES@Tandem.COM Mon Oct 7 13:20:42 1996 Return-Path: LANDAU_CHARLES@Tandem.COM
Received: from suntan.tandem.com (suntan.tandem.com [192.216.221.8]) by eros.cis.upenn.edu (8.7.4/8.7.3) with SMTP id NAA18680; Mon, 7 Oct 1996 13:20:37 -0400
From: LANDAU_CHARLES@Tandem.COM
Received: from POST.TANDEM.COM by suntan.tandem.com (8.6.12/suntan5.960905) id KAA22207; Mon, 7 Oct 1996 10:19:57 -0700
Received: by POST.TANDEM.COM (4.13/4.5) id AA8505; 7 Oct 96 10:20:02 -0700
Date: 7 Oct 96 10:18:00 -0700
Message-Id: <199610071020.AA8505@POST.TANDEM.COM> To: shap@eros.cis.upenn.edu
Cc: eros-arch@eros.cis.upenn.edu
Subject: Re: keykos behavior question

Logically, the kernel returns to the key in slot 4 via the returner. If the key in slot 4 is not a resume key, it is ignored.

This restriction is made to keep the kernel programmers sane. If we allowed gate keys, we would have to allow all keys, including other kernel keys. You can imagine the difficulty in implementing this, and it would cost everyone, for a case that has very limited utility.
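
Read literally, the rule is a one-line filter on the key type. The sketch below assumes hypothetical names throughout (KeyType, Key, kNoDomain, KernelReturnTarget) and models a domain as an integer index; it shows only the "resume key or ignore" decision, not the returner itself.

```cpp
#include <cassert>

enum class KeyType { Resume, Start, Number, Other };

struct Key {
  KeyType type;
  int domain;  // index of the domain a gate key names; unused otherwise
};

constexpr int kNoDomain = -1;

// Domain the kernel activates after a kernel-implemented key finishes,
// or kNoDomain when the key in slot 4 is ignored.
inline int KernelReturnTarget(const Key& slot4) {
  // Start keys, kernel keys and everything else are ignored.
  if (slot4.type != KeyType::Resume) return kNoDomain;
  assert(slot4.domain >= 0);
  return slot4.domain;
}
```
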

From shap@eros.cis.upenn.edu Mon Oct 7 13:24:36 1996 Return-Path: shap@eros.cis.upenn.edu
Received: from eros.cis.upenn.edu (localhost [127.0.0.1]) by eros.cis.upenn.edu (8.7.4/8.7.3) with ESMTP id NAA18710; Mon, 7 Oct 1996 13:24:35 -0400 Message-Id: <199610071724.NAA18710@eros.cis.upenn.edu> To: LANDAU_CHARLES@Tandem.COM
cc: shap@eros.cis.upenn.edu, eros-arch@eros.cis.upenn.edu, shap Subject: Re: keykos behavior question
In-reply-to: Your message of "07 Oct 1996 10:18:00 PDT." <199610071020.AA8505@POST.TANDEM.COM>
Date: Mon, 07 Oct 1996 13:24:34 -0400
From: "Jonathan S. Shapiro" <shap@eros.cis.upenn.edu>

Charlie:

I'm in the middle of reimplementing the gate code, so I really want to make sure I have this clear. I'm pretty sure this is what you said:

If I do a FORK or RETURN to the kernel passing a resume key to someone else in slot 4, the kernel activates the domain named by the resume key?

Jonathan

From shap@eros.cis.upenn.edu Mon Oct 7 13:55:12 1996 Return-Path: shap@eros.cis.upenn.edu
Received: from eros.cis.upenn.edu (localhost [127.0.0.1]) by eros.cis.upenn.edu (8.7.4/8.7.3) with ESMTP id NAA18969; Mon, 7 Oct 1996 13:55:12 -0400 Message-Id: <199610071755.NAA18969@eros.cis.upenn.edu> To: LANDAU_CHARLES@tandem.com
cc: eros-arch
Subject: Re: keykos behavior question
In-reply-to: Your message of "07 Oct 1996 10:49:00 PDT." <199610071049.AA22841@MAILMN.mis.tandem.com>
Date: Mon, 07 Oct 1996 13:55:12 -0400
From: "Jonathan S. Shapiro" <shap@eros.cis.upenn.edu>

Oops. I forgot something.

I know that if the keeper is malformed at the time of the invocation the whole thing stops there. I propose to put the faulting domain on the object sleep queue associated with the keeper. If the keeper is later repaired the fault will be processed at that time.

But there is a related case that seems less obvious to me:

A sends a message to B. B has a bad receive buffer, and promptly faults. B's keeper also has a bad receive buffer. Does the fault recurse?

I see no reason why it should not -- the B-keeper (grin) has received the message, which is distinct from being malformed.

Jonathan

From shap Mon Oct 7 15:46:52 1996
Return-Path: shap
Received: (from shap@localhost) by eros.cis.upenn.edu (8.7.4/8.7.3) id PAA19362; Mon, 7 Oct 1996 15:46:51 -0400 Date: Mon, 7 Oct 1996 15:46:51 -0400
Message-Id: <199610071946.PAA19362@eros.cis.upenn.edu> From: "Jonathan S. Shapiro" <shap@eros.cis.upenn.edu> To: eros-arch
Subject: Gate jump assumption

I am proceeding on the assumption that the keys specified in the entry block of a gate jump are not modified unless they are also specified in the exit block of that gate jump.

That is, the fabrication of the resume key in the CALL operation does not mutate the caller key registers.

I have not found an explicit statement concerning this in the gnosis document.

shap

From LANDAU_CHARLES@Tandem.COM Mon Oct 7 17:52:29 1996 Return-Path: LANDAU_CHARLES@Tandem.COM
Received: from suntan.tandem.com (suntan.tandem.com [192.216.221.8]) by eros.cis.upenn.edu (8.7.4/8.7.3) with SMTP id RAA22204; Mon, 7 Oct 1996 17:52:27 -0400
From: LANDAU_CHARLES@Tandem.COM
Received: from POST.TANDEM.COM by suntan.tandem.com (8.6.12/suntan5.960905) id OAA24841; Mon, 7 Oct 1996 14:51:53 -0700
Received: by POST.TANDEM.COM (4.13/4.5) id AA825; 7 Oct 96 14:51:58 -0700
Date: 7 Oct 96 14:49:00 -0700
Message-Id: <199610071451.AA825@POST.TANDEM.COM> To: shap@eros.cis.upenn.edu
Cc: eros-arch@eros.cis.upenn.edu
Subject: Re: keykos behavior question

>A sends a message to B. B has a bad receive buffer, and promptly faults. B's keeper also has a bad receive buffer. Does the fault recurse?

Yes. But it's really an iteration rather than a recursion. At each iteration, we reconsider what domain to run. It won't block high priority processes or stay in the kernel a long time. "Running" a faulted domain calls its keeper.

The iteration terminates because the number of keeper keys to ready domains keeps decreasing.
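
A toy model of that iteration, to make the termination argument concrete. Everything here is an assumption for illustration: domains are vector indices, a bad receive buffer is a boolean, and an already-faulted domain is treated as not ready, which ends the walk. Each pass marks one more domain faulted, so the loop runs at most once per domain.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Domain {
  bool canReceive;       // false models a bad receive buffer
  int keeper;            // index of the keeper domain, or -1 if none
  bool faulted = false;  // a faulted domain is no longer ready
};

// Try to deliver to `target`. Each failed delivery faults that domain and
// the loop moves on to its keeper. Returns the index of the domain that
// finally received a message, or -1 if no ready domain remained.
inline int Deliver(std::vector<Domain>& domains, int target) {
  while (target >= 0) {
    assert(static_cast<std::size_t>(target) < domains.size());
    Domain& d = domains[static_cast<std::size_t>(target)];
    if (d.faulted) return -1;         // not ready: nothing further to run
    if (d.canReceive) return target;  // message accepted; iteration ends
    d.faulted = true;                 // one fewer ready domain each pass
    target = d.keeper;                // next pass: reconsider, run the keeper
  }
  return -1;
}
```
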

From shap Thu Oct 10 15:43:36 1996
Return-Path: shap
Received: (from shap@localhost) by eros.cis.upenn.edu (8.7.4/8.7.3) id PAA14236; Thu, 10 Oct 1996 15:43:34 -0400 Date: Thu, 10 Oct 1996 15:43:34 -0400
Message-Id: <199610101943.PAA14236@eros.cis.upenn.edu> From: "Jonathan S. Shapiro" <shap@eros.cis.upenn.edu> To: eros-arch
Subject: A couple of trivia questions

  1. Behavior of sleep key:

If A does a FORK/RETURN on the sleep (timer) key, passing a resume key to B, then what happens? I can make sense of this all right. The tricky part is that the test to ensure that B is still waiting at wakeup time has to be done at wakeup time rather than when the sleep key is invoked.

The simplest thing for now is probably just not to allow this and require the invocation to be a CALL.

  2. If A does a gate jump to B, and B is marked as a foreign domain (e.g. a UNIX domain), my current plan is to allow the gate jump to proceed, but suppress the transfer of the data, keys, order code, and key data (data byte). In short, the jump may only alter the domain state.

Anybody see a problem with this?

Heck, it's not real clear what it means to do this anyway.

Jonathan

From shap@eros.cis.upenn.edu Fri Oct 11 08:51:10 1996 Return-Path: shap@eros.cis.upenn.edu
Received: from eros.cis.upenn.edu (localhost [127.0.0.1]) by eros.cis.upenn.edu (8.7.4/8.7.3) with ESMTP id IAA19042; Fri, 11 Oct 1996 08:51:10 -0400 Message-Id: <199610111251.IAA19042@eros.cis.upenn.edu> To: frantz@netcom.com (Bill Frantz)
cc: eros-arch
Subject: Re: A couple of trivia questions
In-reply-to: Your message of "Thu, 10 Oct 1996 13:24:37 PDT." <199610102021.NAA23625@netcom8.netcom.com>
Date: Fri, 11 Oct 1996 08:51:09 -0400
From: "Jonathan S. Shapiro" <shap@eros.cis.upenn.edu>

In message <199610102021.NAA23625@netcom8.netcom.com>, Bill Frantz writes:
> What's the problem? If A has a resume key to B, then B must be waiting.
> If some domain key holder to B changes that situation, then the resume key
> must be deleted. Then the timer service (in the kernel) holds a zero data
> key and it "returns" to that zero data key with well defined results.

The problem is that the kernel cannot hold keys in this way.

> >2. If A does a gate jump to B, and B is marked as a foreign domain
> > (e.g. a UNIX domain), my current plan is to allow the gate jump to
> > proceed, but suppress the transfer of the data, keys, order code,
> > and key data (data byte). In short, the jump may only alter the
> > domain state.
> >
> > Anybody see a problem with this?
> >
> That would be like the key to B was the resume key generated by meter
> exhaustion.

I'm not clear if you intend this statement to describe a problem or not.

> > Heck, it's not real clear what it means to do this anyway.
>
> It is one way to fire off a UNIX etc. domain. I just don't see why you
> introduce a new domain type for this purpose. Controlling the keys should
> be sufficient to avoid problems, and we found that it was useful to have
> CMS, at least programs issue gate jumps.

A "foreign" domain is a new domain type.... CMS, in your example, is not running as a foreign domain.

From shap Fri Oct 11 11:03:12 1996
Return-Path: shap
Received: (from shap@localhost) by eros.cis.upenn.edu (8.7.4/8.7.3) id LAA19374; Fri, 11 Oct 1996 11:03:11 -0400 Date: Fri, 11 Oct 1996 11:03:11 -0400
Message-Id: <199610111503.LAA19374@eros.cis.upenn.edu> From: "Jonathan S. Shapiro" <shap@eros.cis.upenn.edu> To: eros-arch
Subject: gate actions on malformed domains

Problem: If an IPC is done to a malformed domain, what is the state of the invokee? I am trying to decide if we need to reify malformed as a fourth state. The problem is that if the domain is malformed enough, I probably cannot believe that the run state field is meaningful.

Following is an ITIBIS outline of the options. A short explanation of ITIBIS:

	I	indicates an issue
	P	indicates a position
	AS	indicates a supporting argument for a position
	AO	indicates an opposing argument.

	P can further be annotated by

	   '?'  Resolution on this position has not been made.
	   '-'  Position determined infeasible.
	   '+'  Position adopted.

        Positions can be adopted even if others are unresolved.  The
        idea is to keep the entire argument structure for later
        review.


ITIBIS on invocation of malformed domains:

I : Should a malformed domain be permitted to have outstanding resume

keys?

P?: Outstanding resume keys should be zapped by any action that

causes a domain to become malformed.

        AS: This approach unifies all of the remaining acceptable
            cases above, since in the absence of resume keys it
            doesn't matter much what the resume key action is.

        AO: It doesn't work, as in this design a malformed domain
            would be unable to generate the resume key necessary to
            invoke it's keeper.

        AS: The keeper invocation isn't a problem, as a malformed
            domain can never be running and therefore can never invoke
            it's keeper in the first place.

P-: A resume key invoked on a malformed domain should block

        AS: Domain is malformed, state of domain is undefined.
        AO: Resume keys must be prompt
        AO: This design causes the domain keeper to block in some
            cases, which is not acceptable.

    P-: A resume key invoked on a malformed domain should act as a
        restart key.

	I:  Do domain keepers need this?
            AS: It is okay if resume key acts as data key (see below).
                It is not okay if resume keys block.

        AO: Domain is malformed, what does it mean to restart one?

    P+: A resume key invoked on a malformed domain should act as a
        number key, as per KeyKOS.

	AS: Resolves domain keeper problem.
        I : It seems unfortunate to defer discovery of bad resume
            keys, as the behavior of a resume key invoked on a
            temporarily malformed domain now depends on a timing
            window.

            P : This is really an argument in favor of supporting
                operations that atomically (w.r.t gates) alter the
                slots.  If we support non-atomic operations this
                problem is inevitable.
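
A minimal sketch of the adopted position, in Python for illustration only. The types and the function name effective_key are hypothetical, and the value carried by the substituted number key (zero here) is an assumption; the outline only says "number key, as per KeyKOS".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class KeyType(Enum):
    START = "start"
    RESUME = "resume"
    NUMBER = "number"


@dataclass
class Domain:
    malformed: bool = False


@dataclass
class Key:
    kind: KeyType
    target: Optional[Domain] = None   # domain named by a gate key, if any
    value: int = 0                    # payload of a number key


def effective_key(key: Key) -> Key:
    """Return the key the invocation path should actually act on.

    A resume key whose target is malformed at the moment of invocation
    is treated as a number key. The check happens at invocation time,
    which is exactly the timing window the issue above complains about.
    """
    if key.kind is KeyType.RESUME and key.target is not None and key.target.malformed:
        return Key(KeyType.NUMBER, value=0)   # zero value is an assumption
    return key
```

Because the check is made when the key is invoked, the same resume key behaves normally again once the domain is well formed, unless the slots are altered atomically with respect to gates.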

I : Setting aside the question of gate key behavior, what is the
    'state' (running, available, waiting) of a malformed domain?

    P-: Malformed domain is running

        AO: Running is defined as occupied by a thread.  The domain is
            malformed and therefore cannot be occupied by a thread.
        AS: Such a domain must run at least temporarily in order to
            invoke its keeper.

    P-: Malformed domain is waiting

        AS: Consistent with resume key behavior described above.
        AO: Cannot be invoked by a start key, therefore cannot be
            set running, therefore cannot invoke its keeper.

    P-: Malformed domain is available

        AO: Cannot be resumed by keeper.

    P+: Doesn't matter -- cannot be invoked anyway.

From shap Fri Oct 11 11:37:54 1996
Return-Path: shap
Received: (from shap@localhost) by eros.cis.upenn.edu (8.7.4/8.7.3) id LAA19492; Fri, 11 Oct 1996 11:37:53 -0400
Date: Fri, 11 Oct 1996 11:37:53 -0400
Message-Id: <199610111537.LAA19492@eros.cis.upenn.edu>
From: "Jonathan S. Shapiro" <shap@eros.cis.upenn.edu>
To: eros-arch
Subject: Keykos gate key behavior

The Gnosis doc says that a gate key to a malformed domain acts like a number key *except that no string is transferred*.

The "acts like a number key" is consistent with other code. The "no string is transferred" introduces extra code into the gate path. Is there any good reason not to just let it act like a zero data key?

shap

From shap@halifax.syncomas.com Sat Oct 12 15:41:37 1996
Return-Path: shap@halifax.syncomas.com
Received: from halifax.syncomas.com (TS8-58.UPENN.EDU [128.91.201.105]) by eros.cis.upenn.edu (8.7.4/8.7.3) with ESMTP id PAA26366 for <eros-arch@eros.cis.upenn.edu>; Sat, 12 Oct 1996 15:41:35 -0400
Received: (from shap@localhost) by halifax.syncomas.com (8.7.6/8.7.3) id PAA28129; Sat, 12 Oct 1996 15:42:26 -0400
Date: Sat, 12 Oct 1996 15:42:26 -0400
Message-Id: <199610121942.PAA28129@halifax.syncomas.com>
From: "Jonathan S. Shapiro" <shap@eros.cis.upenn.edu>
To: eros-arch@eros.cis.upenn.edu
Subject: sleep key implementation

In light of the challenges involved in correctly implementing FORK on the sleep key, I have reimplemented this key as follows:

Invoking the sleep key causes the *invoker* to sleep for the specified time, after which the kernel will return to the invokee as via the returner.

In practice, this is implemented by having the kernel place the invoker to sleep, and then reach up and overwrite the invoker order code to be the "wake me up now" order code. Thus, the sleep key visibly alters the order code register during the time that the invoker is asleep.

Frankly, I consider this hack to be too gross for words, but all of the other solutions I was able to come up with were worse.
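
One possible reading of the mechanism described above, sketched in Python for illustration. Everything here is hypothetical: the order code values, the class names, the timer queue, and in particular the idea that expiry works by re-issuing the invocation with the rewritten order code. The message only states that the kernel overwrites the invoker's order code while it sleeps.

```python
import heapq

ORDER_SLEEP = 0      # hypothetical order code: "sleep for this long"
ORDER_WAKE_NOW = 1   # hypothetical order code: "wake me up now"


class Invoker:
    def __init__(self, name):
        self.name = name
        self.order_code = ORDER_SLEEP   # stands in for the order code register
        self.asleep = False
        self.result = None


class Kernel:
    def __init__(self):
        self._sleepers = []   # heap of (wake_time, sequence, invoker)
        self._seq = 0         # tie-breaker so invokers are never compared

    def invoke_sleep_key(self, invoker, now, duration=0):
        if invoker.order_code == ORDER_WAKE_NOW:
            # Second pass: the rewritten order code completes the call.
            invoker.asleep = False
            invoker.result = "woken"
            return
        # First pass: put the invoker to sleep and overwrite its order
        # code, which is the visible side effect noted in the message.
        invoker.asleep = True
        invoker.order_code = ORDER_WAKE_NOW
        heapq.heappush(self._sleepers, (now + duration, self._seq, invoker))
        self._seq += 1

    def tick(self, now):
        """Re-issue the invocation for every sleeper whose time has come."""
        woken = []
        while self._sleepers and self._sleepers[0][0] <= now:
            _, _, invoker = heapq.heappop(self._sleepers)
            self.invoke_sleep_key(invoker, now)
            woken.append(invoker.name)
        return woken
```

Between the two passes the invoker's order code reads as ORDER_WAKE_NOW rather than what it originally supplied, which is the visible alteration described above.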

It does, however, raise a question: should a FORK operation return an "ok" result code, or should it return nothing? In the current EROS implementation, the invoker exit block is not altered by FORK.

Jonathan

From shap Tue Oct 15 16:40:26 1996
Return-Path: shap
Received: (from shap@localhost) by eros.cis.upenn.edu (8.7.4/8.7.3) id QAA26031; Tue, 15 Oct 1996 16:40:25 -0400
Date: Tue, 15 Oct 1996 16:40:25 -0400
Message-Id: <199610152040.QAA26031@eros.cis.upenn.edu>
From: "Jonathan S. Shapiro" <shap@eros.cis.upenn.edu>
To: eros-arch
Subject: resume key question

Is there a need to distinguish between a restart key and a fault key? The seg keeper gets a fault key. Why shouldn't the domain keeper also get a fault key and be trusted to use OK as the invocation code?

shap

From frantz@netcom.com Thu Oct 17 00:26:02 1996
Return-Path: frantz@netcom.com
Received: from netcom6.netcom.com (netcom6.netcom.com [192.100.81.114]) by eros.cis.upenn.edu (8.7.4/8.7.3) with SMTP id AAA04083; Thu, 17 Oct 1996 00:26:01 -0400
Received: from [199.182.128.113] (sjx-ca11-17.ix.netcom.com [199.182.128.113]) by netcom6.netcom.com (8.6.13/Netcom) id VAA26653; Wed, 16 Oct 1996 21:25:54 -0700
Message-Id: <199610170425.VAA26653@netcom6.netcom.com>
X-Sender: frantz@netcom6.netcom.com
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Wed, 16 Oct 1996 21:29:14 -0700
To: "Jonathan S. Shapiro" <shap@eros.cis.upenn.edu>
From: frantz@netcom.com (Bill Frantz)
Subject: Re: A couple of trivia questions
Cc: eros-arch@eros.cis.upenn.edu

Sorry for the delay, I've had problems keeping up with the email.

At 8:51 AM 10/11/96 -0400, Jonathan S. Shapiro wrote:
>In message <199610102021.NAA23625@netcom8.netcom.com>, Bill Frantz writes:
>> What's the problem? If A has a resume key to B, then B must be waiting.
>> If some domain key holder to B changes that situation, then the resume key
>> must be deleted. Then the timer service (in the kernel) holds a zero data
>> key and it "returns" to that zero data key with well defined results.
>
>The problem is that the kernel cannot hold keys in this way.

No, but the kernel has enough knowledge to simulate that behavior. If you look at the KeyKOS External Specs under I/O Key, you will see how KeyKOS solved the problem.

>
>> >2. If A does a gate jump to B, and B is marked as a foreign domain
>> > (e.g. a UNIX domain), my current plan is to allow the gate jump to
>> > proceed, but suppress the transfer of the data, keys, order code,
>> > and key data (data byte). In short, the jump may only alter the
>> > domain state.
>> >
>> > Anybody see a problem with this?
>> >
>> That would be like the key to B was the resume key generated by meter
>> exhaustion.
>
>I'm not clear if you intend this statement to describe a problem or not.

No problem. Merely an example of an already used facility.

>
>> > Heck, it's not real clear what it means to do this anyway.
>>
>> It is one way to fire off a UNIX etc. domain. I just don't see why you
>> introduce a new domain type for this purpose. Controlling the keys should
>> be sufficent to avoid problems, and we found that it was useful to have
>> CMS, at least programs issue gate jumps.
>
>A "foreign" domain is a new domain type.... CMS, in your example, is not
>running as a foreign domain.

No it wasn't. We didn't see the need for a new type of domain when we built our simulators. I am curious why you have seen that need.


Bill Frantz       | Tired of Dole/Clinton?     | Periwinkle -- Consulting
(408)356-8506     | Vote 3rd party.  I'm       | 16345 Englewood Ave.
frantz@netcom.com | Voting for Harry Browne    | Los Gatos, CA 95032, USA



From shap@halifax.syncomas.com Thu Oct 24 02:23:50 1996
Return-Path: shap@halifax.syncomas.com
Received: from halifax.syncomas.com (TS3-58.UPENN.EDU [128.91.200.187]) by eros.cis.upenn.edu (8.7.4/8.7.3) with ESMTP id CAA04463 for <eros-arch@eros.cis.upenn.edu>; Thu, 24 Oct 1996 02:23:48 -0400
Received: (from shap@localhost) by halifax.syncomas.com (8.7.6/8.7.3) id CAA07741; Thu, 24 Oct 1996 02:27:27 -0400
Date: Thu, 24 Oct 1996 02:27:27 -0400
Message-Id: <199610240627.CAA07741@halifax.syncomas.com>
From: "Jonathan S. Shapiro" <shap@eros.cis.upenn.edu>
To: eros-arch@eros.cis.upenn.edu
Subject: Standard domain environment

It's funny. You get a kernel to the point where you can run interesting programs, and you think "there really IS a light at the end of the tunnel." Then you go to work out the details of the so-called primordial domains and you realize that it's the headlight of an oncoming train.

In the past few days, two questions have come to the fore:

  1. How are the primordial domains to be structured?
  2. What is the canonical "environment" in which all domains operate?

I'm going to deal with these in two (subsequent) messages. The primordial domain stuff is pretty much equivalent to the KeyKOS solutions (thanks very much to Norm). The environment stuff is very different. While I know that all of you are busy, I hope you'll read it and offer feedback.

Thanks in advance!

Jonathan

From shap@halifax.syncomas.com Thu Oct 24 02:24:20 1996
Return-Path: shap@halifax.syncomas.com
Received: from halifax.syncomas.com (TS3-58.UPENN.EDU [128.91.200.187]) by eros.cis.upenn.edu (8.7.4/8.7.3) with ESMTP id CAA04475 for <eros-arch@eros.cis.upenn.edu>; Thu, 24 Oct 1996 02:24:17 -0400
Received: (from shap@localhost) by halifax.syncomas.com (8.7.6/8.7.3) id CAA07744; Thu, 24 Oct 1996 02:27:56 -0400
Date: Thu, 24 Oct 1996 02:27:56 -0400
Message-Id: <199610240627.CAA07744@halifax.syncomas.com>
From: "Jonathan S. Shapiro" <shap@eros.cis.upenn.edu>
To: eros-arch@eros.cis.upenn.edu
Subject: Primordial domains

What follows is mostly an attempt to state the design constraints on some critical primordial domains. Please let me know of any errors you may spot.

There are a small number of domains that exist at the root of all others. These domains are heavily intertwingled (to borrow a term from Ted Nelson), and I wanted to ensure that their mutual recursions terminate. EROS will replace the factory with escrow agents because of the patent, but the co-dependencies prove to be the same.

The intertwingled domains are:

Virtual Copy Segment Keepers (VCSK)
Factories
The metafactory (i.e. the factory builder)
Key Indexed Directories (KID)

I'll deal with each in turn.

VIRTUAL COPY SEGMENT KEEPER (VCSK)

In EROS, the virtual copy segment keeper receives a data message from the kernel. This data message contains the error code and offset of the fault.

	[ For the benefit of the KeyKOS Volken: domain fault codes
	and segment error codes have been unified into a single name
	space because segment errors are sometimes propagated to
	domain keepers and it simplifies the kernel code to unify
	them. ]

The problem is that receiving the message modifies the VCSK address space. VCSK therefore requires mutable storage, which must somehow be allocated. The problem is that VCSK should not depend on VCSK.

I considered having a common VCSK instance that was the keeper for all VCSK address spaces. This remains the fallback design.

It proves that even on the X86 the VCSK can self-bootstrap. When first invoked, VCSK buys space for its mutable storage and restarts the caller WITHOUT fixing the problem. It proves that THIS CAN BE DONE WITH A READ-ONLY ADDRESS SPACE on the X86. I assume for the moment that no one will ever build that brain-damaged a machine again, so I intend to proceed with this design for now, and fall back to the meta-vcsk if needed in the future.

After the post-initialization restart, the caller restarts the instruction, which again invokes VCSK. This time the fault is handled (or not, as the case may be).
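
The two-pass bootstrap can be sketched as follows, in Python for illustration only. The SpaceBank and VCSK classes, the 4096-byte page, and the "restart" / "resolved" return values are all hypothetical stand-ins, not the EROS interfaces.

```python
PAGE_SIZE = 4096   # assumed page size, for illustration only


class SpaceBank:
    def __init__(self):
        self.pages_sold = 0

    def buy_page(self):
        self.pages_sold += 1
        return bytearray(PAGE_SIZE)


class VCSK:
    def __init__(self, bank):
        self.bank = bank
        self.storage = None    # no mutable storage until the first invocation
        self.handled = []

    def on_fault(self, error_code, offset):
        if self.storage is None:
            # First invocation: buy mutable storage and restart the
            # caller WITHOUT fixing the fault that triggered the call.
            self.storage = self.bank.buy_page()
            return "restart"
        # Later invocations: storage exists, so the fault can be examined.
        self.handled.append((error_code, offset))
        return "resolved"


def run_faulting_instruction(keeper, error_code, offset, max_tries=3):
    """Re-execute the faulting instruction until the keeper resolves it."""
    for attempt in range(1, max_tries + 1):
        if keeper.on_fault(error_code, offset) == "resolved":
            return attempt
    raise RuntimeError("fault was never resolved")
```

With a fresh keeper the first fault takes two attempts and later faults take one, and the keeper never depends on another VCSK for its own storage.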

FACTORIES

At first glance, factory (escrow agent) instances would appear to need to store mutable data, which might logically be allocated by VCSK.

Norm points out that for KeyKOS the factory requires a fixed amount of mutable storage. This can be purchased and installed by the metafactory. This is the solution that I intend to adopt.

The follow-up is to ensure that factories for primordial domains do not require KID instances. This proves to be true for the metafactory and the VCS factory, which is sufficient to support all other factories.

EROS will eventually have a key space in addition to a data address space, which will raise the question of where the mutable storage for keys in this space should be kept. Such factories need to be able to buy more space, and should use a VCS as their address space. The metafactory will have a distinct order code indicating whether the factory product will contain a key space. If this order code is used, the factory (i.e. the one created by the metafactory) will be created with a VCS as its address space. Primordial domains (KID, VCS, metafactory) do not have key spaces.

METAFACTORIES

There exists a single metafactory, whose mutable state is fixed at system install time. It is what Norm calls an "iron man" program (i.e. one which runs in an unkept segment), and must plan all its mutable storage ahead of time.

KEY INDEXED DIRECTORY (KID)

The question for KID is whether its address space is a virtual copy segment or not. The answer can be yes, but only if the primordial domains listed above do not require a KID. The resolution is to design the VCS factory, KID factory, and metafactory to have no "holes", and to have the factory code assume that the ABSENCE of a KID is equivalent to a KID with no members.
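
A small sketch of the "absent KID equals empty KID" rule, in Python for illustration. It assumes the KID is what the factory uses to record its holes; the class and method names are hypothetical.

```python
class KID:
    """Toy stand-in for a key indexed directory."""

    def __init__(self):
        self._members = set()

    def add(self, key):
        self._members.add(key)

    def contains(self, key):
        return key in self._members

    def count(self):
        return len(self._members)


class Factory:
    def __init__(self, hole_kid=None):
        # A primordial factory is built with no KID at all.
        self.hole_kid = hole_kid

    def is_hole(self, key):
        if self.hole_kid is None:
            return False           # absent KID: no members, so no holes
        return self.hole_kid.contains(key)

    def hole_count(self):
        return 0 if self.hole_kid is None else self.hole_kid.count()
```

A factory built without a KID then answers every query the way a factory with an empty KID would, so the primordial factories never need a KID instance to exist first.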

Jonathan

From shap@halifax.syncomas.com Thu Oct 24 02:24:33 1996
Return-Path: shap@halifax.syncomas.com
Received: from halifax.syncomas.com (TS3-58.UPENN.EDU [128.91.200.187]) by eros.cis.upenn.edu (8.7.4/8.7.3) with ESMTP id CAA04494 for <eros-arch@eros.cis.upenn.edu>; Thu, 24 Oct 1996 02:24:29 -0400
Received: (from shap@localhost) by halifax.syncomas.com (8.7.6/8.7.3) id CAA07750; Thu, 24 Oct 1996 02:28:09 -0400
Date: Thu, 24 Oct 1996 02:28:09 -0400
Message-Id: <199610240628.CAA07750@halifax.syncomas.com>
From: "Jonathan S. Shapiro" <shap@eros.cis.upenn.edu>
To: eros-arch@eros.cis.upenn.edu
Subject: Canonical domain environment

One of the questions that has arisen in recent discussions is:

What is the canonical "environment" of a domain?

What follows is *highly* preliminary, and I hope that you will comment/react to it.

In posing this question, I was concerned with the following:

	+ The space bank that should be used to construct subsidiary
	  services.
	+ Current (virtual) terminal input key  ***
	+ Current (virtual) terminal output key ***
	+ Current (virtual) terminal control key  ***
	+ The name space (directory)
	+ The (append only) error logging stream
	+ The "session manager"
	+ The current scheduling authority
	+ The scheduling admission control agent

	*** Equivalently, the display server input/output/control
	    keys.

Several of these are listed as things domains do NOT have in the OSR paper. Before the KeyKOS folks scream, I am NOT proposing that ALL domains should have these. I am proposing instead that these are sufficiently important that there should be a convention about where they should be found IF PRESENT. In part, this is a reconsideration in light of the now-pervasiveness of GUI's.

The proposal:

Every domain should, by convention, be created with a node key in Key Register 15. This node has the following slot convention:

    Slot	Description
    0		A key to a space bank.
    1		Display output key
    2		Display input key
    3		Display control key
    4		Directory key
    5		Session Manager Session Creator Key
    6		Schedule key for running domain
    7		Scheduling Admission Control Agent
    8		Error logging stream key (output only)

ANY of these keys can be undefined, in which case the corresponding slot holds a zero number key. The convention can be abrogated, in which case the domain may not be debuggable. The notion is not that these objects must be passed to subsidiary domains, but that if they ARE passed there should be a well-known convention for how this is to be done. The list may shrink as things get refined.
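
The slot convention can be written down as a table with a lookup that treats a zero number key as "not provided". This is a sketch in Python for illustration; the identifiers and the tuple used to stand in for a zero number key are assumptions, while the slot numbers and Key Register 15 come from the proposal above.

```python
from enum import IntEnum

ENV_KEY_REGISTER = 15            # key register holding the environment node key
ZERO_NUMBER_KEY = ("number", 0)  # stand-in for a zero number key


class EnvSlot(IntEnum):
    SPACE_BANK = 0
    DISPLAY_OUTPUT = 1
    DISPLAY_INPUT = 2
    DISPLAY_CONTROL = 3
    DIRECTORY = 4
    SESSION_MANAGER = 5
    SCHEDULE = 6
    ADMISSION_CONTROL = 7
    ERROR_LOG = 8


def make_env_node(**keys):
    """Build an environment node; any slot not named is left undefined."""
    node = [ZERO_NUMBER_KEY] * len(EnvSlot)
    for name, key in keys.items():
        node[EnvSlot[name.upper()]] = key
    return node


def env_lookup(node, slot):
    """Return the key in a slot, or None if the slot is undefined."""
    key = node[slot]
    return None if key == ZERO_NUMBER_KEY else key


# A domain that was given a space bank and an error log but nothing else.
node = make_env_node(space_bank="bank-key", error_log="log-key")
```

A caller that wants, say, the directory key checks env_lookup(node, EnvSlot.DIRECTORY) and gets None here, so the convention fixes where a key lives if passed without requiring that it be passed.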

I'll deal with each in turn.

SPACE BANK KEY

This is currently passed to the created domain by the factory. The proposal simply places it in the environment as well. Placing the space bank in the environment allows debuggers to allocate in-domain working storage from the space bank associated with the domain, and allows the domain to create subsidiaries.

Principle: the fact that a service domain fabricates supporting subsidiaries should not, in principle, be exposed to the invoker.

INPUT/OUTPUT/CONTROL Keys

These allow a domain such as a user interface to pass access to the current terminal to subsidiary domains. There is no requirement that such access be propagated. If propagated, it should *by convention* be provided in these slots.

Many domains will not need access to a terminal. For such domains, these slots should hold DK(0).

DIRECTORY KEY

The directory key, if present, conveys to the domain access to the human-readable object namespace.

Norm objected to having this be part of the environment, and asserted that perhaps 1 in 100 domains in KeyKOS needed access to the directory. I suggested that this is because KeyKOS was in practice used by running emulation environments that in effect imposed their own conventions for such things. Offhand, about 50% of the "applications" I want to build are applications that deal with name spaces. A lot of these I want to have run native. This is very likely due to my UNIX/VMS/DOS bias.

I do *not* propose that all domains should get a directory key. Rather, I propose that there should be a convention for where such a key should reside *IF PASSED*.

In particular, we might want a debugger to be able to access the namespace accessible to the application.

ERROR LOG

The error logging key (a start key) provides the domain with the ability to provide human-readable error messages. There is an open issue concerning how this should be internationalized. The idea is that the error log is an output-only key obeying (by convention) a well-known protocol.

SESSION MGR

Error logs are sufficient for human-readable diagnostics, but sometimes applications require the ability to communicate with the user in an out-of-band fashion for additional information. For example, a dial-up networking subsystem may, on first invocation, need to obtain information about the ISP from the end user. This is distinct from the error log, which is an output-only channel.

Given that we intend to virtualize the terminal/display connection, the session manager cannot reside in the directory; it is an artifact of the current login session. In particular, I want to be able to log in at Penn, start something, log in from home, and steal the session to my (at-home) display. If (after theft) the application puts up a notification, the popup should go on my display at home.

The problem is that the session is not per-user, it is per-login. When the popup occurs, it should occur on the terminal at which the user sits. We could rewrite this terminal key (by convention) in the per-application directory, but not all applications are virtualized in a way that makes their session relocatable. Also, I can think of several applications (e.g. filters) that should have access to the terminal but not to the directory name space.

There is an open question about whether there might not be other things that are per-login, e.g. the network connection establishment agent. This relates to whether we are doing distribution. I suspect that this will want to become a directory, and I will probably (being paranoid) implement this slot as a session-specific directory with a single entry "mgr".

SCHEDULE KEY

A domain operates under a (restricted) scheduling authority. It should be able to grant its slice to other parties. To do otherwise exposes to callers the fact that the domain has helpers, which is an encapsulation violation.

ADMISSION CONTROL AGENT

The requestor of a domain should have the OPTION to provide it access to an admission control agent for child domains. This is mostly used by shells; not all domains require the authority to establish subprocesses with schedules independent of their parent.

Jonathan

From LANDAU_CHARLES@Tandem.COM Thu Oct 24 18:00:54 1996
Return-Path: LANDAU_CHARLES@Tandem.COM
Received: from suntan.tandem.com (suntan.tandem.com [192.216.221.8]) by eros.cis.upenn.edu (8.7.4/8.7.3) with SMTP id SAA17584; Thu, 24 Oct 1996 18:00:53 -0400
From: LANDAU_CHARLES@Tandem.COM
Received: from POST.TANDEM.COM by suntan.tandem.com (8.6.12/suntan5.960905) id PAA03079; Thu, 24 Oct 1996 15:00:56 -0700
Received: by POST.TANDEM.COM (4.13/4.5) id AA12089; 24 Oct 96 15:00:58 -0700
Date: 24 Oct 96 14:59:00 -0700
Message-Id: <199610241500.AA12089@POST.TANDEM.COM>
To: shap@eros.cis.upenn.edu
Cc: eros-arch@eros.cis.upenn.edu
Subject: Re: Canonical domain environment

>The convention can be abrogated, in which case the domain may not be debuggable.
>In particular, we might want a debugger to be able to access the namespace accessable to the application.

Of all domains, the debugger ought to be designed to be the least sensitive to this convention. Given your emphasis that the "environment" is only a convention as to where to put keys, if you should choose to provide them (an emphasis which is required to refute Norm's otherwise valid objection), there ought to be no reason that an unconventional domain should not be debuggable. Taking Norm's view, that object references should be passed as keys, not names, I can't think of any reason the debugger would want the application domain's namespace, except in so far as any information can be useful as a hint as to what is going on.

From frantz@netcom.com Fri Oct 25 01:14:33 1996
Return-Path: frantz@netcom.com
Received: from netcom8.netcom.com (netcom8.netcom.com [192.100.81.117]) by eros.cis.upenn.edu (8.7.4/8.7.3) with SMTP id BAA19479; Fri, 25 Oct 1996 01:14:33 -0400
Received: from [207.94.112.176] (sjx-ca85-16.ix.netcom.com [207.94.112.176]) by netcom8.netcom.com (8.6.13/Netcom) id WAA20712; Thu, 24 Oct 1996 22:14:37 -0700
Message-Id: <199610250514.WAA20712@netcom8.netcom.com>
X-Sender: frantz@netcom8.netcom.com
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Thu, 24 Oct 1996 22:18:01 -0700
To: "Jonathan S. Shapiro" <shap@eros.cis.upenn.edu>, eros-arch@eros.cis.upenn.edu
From: frantz@netcom.com (Bill Frantz)
Subject: Re: Canonical domain environment

At 2:28 AM 10/24/96 -0400, Jonathan S. Shapiro wrote: ...
>Every domain should, by convention, should be created with a node key
>in Key Register 15. This node has the following slot convention:

This proposal adds an extra node to every domain create. Perhaps it would be better to pass some of the keys (e.g. :

    0           A key to a space bank.
    6           Schedule key for running domain
    7           Scheduling Admission Control Agent
    8           Error logging stream key (output only)

These are the keys likely to be used by the "light-weight" domains. Then only a small subset of all domains created would need an extra node to hold:

    1           Display output key
    2           Display input key
    3           Display control key
    4           Directory key
    5           Session Manager Session Creator Key


I agree with Norm about global access to the directory. It is this feature, more than any other, that makes ALL current OSs subject to Trojan horse attacks. I would hope you would build a system where the programs a user runs do not run with all that user's authority.

In the KeyKOS implementation, very few domains had access to the user's directory because we had a rule of passing keys, rather than the directory name of keys. As a result, only the command system had routine access to the directory. While it is true we had our compile environment in CMS, some linking was done in native KeyKOS, and all execution. I firmly believe we could have moved our development environment to a native environment, with only a few objects having access to the directory. A simple matter of programming, user conversion, and pleasing the (few) paying customers.


Bill Frantz       | Tired of Dole/Clinton?     | Periwinkle -- Consulting
(408)356-8506     | Vote 3rd party.  I'm       | 16345 Englewood Ave.
frantz@netcom.com | Voting for Harry Browne    | Los Gatos, CA 95032, USA



From norm@netcom.com Sun Nov 10 23:13:05 1996
Return-Path: norm@netcom.com
Received: from netcom8.netcom.com (netcom8.netcom.com [192.100.81.117]) by eros.cis.upenn.edu (8.7.6/8.7.3) with SMTP id XAA32672 for <eros-arch@eros.cis.upenn.edu>; Sun, 10 Nov 1996 23:13:04 -0500
Received: (from norm@localhost) by netcom8.netcom.com (8.6.13/Netcom) id UAA04138; Sun, 10 Nov 1996 20:14:00 -0800
X-Sender: norm@netcom8.netcom.com
Message-Id: <v03007800aeac574a62e7@DialupEudora>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Sun, 10 Nov 1996 20:11:37 -0800
To: eros-arch@eros.cis.upenn.edu
From: Norman Hardy <norm@netcom.com>
Subject: Factory Description

I wrote a lightweight description of the KeyKOS factory. I would be pleased at any feedback, especially technical questions.

http://www.mediacity.com/~norm/CapTheory/Factory.html