Long answer to quick question shapj@us.ibm.com
Thu, 2 Sep 1999 12:38:21 -0400

John: [others in case interested]

You're right. There's no short answer.

In general, going to other 32-bit processors using the current code base should be simple. There are minimal byte-order dependencies in the current kernel. The major changes would be in the trap handler (easy) and the memory manager (more involved, and dependent on the CPU). An initial port, if uninterrupted, would take 3-6 months. A port to a machine with hierarchical page tables (e.g. 68k) could use substantially the current memory logic, and would be closer to the 3-month end of the spectrum. A quick and dirty MIPS port, accomplished by implementing the hierarchy in software, could be done in about that time frame.

32-bit PPC, because of the hash-structured page tables, requires a good bit more thought. Also, a port that takes proper advantage of the MIPS software reload would take more thought. Both of these get into the 6 month span. In the PPC case, we could in the interest of time view the hashed lookup mechanism as a 2nd level TLB cache, back that with software-implemented tree structured translation, and arrive at a quick port in ~4 months. The main issue on these CPUs is that I suspect a good bit of the current code is heavily dependent on the fact that we presently run on a hierarchically translated machine.
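
To make the "hashed lookup as a 2nd level TLB cache" idea concrete, here is a rough C sketch. It is an illustration only, not EROS code: the names (`hpt`, `tree_walk`, `translate`), the table sizes, and the 32-bit, 4 KB-page, two-level layout are all assumptions, and the real PPC hashed page table (PTE groups, secondary hash) is much simplified here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12u
#define L2_BITS    10u            /* low 10 bits of the page number pick a leaf slot */
#define L1_BITS    10u            /* high 10 bits pick the leaf table */
#define HPT_SLOTS  256u           /* hypothetical cache size, power of two */
#define NO_FRAME   UINT32_MAX     /* frames are 20 bits wide, so this never collides */

/* Authoritative, software-walked tree: a root of pointers to leaf tables. */
typedef struct { uint32_t frame[1u << L2_BITS]; } leaf_table;
typedef struct { leaf_table *leaf[1u << L1_BITS]; } root_table;

/* Hash-structured table treated as a disposable cache of the tree. */
typedef struct { uint32_t vpn; uint32_t frame; int valid; } hpt_entry;
static hpt_entry hpt[HPT_SLOTS];
static unsigned hpt_misses;       /* counts refills from the tree */

static uint32_t hpt_hash(uint32_t vpn)
{
    return (vpn ^ (vpn >> 8)) & (HPT_SLOTS - 1u);
}

/* Walk the two-level tree; NO_FRAME means the page is not mapped. */
static uint32_t tree_walk(const root_table *root, uint32_t vpn)
{
    const leaf_table *leaf = root->leaf[vpn >> L2_BITS];
    if (leaf == NULL)
        return NO_FRAME;
    return leaf->frame[vpn & ((1u << L2_BITS) - 1u)];
}

/* Returns 1 and fills *paddr on success, 0 on a genuine page fault. */
static int translate(const root_table *root, uint32_t vaddr, uint32_t *paddr)
{
    assert(root != NULL && paddr != NULL);
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    hpt_entry *e = &hpt[hpt_hash(vpn)];

    if (!e->valid || e->vpn != vpn) {          /* cache miss: consult the tree */
        uint32_t frame = tree_walk(root, vpn);
        hpt_misses++;
        if (frame == NO_FRAME)
            return 0;
        e->vpn = vpn;                          /* refill, evicting the old entry */
        e->frame = frame;
        e->valid = 1;
    }
    *paddr = (e->frame << PAGE_SHIFT) | (vaddr & ((1u << PAGE_SHIFT) - 1u));
    return 1;
}
```

The point of the arrangement is that the hash table never holds state that cannot be rebuilt: any entry can be evicted at will, so the existing tree-oriented memory logic stays the source of truth.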

Going to 64 bits isn't substantially harder from a memory mapping perspective -- the internal translation logic is already using 96-bit segment offsets. The more complex issue is dealing with in-memory capability logic. A "quick and dirty" solution would simply increase the size of the in-memory Node structure to allow for the larger pointers, or (equally good) use object indices rather than pointers and leave the size alone. Indices are probably the better answer in the short term, and the change is not visible outside the kernel in any case.
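
The "indices rather than pointers" option can be pictured with a small C sketch. The layout below is hypothetical and is not the actual EROS Node or key format; `key_slot_ptr`, `key_slot_idx`, `object_table`, and `key_target` are illustrative names only. It shows just that a 32-bit index keeps the in-memory slot the same size whether kernel pointers are 32 or 64 bits wide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OBJECTS 1024u         /* hypothetical capacity of the object table */
#define NULL_INDEX  UINT32_MAX    /* marks a slot with no in-memory target */

/* Stand-in for whatever header an in-memory kernel object carries. */
typedef struct { uint64_t oid; } object_header;

static object_header object_table[MAX_OBJECTS];

/* Pointer form: the slot grows when pointers go from 32 to 64 bits. */
typedef struct {
    uint32_t       type_and_flags;
    object_header *target;
} key_slot_ptr;

/* Index form: always two 32-bit words, independent of pointer width. */
typedef struct {
    uint32_t type_and_flags;
    uint32_t target_index;
} key_slot_idx;

/* Convert an index back to a pointer at the point of use. */
static object_header *key_target(const key_slot_idx *k)
{
    if (k->target_index == NULL_INDEX)
        return NULL;
    assert(k->target_index < MAX_OBJECTS);
    return &object_table[k->target_index];
}

/* Convert a pointer into the table to the index stored in a slot. */
static uint32_t index_of(const object_header *obj)
{
    assert(obj >= object_table && obj < object_table + MAX_OBJECTS);
    return (uint32_t)(obj - object_table);
}
```

The cost is one extra add per dereference, and both conversions stay inside the kernel, which is why the change would not be visible outside it.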

Those are the two "simple" solutions, but I think that, time permitting, I'd want to use this as an excuse to investigate using in-memory GC in preference to in-memory key rings. I'd therefore want to go at this with a bit more deliberation if given the opportunity.

As to the "more registers" problem, there are two effects that have fairly major impact on overall performance. One is the user/supervisor crossing delay (which is where the x86 eats performance), the other is register save. The main challenge to getting a satisfactory implementation on Merced is the latter. That said, note that Merced has a relatively wide memory bus, and can be placed in a mode in which it will work behind the scenes to "clean" the register file by saving the registers. It remains to be determined if this can be leveraged to reduce the context switching overhead. I expect that Jochen Liedtke will have results on this before I do, as I'm not actively attending to Merced at the moment.

Beyond this, Merced raises some serious challenges to the hand-coding of the IPC path, but this presents no fundamental difficulties other than a lot of hard work.

Jonathan S. Shapiro, Ph. D.
IBM T.J. Watson Research Center
Email: shapj@us.ibm.com
Phone: +1 914 784 7085 (Tieline: 863)
Fax: +1 914 784 7595

"John C. Randolph" <jcr@idiom.com> on 09/01/99 11:44:51 AM

Please respond to jcr@idiom.com

To: Jonathan S Shapiro/Watson/IBM@IBMUS
cc:
Subject: Other processors?

Shap,

Quick question: (which may not have a quick answer)

Suppose you had to port EROS to

  1. Merced
  2. PPC
  3. anything else

What kind of pain are you looking at?

Also, how does the size of a processor's register file affect your context-switching speed?

-jcr