Re: conclusions on EOR Jonathan S. Shapiro (shap@eros-os.org)
Sun, 2 Jul 2000 14:43:50 -0400

Just a quick note to let you know that I'm not ignoring Kragen's comments on the EOR. I very much appreciate them, and I intend to go through them carefully, but I'm too busy packing my house to deal with all of them right at the moment. This is a quick set of responses.

One overarching comment: some of your suggestions raise questions like "so how do I use this" or "how does this all fit together". I agree completely that these questions need to be answered, but I don't think that they should be in the reference manual. One of the UNIX documentation traditions that I think was a good one was to separate manual pages from developer's guides. The latter were designed to be more tutorial and/or how-to in nature. The reference manuals, in contrast, were designed for quick checks on how to perform particular operations. My feeling is that both serve an incredibly useful purpose. EROS is currently very short on the programmer's guide end of the spectrum.

[Various bits on the invocation interface description in the introduction]

That description is deliberately trying to stay away from specifying a particular hardware machine. Also, the description in the introduction is probably out of date. The notion is that the introduction should give you a sense of what is passed, and a hardware-specific addendum should tell you exactly how for your architecture.

The "string in register" idea was abandoned in a later version of the IPC logic. The interface is now specified to move exactly 4 data registers verbatim (one of which is, by convention, the order code/return code) and to perform an optional memory-to-memory string move.

If the receive buffer is too short, the sent string gets truncated. If an invalid page appears in the receive area, the sent string gets truncated and an exception is raised. By invalid, I mean that no page is defined or that the page is not writable by the recipient. If the problem is merely the need to bring the offending page in from the object store, the kernel will do that transparently.
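To make the transfer rules above concrete, here is a toy model in Python. It is a sketch under stated assumptions, not kernel code: the names `deliver`, `Page`, `PAGE_SIZE` and `ReceiveFault` are hypothetical, the page size is an arbitrary toy value, and the choice to stop copying at the start of the first invalid page is an assumption (the text only says the string is truncated and an exception is raised).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

PAGE_SIZE = 16  # toy value for illustration only; not the real page size


class ReceiveFault(Exception):
    """Raised when the receive area contains an invalid page."""

    def __init__(self, received: int):
        super().__init__(f"invalid page after {received} bytes")
        self.received = received  # bytes copied before the fault


@dataclass
class Page:
    writable: bool = True
    data: bytearray = field(default_factory=lambda: bytearray(PAGE_SIZE))


def deliver(regs: Tuple[int, int, int, int],
            string: bytes,
            recv_pages: List[Optional[Page]],
            recv_limit: int) -> Tuple[Tuple[int, ...], int]:
    """Model one message transfer.

    regs       -- the 4 data registers, copied verbatim
    string     -- the optional sent string (may be empty)
    recv_pages -- receive area; None stands for "no page defined"
    recv_limit -- length of the receive buffer in bytes
    Returns (registers as received, number of bytes actually received).
    """
    if len(regs) != 4:
        raise ValueError("exactly 4 data registers are transferred")
    out_regs = tuple(regs)

    # A short receive buffer silently truncates the sent string.
    n = min(len(string), recv_limit, len(recv_pages) * PAGE_SIZE)

    copied = 0
    while copied < n:
        page = recv_pages[copied // PAGE_SIZE]
        # Invalid means: no page defined, or not writable by the recipient.
        if page is None or not page.writable:
            raise ReceiveFault(copied)
        offset = copied % PAGE_SIZE
        chunk = min(PAGE_SIZE - offset, n - copied)
        page.data[offset:offset + chunk] = string[copied:copied + chunk]
        copied += chunk
    return out_regs, copied
```

In this model, sending 40 bytes to a 20-byte receive buffer returns a received count of 20, which is how the recipient can tell that fewer bytes arrived than were sent.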

> - it's going to confuse 386 assembly programmers when you say
> "halfword" and "word", because the 386's word is called a
> "doubleword" or "long" in the 386 assembly jargon..

Yep. Them 386 programmers are definitely way broken!

Seriously, this is a description that is appropriate to a pure 32-bit system, and all other 32-bit processors call a word a word. Confusing the 386 weenies on this point is better than inventing new terminology. I agree that the issue should be noted somewhere.

> Concepts.Constructor:
> - this seems like a good idea; it's very much like invoking your own
> copy of executables instead of connecting to a single long-running
> executable.

Exactly.

> You might want to mention high performance as well; web
> servers and other high-volume transaction processors will probably
> not create a new instance for each client.

Actually, that is exactly what they'll do if they want security, but before you let that worry you, consider the relative performance figures. EROS starts new processes about twice as fast as Linux does. It takes Java about 1.5 million *bytecodes* to initialize the JDK before starting the first instruction of the main procedure. I don't know what the figures are for Perl. Anybody know them offhand?

> Keepers:

> - how do COW and demand-zero end up mapping to real pages on disk?

The keeper installs an appropriate page capability at an appropriate offset in the segment and restarts the process. The kernel is responsible for faulting the corresponding page in off the disk on demand. In practice, the page is almost always a newly zeroed page so no disk I/O is necessary.

> - presumably faults handled by a keeper aren't *always* transparent to
> the application --- the keeper may want to kill the application. How
> does it communicate to the kernel that it wants the application to
> continue?

The keeper resumes the process, and in doing so specifies the "fault code" that should be installed in the process control block. If this fault code is zero the process will resume. If the fault code is non-zero an exception is synthesized and the process keeper is invoked. Unlike the memory keeper, the process keeper receives a process capability to the process and can tear the process down if desired.
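The resume rule can be sketched as a small Python model. Everything here is hypothetical naming for illustration (`Process`, `resume_with_fault_code`, `process_keeper`, and the `tear_down` flag standing in for the process keeper's policy); it only mirrors the zero versus non-zero fault code behavior described above and is not the real interface.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Process:
    fault_code: int = 0      # stands in for the field in the process control block
    running: bool = False
    destroyed: bool = False
    log: List[str] = field(default_factory=list)


def process_keeper(proc: Process, tear_down: bool) -> None:
    """Hypothetical process keeper. It holds a process capability, so it
    may tear the process down if it chooses to."""
    proc.log.append("process keeper invoked")
    if tear_down:
        proc.destroyed = True
        proc.log.append("process torn down")


def resume_with_fault_code(proc: Process, fault_code: int,
                           tear_down: bool = False) -> None:
    """Model of a memory keeper resuming a faulted process.

    The fault code is installed in the process state. Zero lets the
    process continue; non-zero synthesizes an exception and hands
    control to the process keeper.
    """
    proc.fault_code = fault_code
    if fault_code == 0:
        proc.running = True
        proc.log.append("resumed")
    else:
        proc.running = False
        proc.log.append(f"exception synthesized (fault code {fault_code})")
        process_keeper(proc, tear_down)
```

For example, `resume_with_fault_code(p, 0)` leaves `p.running` true, while `resume_with_fault_code(p, 7, tear_down=True)` ends with the process keeper destroying the process.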

> Concepts.Address Spaces:
> - you can have pages of keys in the leaves of your address space?

In the current implementation the answer is yes, though they will never appear as valid data pages. It's not at all clear that this was a good idea, as it exposes the size of a capability to the application. Charlie Landau and I have a debate in progress about this.

> - "page faults are considered prompt" --- does this mean that waiting
> for a page to be brought in from disk is considered prompt?

Yes. Non-prompt operations are those whose latency the kernel cannot control. Prompt does not imply real-time. Rather, it implies that the application may rely on the prompt operation not going out for a lengthy lunch before responding.

> - it sounds like the priority scheduler is a fixed-priority system like
> those commonly used in real-time systems, not a traditional
> timesharing scheduler.

Yes, though this may have been a mistake.

> If someone wanted to implement a timesharing
> scheduler, would they be able to do so outside of the kernel?

Yes, in collaboration with the primary scheduler, but it's not clear why they would want to, as the current scheduler already makes provision for this kind of usage.

> - why is the priority scheduling quantum (quanta is plural) so huge [10 ms]

Because PC hardware couldn't reliably support anything better when that text was first written. These days, you can get down to about 4ms with acceptable overhead. The problem lies in the design of the PC's real-time clock. If you're willing to put in a high-performance clock card, that can be used instead, but the specification needs to be usable on ordinary machines.

> - it still looks like there's no formal way to interrupt a process....

If you hold a process capability, you can set the process's fault code to a non-zero value. This guarantees that an exception will be synthesized the next time the process is considered by the scheduler, and that control will at that point be handed to the process keeper. There is no equivalent to a signal mechanism in EROS, and this is intentional.

> - what key is this "Check Alleged Key Type" defined on?

It is defined on all capabilities, but it is a convention rather than something that is enforced. AKT_Process, which you go on to ask about, is merely a constant value used by the process capability as its alleged key type. The omission of the constant value in the documentation is intentional, with the goal of encouraging people to use the symbol rather than the actual integer.

> Primary Objects.Process.Get Key:
> - (0 <= N <= 32) --- what happens when N == 32?

The game ends. :-) Seriously, that's a typo.

> - what's a component of the process?

One of the capabilities making up the process. The reason this isn't the same as a slot in the process root node is that different processes require different numbers of overflow nodes, and it is convenient to have a single interface that allows (e.g.) a debugger to get process information without having to know the underlying node layout.

> What is this stuff about the brand slot? [The clue is the
> brand slot comment; the result is RC_RequestError when "brand slot
> requested", which means we are indeed requesting a key from a slot of
> the process root. It's still a mystery why you shouldn't be able to
> request the key from the brand slot.] [See Kernel Objects.ProcessTool.]

The brand is part of the constructor/process creator mechanism. The brand of a process must be kept secret in order for the process to be authenticatable. Therefore, the brand slot is not readable.

> - presumably "specifying RK0 as the return key register" really means
> "KR0", not "RK0", right? RK0 is always the return key register,
> isn't it?

Probably not, though I don't have enough context to be sure what you were reading. KR0 means "key register 0 of the process". RK0 is the register into which key zero of an IPC should be accepted. RK0 may take on values in the range [0..31]. That is, a process may specify the registers into which it wishes to receive IPC arguments.
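The KR/RK distinction can be illustrated with a short Python model. This is a sketch with hypothetical names (`accept_keys`, `NUM_KEY_REGS`); the only facts taken from the text are that RK values lie in [0..31] and that each RK names the key register into which the corresponding IPC key is accepted. How many keys an IPC carries is not stated above, so the function simply accepts whatever list it is given.

```python
from typing import List, Optional, Sequence

NUM_KEY_REGS = 32  # RK values range over [0..31]


def accept_keys(key_regs: List[Optional[str]],
                rcv_spec: Sequence[int],
                sent_keys: Sequence[str]) -> List[Optional[str]]:
    """Place incoming IPC keys into the receiver's chosen key registers.

    key_regs  -- the receiver's key registers (KR0..KR31)
    rcv_spec  -- rcv_spec[i] is RKi: the index of the key register
                 that should accept sent key i
    sent_keys -- the keys carried by the IPC (opaque strings here)
    """
    if len(key_regs) != NUM_KEY_REGS:
        raise ValueError("a process has 32 key registers in this model")
    for rk in rcv_spec:
        if not 0 <= rk < NUM_KEY_REGS:
            raise ValueError(f"RK value {rk} is outside [0..31]")
    # Sent key i lands in key register rcv_spec[i].
    for rk, key in zip(rcv_spec, sent_keys):
        key_regs[rk] = key
    return key_regs
```

So with `rcv_spec = [5, 31]`, key zero of the IPC lands in KR5 rather than KR0, which is the sense in which RK0 is a setting chosen by the receiver and not a fixed register.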

> - why is the size of the reply message included in the reply message?
> Don't we get that automatically from the key invocation mechanism?

We don't get it automatically. We get it because it is specified in one of the registers. Note that on exit it holds the number of bytes received, which is not necessarily the number of bytes sent. The recipient needs to be able to detect a bad transmit.

> Primary Objects.Process.Swap Memory:
> - why is this useful?

It's how you implement the closest equivalent to UNIX exec. Take a careful look at the "protospace" implementation, which lives in domain/constructor or domain/protospace.

> Kernel Objects.LogAppend:
> - isn't KT+1 "number key"? Isn't KT+2 "unknown request"?

All of the return codes on this page are bugs.

> Kernel Objects.Sleep:
> - why does system restart make Sleep return early?

Because the sleep duration is not saved by the checkpoint logic, and because if you are sleeping you may need to revise your plan if the system has been rebooted. This is probably a design flaw. The real problem is that the returnee need not be the process that called the sleep capability, so the semantics of this is profoundly messy.

> Device Objects:
> - are there no block devices? Or are those only for disks excluded
> from the single-level store?

Disk device keys are handled specially due to the single level store.

> Buffered stream:
> - it looks like the protocol here is inherently less efficient than the
> Unix read()/write()/readv()/writev() stuff. Hmm, well, maybe not.

It is definitely less efficient, but I have yet to see a scatter/gather design that I'm willing to build into the architecture. I do have a bit reserved in the IPC spec for a future scatter/gather design, though.

Most of the standard process documentation is stale, so I'm deferring on all that for now.

shap