Message passing
Jonathan Shapiro
shap@viper.cis.upenn.edu
Wed, 14 Dec 94 13:56:48 -0500
>Mach also has both by-value and by-reference transfer
>mechanisms...
If you're referring to out-of-line memory regions as "pass-by-reference",
you're right, and it's currently a big kludge.
On the other hand, if you're referring to ports, note that port passing
in Mach is basically the same as key passing in KeyKOS.
(It's currently unnecessarily complicated in certain respects,
but we're working on fixing that.)
Just as an aside, the number of ports passed and the number of
out-of-line memory regions passed may need to be bounded for the same
reason the number of data bytes needs to be.
From your description, it's pretty clear that I badly misunderstood
the Mach message mechanism, and I need to go back and do more reading.
You'll have to be more specific on the exact problems you expect to see.
For one thing, you keep referring to the ability to pass messages
"atomically", but you haven't defined what "atomic" means.
In more general terms, maybe you mean that a message transfer can't
be interrupted by a page fault - you seem to make the assumption that
all message resources must be pinned while they are being transferred.
Indeed I was making that assumption, and I was clearly in error.
KeyKOS is designed so that the message send can be restarted. Its
approach to transmitting a message is to pin all of the relevant pages
and nodes and then do the transfer. If any resource proves not to be
available, it initiates a pagein/nodein, unpins everything pinned so
far, and puts the process to sleep waiting for that I/O event.
When the I/O completes the process retries the message send (unless
gronked by a debugger) from the beginning. [I suppose I should be
clear that this is how EROS (the 386 KeyKOS rewrite) worked. I think
KeyKOS worked the same way, but there were differences in the
low-level I/O architecture.]
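A minimal sketch of the pin-everything-or-restart scheme described above, as a user-space C model rather than actual KeyKOS or EROS code. The names resource_t, try_send and send_with_retry are hypothetical, and the pagein/nodein is simulated by simply marking the missing resource resident.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a page or node needed by a message send. */
typedef struct {
    bool resident;  /* currently in main memory? */
    bool pinned;    /* pinned by an in-progress send? */
} resource_t;

typedef enum { SEND_DONE, SEND_SLEEP } send_result_t;

/* Try to pin every resource. On the first miss, unpin everything pinned
 * so far and report which resource needs I/O. Nothing is transferred
 * unless all resources could be pinned. */
static send_result_t try_send(resource_t *res, size_t n, size_t *missing)
{
    for (size_t i = 0; i < n; i++) {
        if (!res[i].resident) {
            for (size_t j = 0; j < i; j++)
                res[j].pinned = false;
            *missing = i;
            return SEND_SLEEP;
        }
        res[i].pinned = true;
    }
    /* Everything is pinned: the transfer would happen here in one step. */
    for (size_t i = 0; i < n; i++)
        res[i].pinned = false;
    return SEND_DONE;
}

/* Retry the send from the beginning after each simulated I/O completion.
 * Returns the number of attempts that were needed. */
static int send_with_retry(resource_t *res, size_t n)
{
    int attempts = 0;
    for (;;) {
        size_t missing = 0;
        attempts++;
        if (try_send(res, n, &missing) == SEND_DONE)
            return attempts;
        assert(missing < n);
        res[missing].resident = true;  /* simulated pagein/nodein */
    }
}
```

Between attempts the process holds no pins and no partially transferred data, so in this model it is always either running or asleep on an I/O event.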
Taking this approach makes checkpointing easier. As you say, a
message send cannot be interrupted by a checkpoint. The restart
approach lets a checkpoint occur in "mid transfer." If message
transfer can be partially completed, then it seems to me that the OS
needs to have some new process states to express the fact that a
process has partially received/sent a message. This is different from
waiting for a message and also different from running. In addition,
the OS must recall how much of the message has been transferred.
I know that one could checkpoint all of this successfully, but it does
seem more difficult.
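To make the extra bookkeeping concrete, here is a rough C sketch of the per-process state a kernel might need if transfers can be partially complete. All names (proc_state_t, partial_proc_t, transfer_step) are hypothetical and not taken from Mach or KeyKOS; only the send side is modelled.

```c
#include <assert.h>
#include <stddef.h>

/* With restartable (atomic) transfer only the first two states are
 * needed. The last two exist only if a transfer can be observed
 * part-way through. */
typedef enum {
    PS_RUNNING,
    PS_WAITING_FOR_MESSAGE,
    PS_MID_SEND,     /* extra state: message partially sent */
    PS_MID_RECEIVE   /* extra state: message partially received */
} proc_state_t;

typedef struct {
    proc_state_t state;
    size_t bytes_transferred;  /* progress that a checkpoint must save */
    size_t bytes_total;
} partial_proc_t;

/* Advance a send by at most `chunk` bytes and update the state. */
static void transfer_step(partial_proc_t *p, size_t chunk)
{
    assert(p->bytes_transferred <= p->bytes_total);
    size_t left = p->bytes_total - p->bytes_transferred;
    p->bytes_transferred += (chunk < left) ? chunk : left;
    p->state = (p->bytes_transferred == p->bytes_total)
                   ? PS_RUNNING
                   : PS_MID_SEND;
}
```

Both the extra states and the progress counter would have to be written out by a checkpoint and restored afterwards, which is the added difficulty referred to above.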
Actually, I think that having more process states is more pernicious
than that. If a process is in mid-transfer, what exactly is its
state as seen by a debugger? If a process is in mid-receive should a
debugger be able to freeze it? If so, what state is it then in? What
state does that put the sender in? Should a debugger be able to see
the intermediate stages of the message transfer if it happens to peek
at the right moment?
Some people feel that worrying about debugging is not important. I
spent several years doing debugger development, and I found in the
process that thinking about debugging issues forces me to make process
model issues explicit.
So I suppose that what I mean by "atomic" is that message transfer is
instantaneous with respect to the process model; there are no process
states associated with the act of transfer.
The deadlock conditions arise from pinning the resources; if you pin a
lot of them you can run out of memory.
I'm not arguing merit here; just attempting to elaborate the
consequences of the different choices.
Also, note that, _in general_ (not all the time though), message passing
in Mach is done by first copying the message from the sender into kernel
memory, and then copying the message out from the kernel to the receiver's
memory as a separate operation.
KeyKOS works hard to go the other way, doing a direct cross-space copy
in nearly all cases. Offhand, I can't think of any places in EROS
where we had to place the message in a kernel buffer.
Oops. Yes I can. If two processes are sufficiently perverse they can
contrive to send a message in such a way that the source and
destination physical addresses overlap. In this case the kernel makes
a copy. Chalk that up as an EROS bug - I didn't think of that one.
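The overlap case can be illustrated with a small user-space C sketch. It assumes that "physical addresses" can be modelled as ordinary pointers and that messages are at most MAX_MSG bytes; MAX_MSG, ranges_overlap and transfer are hypothetical names, not EROS code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_MSG 4096  /* assumed upper bound on message size */

/* True if [a, a+len) and [b, b+len) share at least one byte. */
static bool ranges_overlap(const void *a, const void *b, size_t len)
{
    uintptr_t x = (uintptr_t)a, y = (uintptr_t)b;
    return x < y + len && y < x + len;
}

/* Copy directly when the ranges are disjoint; otherwise stage the
 * message in a temporary buffer first. Returns true if the temporary
 * buffer was needed. */
static bool transfer(void *dst, const void *src, size_t len)
{
    assert(len <= MAX_MSG);
    if (!ranges_overlap(dst, src, len)) {
        memcpy(dst, src, len);       /* direct cross-space copy */
        return false;
    }
    unsigned char bounce[MAX_MSG];   /* stands in for a kernel buffer */
    memcpy(bounce, src, len);
    memcpy(dst, bounce, len);
    return true;
}
```

In plain C a single memmove would handle the overlap without a staging buffer; the explicit buffer is kept here only to mirror the kernel-copy behaviour described above.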
Of course, KeyKOS's "no kernel memory allocation" design philosophy
at first seems to be incompatible with two-stage message transfer,
because of the temporary message buffer required in the middle.
Yes and no. KeyKOS does do kernel memory allocation in the sense that
you mean here. It views main memory as a cache of page frames, and
assigns these page frames to different uses fairly freely. It
allocates frames for page tables, for example, and could use a similar
approach for temporary message buffers. The KeyKOS kernel does not
currently do accounting on core page frame usage.
However, the real issue here is accountability: the appropriate
user-space process(es) must be "charged" appropriately for any kernel
memory that they use, including the kernel memory consumed by that
buffer.
I agree with your view here, though in thinking about how to implement
it I came to two places where I couldn't figure out who ought to be
charged:
1. The message copy buffer is in some sense owned by both parties; who
should be charged?
My ans. Doesn't matter; the message buffer is too ephemeral to
care very much. If a single page pushes you over the limit you're
in deeper trouble than that. In Mach it can be more than one page,
though, so it needs real thought.
2. COW pages; when I get a COW'd segment, who gets charged for the
frames allocated by writes? It strikes me that the original holder
has already been charged for storage in this case.
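As a rough illustration of the accounting being discussed, here is a C sketch of per-process frame charging. It takes no position on either question above: the caller decides which account pays. The names account_t, charge_frames and release_frames, and the idea of a fixed per-account limit, are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-process account for kernel page frames. */
typedef struct {
    size_t frames_charged;  /* frames currently billed to this account */
    size_t frame_limit;     /* assumed fixed quota */
} account_t;

/* Bill n frames to `payer`. Fails without side effects if that would
 * push the account over its limit. */
static bool charge_frames(account_t *payer, size_t n)
{
    assert(payer->frames_charged <= payer->frame_limit);
    if (n > payer->frame_limit - payer->frames_charged)
        return false;
    payer->frames_charged += n;
    return true;
}

/* Return n frames, for example when a message buffer is freed. */
static void release_frames(account_t *payer, size_t n)
{
    assert(n <= payer->frames_charged);
    payer->frames_charged -= n;
}
```

For the message buffer, the kernel would call charge_frames on whichever party it picks before allocating and release_frames once the copy-out completes. For COW pages the same call would be made at write-fault time, with the choice of payer still open.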
The working sets discussion was a rough (and bad) cut at memory charge
sets. I want to get back to that discussion eventually.
Jonathan