design change -- IPC semantics
Tue, 31 Aug 1999 23:24:08 -0400
I'm contemplating some changes to the fine print of the capability invocation
rules. The changes are as follows:
** 1. It is currently the case that the data string transfer is "all or
nothing". If an N byte buffer was provided and K bytes were received, then at
the end of the trap instruction bytes 0..K-1 contain the string sent by the
invoker, and bytes K..N-1 are unchanged.
REVISION: The semantic change is that the application may no longer assume that
bytes K..N-1 are unchanged at the end of the IPC trap.
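For receivers, the practical consequence can be sketched as follows. This is a toy model in Python, not kernel or library code: the helper name `received_payload` and the buffer contents are hypothetical, chosen only to show that a receiver should treat bytes 0..K-1 as the message and assume nothing about the tail.

```python
def received_payload(buffer: bytearray, k: int) -> bytes:
    """Return only the bytes the IPC defines, i.e. buffer[0:k].

    Hypothetical helper: under the revised rule, bytes k..N-1 of the
    receive buffer may or may not have been modified, so they are ignored.
    """
    if not 0 <= k <= len(buffer):
        raise ValueError("received length out of range")
    return bytes(buffer[:k])


# An 8-byte receive buffer (N = 8) still holding data from earlier traffic.
buf = bytearray(b"OLDOLDOL")

# Model a receive of K = 3 bytes. The "??" models tail bytes that the
# kernel was free to disturb under the revised semantics.
buf[0:3] = b"abc"
buf[3:5] = b"??"

payload = received_payload(buf, 3)
```

A receiver that previously kept state in the unused tail of its receive buffer across invocations would need to move that state elsewhere.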
Issue 1: It is now detectable that partially completed IPCs may have occurred.
Note, however, that there is already a trust relationship between all possible
senders and the invokee by virtue of the fact that they held capabilities to the
** 2. It is currently the case that the entire sent string will be checked for
sender readability, regardless of the number of bytes received by the recipient.
That is, if the invoker transmits 8192 bytes and the invokee accepts only 16,
all 8192 bytes must be readable to the invoker.
REVISION: The bytes actually moved will be tested for validity. It is undefined
whether additional bytes will be tested for validity.
Issue 1: This exposes the size of the receive buffer. The sender and the
recipient are already in legitimate communication, thus this covert channel is
not important. If this is a concern, the recipient may prevent this detection by
always receiving MSG_LIMIT bytes (currently 64k).
Issue 2: This alters the reproducibility of the sender fault behavior. The
sender fault behavior is already influenced by where page boundaries fall.
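Both issues can be seen in a small model. The sketch below is illustrative only: the 4096-byte page size, the function name `sender_faults`, and the page-set representation are assumptions, and `check_all=False` models only the minimum the revised rule guarantees (the kernel remains free to test more bytes).

```python
PAGE = 4096  # assumed page size, for illustration only


def sender_faults(send_len, recv_len, readable_pages, start=0, check_all=False):
    """Return True if the sender takes a fault in this toy model.

    check_all=True  models the current rule: the whole sent string is checked.
    check_all=False models the revised minimum: only the moved bytes are checked.
    readable_pages is the set of page numbers the sender can read.
    """
    moved = min(send_len, recv_len)
    checked = send_len if check_all else moved
    if checked == 0:
        return False
    first = start // PAGE
    last = (start + checked - 1) // PAGE
    return any(page not in readable_pages for page in range(first, last + 1))


# Sender transmits 8192 bytes, but only its first page is readable.
old_rule = sender_faults(8192, 16, {0}, check_all=True)   # faults
new_small_recv = sender_faults(8192, 16, {0})             # need not fault
new_large_recv = sender_faults(8192, 8192, {0})           # faults
```

Under the revised rule the same send faults or not depending on how much the recipient accepts (Issue 1) and on which page the unreadable bytes fall in (Issue 2).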
** RATIONALE: The current semantics requires that both receive buffer and send
buffer be probed for validity before initiating the string transfer. This
approximately triples the cost of the transfer. Under the revised architecture,
the interdomain transfer path can simply copy the bits optimistically. If the
pages on one side or the other are invalid, this will induce a page fault, and
the page fault handler will do the appropriate thing (either invoke segment
keeper on sender side or truncate the transfer on receiver side).
The catch is that sender A can get halfway through, page fault, and then sender
B can get in while sender A is producing the page. This motivates change #1.
Note that no new process states are introduced -- an interrupted transfer is
restarted from the beginning.
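The interleaving described above can be sketched with a toy model. Nothing here is kernel code: `PageFault`, `optimistic_copy`, and the `unreadable_at` parameter are hypothetical names, and only the sender-side fault is modelled (receiver-side truncation is left out).

```python
class PageFault(Exception):
    """Models a fault taken partway through the copy."""


def optimistic_copy(src: bytes, dst: bytearray, unreadable_at=None) -> int:
    """Copy src into dst without probing either buffer first.

    Raises PageFault on reaching source index unreadable_at, leaving the
    bytes already copied in dst. Returns the number of bytes moved.
    """
    k = min(len(src), len(dst))
    for i in range(k):
        if unreadable_at is not None and i == unreadable_at:
            raise PageFault(i)
        dst[i] = src[i]
    return k


recv = bytearray(b"........")  # receive buffer, N = 8

# Sender A faults halfway through its 8-byte string.
try:
    optimistic_copy(b"AAAAAAAA", recv, unreadable_at=4)
except PageFault:
    pass  # A's page is being produced; A will restart from byte 0 later

# Sender B gets in first with a 2-byte string, so K = 2 for this receive.
k = optimistic_copy(b"BB", recv)
```

After B's transfer the buffer holds `BBAA....`: bytes 0..K-1 are B's string, while bytes 2..3 carry residue from A's interrupted attempt. No extra process state is needed, because A simply restarts its transfer from the beginning.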
** 3. The read/write data operations on pages are hereby retired. The "clone"
and "zero" operations will remain. The rationale for this is that there is no
evident requirement for these operations -- pages are in practice read/written
by mapping them into segments. By eliminating the operations that return data,
the page key can be implemented as one of the "simple case" capabilities.
** 4. A few operations that currently can be managed with data registers retain
string versions for historical reasons. "Create number key" is an example (I
think). Where the operation can be done using only data registers, the string
version is hereby retired. Here again, the motivation is to minimize the
number of capabilities that perform string buffer manipulations.
** 5. The "write numbers" operation on node keys is hereby retired, though this
may not be permanent. In practice, its main use is populating data registers
during process creation, which is better done using read registers and write
registers on the process capability. The "create number key in slot N"
operations are retained.
** 6. Until now, it has been the case that the SEND operation did not cause an
immediate transfer of control to the recipient. The invokee is presently placed
on the run queue but the invoker retains the CPU. This outcome is not part of
the contract of the SEND operation, and there has been some discussion of having
two variants of send. It may become useful to specify this behavior, and in
fact to have it come out the other way. The resolution in this case is not yet
clear; this is just a "heads up".
** 7. It is presently the case that the invoker enters the WAITING/AVAILABLE
state (according to invocation type) prior to the execution of the capability
operation. For example, during a CALL on a kernel key, a resume key is actually
synthesized and the process actually does enter the waiting state briefly. This
is done for the sole purpose of correctly supporting the semantics of a process
that performs a RETURN on a START capability to itself (a perverse case, by the
way, that really should never occur). On a multiprocessor, this state change is
REVISION: It is now guaranteed that the correct *final* state of the invoker
will be set before or instantaneously with setting the state of the invokee. In
the case where the invoker and invokee are the same, it is no longer guaranteed
that the temporary process run state change will ever be observable. In
particular, a CALL on a kernel capability need no longer observably alter the
process state to WAITING, and an invocation on a gate capability need not
observably reflect the change in the invoker process state before the invokee is
made runnable [i.e. is placed on the run queue; the delay from "becoming
runnable" to actually getting a CPU is potentially arbitrary, and isn't what I
am talking about here.]
In particular, this allows the case of RETURN to START key to yourself to be
handled as a special case by updating the invoker state early in an out-of-line
path, where the general case of RETURN still does the invoker state update late
in the path.
Unless I've missed something, the following keys still need to manipulate string
buffers after these changes:
misc key (returner, range key)
gate keys (start/resume)
red segment keys (gate key indirection)
red node keys (gate key indirection)
device keys (data buffer xmit/rcv)
Jonathan S. Shapiro, Ph. D.
IBM T.J. Watson Research Center
Phone: +1 914 784 7085 (Tieline: 863)
Fax: +1 914 784 7595