resume capabilities

shapj@us.ibm.com
Sun, 26 Dec 1999 11:05:00 -0500


In thinking about an IPC performance issue in EROS, I have been led to
ponder whether resume capabilities can safely be eliminated from the EROS
design.  Regrettably, I now have a concrete argument for why they are
required. This note is to capture the requirement for the eros-arch
archive.

Semantics: a resume capability is generated as a side effect of performing
a CALL invocation. The idea is to provide return-once (or at least "return
at most once") semantics. Resume capabilities can be copied, but if *any*
copy is invoked, *all* copies are efficiently converted into a capability
conveying no authority (actually into a "number capability" denoting 0, but
that's a nit).
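One way to model these semantics is with a per-process call counter. The sketch below is a hypothetical C model, not the EROS kernel's actual mechanism. The names `process`, `capability`, `make_resume`, `normalize` and `invoke_resume` are invented for illustration. Each resume capability records the caller's counter at CALL time. Consuming any copy bumps the counter, so every outstanding copy becomes stale at once. A stale copy is lazily rewritten to the number capability 0 the next time it is examined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model, not the real EROS data structures. */
typedef enum { CAP_NUMBER, CAP_RESUME } cap_type;

typedef struct process {
    uint32_t call_count; /* bumped each time a resume cap to this process is consumed */
    bool waiting;        /* true while the process is blocked in a CALL */
} process;

typedef struct capability {
    cap_type type;
    process *target;     /* used only when type == CAP_RESUME */
    uint32_t value;      /* CAP_RESUME: call_count snapshot; CAP_NUMBER: the number */
} capability;

/* Performing a CALL blocks the caller and yields a fresh resume capability. */
static capability make_resume(process *caller) {
    assert(!caller->waiting);            /* a blocked process cannot issue a CALL */
    caller->waiting = true;
    capability c = { CAP_RESUME, caller, caller->call_count };
    return c;
}

/* A stale resume capability is lazily rewritten to number capability 0. */
static void normalize(capability *c) {
    if (c->type == CAP_RESUME && c->value != c->target->call_count) {
        c->type = CAP_NUMBER;
        c->target = NULL;
        c->value = 0;
    }
}

/* Returns true only if this invocation actually resumed the caller. */
static bool invoke_resume(capability *c) {
    normalize(c);
    if (c->type != CAP_RESUME)
        return false;                    /* conveys no authority */
    c->target->call_count++;             /* invalidates every copy at once */
    c->target->waiting = false;
    return true;
}
```

The counter makes invalidation cost a single increment regardless of how many copies exist, and no copy needs to be located. Counter wraparound is ignored in this sketch.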

Because of these semantics, a process that has performed a CALL may rely on
the fact that any reply is a reply to the current call. The reply may be
malformed (in the sense that it contains bad data), but it will not have
arisen as a result of somebody invoking a stale resume capability.  As far
as I can tell, there is no marginal value to the caller in being protected
from bad servers that invoke stale resume capabilities -- a counter
embedded in the explicit argument structure would work very nearly as well.
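For comparison, the caller-side alternative (a counter carried in the explicit argument structure) might look like the following. This is a hypothetical sketch. The `message` layout, the `client` bookkeeping, and the assumption that the server copies `seq` unchanged into its reply are all invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical request/reply layout; the server is assumed to echo seq. */
typedef struct message {
    uint64_t seq;
    int32_t payload;
} message;

typedef struct client {
    uint64_t next_seq;    /* last sequence number handed out */
    uint64_t outstanding; /* seq of the call currently in progress, 0 if none */
} client;

/* Stamp a new request with a fresh sequence number. */
static message client_prepare(client *cl, int32_t payload) {
    cl->outstanding = ++cl->next_seq;
    message m = { cl->outstanding, payload };
    return m;
}

/* Accept a reply only if it echoes the sequence number of the current call. */
static bool client_accept(const client *cl, const message *reply) {
    assert(cl->outstanding != 0); /* replies are meaningful only during a call */
    return reply->seq == cl->outstanding;
}
```

This only lets the client recognize a stale reply. The client must still be activated to examine and discard it, and that activation is the cost the server-side argument turns on.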

For servers, however, the resume capability provides a guarantee that
appears to be necessary. The existence of a resume capability implies that
the caller is blocked waiting for that resume capability to be invoked.
This implies that an invocation of that resume capability may generate a
caller-side exception, but that the operation will complete without being
blocked for an unbounded amount of time.  That is, the server is not in a
race condition with other servers, some of whom may be hostile.
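The server-side guarantee can be illustrated by contrasting two kinds of send. In an ordinary send, the target may be busy. In a reply through a resume capability, the target is known to be waiting. The three process states and all names below are assumptions of this sketch rather than a description of the EROS kernel. The model also omits the caller-side exception case.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical process states and send outcomes. */
typedef enum { PS_RUNNING, PS_AVAILABLE, PS_WAITING } proc_state;
typedef enum { SEND_DELIVERED, SEND_WOULD_BLOCK, SEND_NO_AUTHORITY } send_result;

typedef struct process { proc_state state; } process;

/* Ordinary send: the target may be busy, so the sender may have to wait
 * for an unbounded time behind other (possibly hostile) senders. */
static send_result send_start(process *target) {
    if (target->state != PS_AVAILABLE)
        return SEND_WOULD_BLOCK;
    target->state = PS_RUNNING;
    return SEND_DELIVERED;
}

/* Reply via resume capability: a valid resume capability implies the target
 * is waiting for exactly this reply, so delivery never blocks. */
static send_result send_resume(process *target, bool cap_valid) {
    if (!cap_valid)
        return SEND_NO_AUTHORITY;         /* stale copy: no authority */
    assert(target->state == PS_WAITING);  /* invariant assumed to be kernel-enforced */
    target->state = PS_RUNNING;
    return SEND_DELIVERED;
}
```

Because `send_resume` has no `SEND_WOULD_BLOCK` path, the recipient cannot hold up a server that replies through a valid resume capability.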

This is important because we must assume that a server is a multiplexed
resource, and therefore that delaying its ability to respond constitutes a
potential denial of service attack on other clients.  If hostile servers
were allowed to invoke the client using stale resume capabilities, the
legitimate reply might be arbitrarily and unpredictably delayed while the
client is activated long enough to ignore these hostile replies.

The problem can be worked around using timeouts or buffering.  Timeouts on
calls imply a covert channel; timeouts that exist only on replies may not.
A timeout on the reply converts one denial of service into another: the
server is guaranteed not to block indefinitely, but the client is no longer
guaranteed a reply at all.  Buffering opens a large can of worms that we
all agree we would prefer not to consume live and wriggling.

The reciprocal situation -- that the client can be starved by competition
for the server -- is not as compelling.  In deciding to call the server,
the client has already decided to trust the server.  If the server is
multiplexed, the client has elected to trust the multiplexing decisions.
This is a completely separate matter from trusting other clients of the
same server.


Jonathan S. Shapiro, Ph. D.
Research Staff Member
IBM T.J. Watson Research Center
Email: shapj@us.ibm.com
Phone: +1 914 784 7085  (Tieline: 863)
Fax: +1 914 784 6576