[E-Lang] Reality Is Just A Peripheral (was: remote comms: Timeouts and
Connection Failure)
Mark S. Miller
markm@caplet.com
Wed, 04 Apr 2001 13:41:55 -0700
At 09:37 PM Monday 4/2/01, Dean Tribble wrote:
>This took much longer to write than I expected. Please comment! (and I
>hope it stayed coherent :-)
Well, replying to this also took longer than expected. Perhaps there's a
lesson here for thinking about timeouts? ;)
Yes Chip, I couldn't resist.
>There are a few threads floating around about impatience, timeouts on
>promises, connection failure, and so forth. This message attempts to
>respond to the whole thread by de/reconstructing the issue.
Good. I've repeatedly found such attempted summary messages to be quite
useful. This one looks great.
>Timeouts are almost always motivated by either bottom-up concerns or
>top-down concerns. The primary bottom-up concern on this list is
>synthesizing a virtual connection from an unreliable underlying delivery
>mechanism, but similar concerns appear wrt other computational resources;
>e.g., is my computation going to finish or starve, am I getting enough
>bandwidth, etc. For concreteness, I will focus discussion below on
>comm-based bottom-up motivations. Top-down concerns are generally driven by
>requirements; e.g., if my auction doesn't respond within 5 seconds,
>customers will go away (or sometimes worse, call support).
Often a dichotomy is a first step towards a taxonomy. Here's an attempted
generalization that may bring more uniformity to the topic.
The world inside a single computer, peripherals aside, is, for many
purposes, best thought of as a mathematical world free of needs for
timeouts. Although computation has duration and proceeds in time, most of
our formalisms of computation do not treat this time as quantitative for
many good reasons.
The need for computation within a single computer to use quantitative
timeouts comes only from "peripherals" in the external world and their
(mostly) preexisting notions of time. In this perspective, humans (reaction
times, irritation thresholds), human institutions (legal deadlines),
incoming missiles, communications networks, and distant computers are all
"peripherals" with their own time requirements that our programs need to
deal with.
How does this relate to your above dichotomy? Humans, human institutions,
and incoming missiles all correspond to your "requirements". Networks and
distant computers correspond to your "infrastructure". If this
correspondence works, then how should we generalize your claim that
requirements-based, top-down timeouts should be only in application code,
not infrastructure code, and infrastructure-based, bottom-up timeouts should
only be in infrastructure, not application code?
If we likewise generalize the view of software from a) being divided into
application vs infrastructure to b) being divided into many different facets
for dealing simultaneously with many different aspects of the external
world, then the whole issue can be seen as the conventional Hayekian
modularity issue of localizing knowledge appropriately. If the need for a
given timeout comes from a particular aspect of the external world, what
module in our software system already specializes in that aspect of the
external world?
Where "infrastructure" programs embody comparatively more knowledge of
computational peripherals, "application" programs embody comparatively more
knowledge of human and institutional peripherals. Therefore, if we must
represent knowledge of the time issues of these peripherals as timeouts, it
seems clear where this knowledge of timeouts should be localized.
Of course, module A can make the actual value of a timeout parameterizable,
which allows the knowledge of the nature of the needed timeout mechanism to
be somewhat separated from knowledge of the value of the timeout. (I say
"somewhat" because it can only be shifted to a module B that is in a
position to parameterize module A.) This allows better answers, but it does
not allow us to avoid answering the knowledge localization question. An
example of this is Dean's proposal that VatTP continue to be the module that
understands what it means to time out an interVat connection, but allow the
values of the timeout it uses to come from something lower level that
understands some relevant time characteristic of the particular medium a
particular connection is layered on.
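
To make the split concrete, here is a minimal Python sketch of that kind of
parameterization. It is not VatTP or E code; Medium, FixedLatencyMedium,
Connection, and the safety factor of 10 are all hypothetical names and
values chosen only for illustration. The connection module keeps the
knowledge of what timing out means, while the medium supplies the value.

```python
from dataclasses import dataclass
from typing import Protocol


class Medium(Protocol):
    """Hypothetical lower-level module that knows one transport's timing."""

    def suggested_timeout_seconds(self) -> float: ...


@dataclass(frozen=True)
class FixedLatencyMedium:
    """Illustrative medium whose only time characteristic is a round trip."""

    round_trip_seconds: float
    safety_factor: float = 10.0  # assumed multiplier, not from the original

    def suggested_timeout_seconds(self) -> float:
        return self.round_trip_seconds * self.safety_factor


class Connection:
    """Stand-in for the module that knows what timing out a connection means."""

    def __init__(self, medium: Medium) -> None:
        # The value comes from the medium; the mechanism stays here.
        self._timeout = medium.suggested_timeout_seconds()
        self._last_heard = 0.0

    def heard_from_peer(self, now: float) -> None:
        """Record the time (in seconds) at which the peer was last heard."""
        self._last_heard = now

    def is_timed_out(self, now: float) -> bool:
        """True once the peer has been silent longer than the timeout."""
        return now - self._last_heard > self._timeout


# Same mechanism, different values, depending on the underlying medium.
lan = Connection(FixedLatencyMedium(round_trip_seconds=0.01))
modem = Connection(FixedLatencyMedium(round_trip_seconds=1.5))
```

After one second of silence the LAN connection in this sketch counts as
timed out while the modem connection does not, even though both run the
same timeout logic.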
>I will now make some strong claims. There are a few exceptions to them,
>which I challenge the reader to provide :-)
>
>g) *All* timeouts are either for bottom-up or top-down reasons. Everything
>in the middle is confused.
The following true story is either an example or a counter-example, which
probably means I am confused ;)
EC Habitats was an extraordinary graphically-based social virtual reality
system which was the hard core proving ground for E (and indeed, AFAIK, for
the entire Actors / Concurrent Logic / Distributed capabilities paradigm).
For this example though, we can just consider it to be a multiway
region-oriented chat system. By "region-oriented", I mean that one's Avatar
(graphical representation of one's presence and persona in the virtual
reality) is always in a region. While in any region, you get to see who
else is in that region with you. ("Who" is, of course, by Avatar, not by true
name.)
There are two ways to speak in EC Habitats. One can speak "out loud" or by
"telepathy". Telepathy is directly Avatar to Avatar, is not mediated by
regions, and will not further concern us here. If you say "The bird flies
at midnight" out loud, you are supposed to be able to know (conditional on
trust in the region host) that, ideally, only those Avatars you see in the
room with you can hear what you say. Ideally as well, all Avatars in the
room accurately see exactly the same set of other Avatars as being in
the room. Under this idealization, those in the room would have "Common
Knowledge" of what was said in the room. (More Common Knowledge in future
email.)
Even though it isn't possible to accurately implement this spec, because of
the inescapable problems of distributed systems, by even approximating it
closely we were able to leverage an intuitive universal tacit understanding
of the human social-physical world into a security user interface that never
needed to be explained.
If I recall, when someone enters a room, there is a Star Trek-like
transporter animation of them beaming in and a system word balloon
announcing their arrival: "Mr. Slippery has entered." (A system word
balloon is a rounded rectangle without the pointed stem that would point at
the speaker.) The time delay on the transporter animation is adequate for
someone to decide not to hit the enter key on their next utterance. Until
the transporter animation finishes, the new entrant will not hear what is
said in the room. In one way this is a perfect example of Dean's point --
the time from seeing that Mr. Slippery will be in the room to being able to
cancel one's own hitting of the enter key (because I don't want Mr. Slippery
to hear *that*) is a human reaction time issue that has nothing to do with
computation. However, there is also some computational slop on top of all
this -- not everyone sees the fade-in finish simultaneously.
In any case, although this is a timing issue, it isn't a timeout issue.
More interesting is leaving the room.
When an Avatar explicitly leaves a room, the issue is much like on entry,
except the timing danger a speaker faces is now only that they might say
something assuming Mr. Slippery will hear it, and then, as their finger is
past the point of no return towards hitting the enter key, see Mr. Slippery
start to fade out. The result is uncertainty about whether Mr. Slippery
actually heard it, but this is fail safe. The timing issues here are also
only human based + a bit of computational slop.
More interesting is when an Avatar's hosting machine becomes inaccessible
from the region host. What the region actually did was to depend on VatTP's
connection timeout to determine that the Avatar was no longer reachable, and
then to evict him from the region, and to inform all those in the region
(with system word balloons and transporter animations) that this Avatar had
disappeared. This timeout was substantially longer than human reaction time
(maybe 15 seconds), but in the meantime many would notice that Mr. Slippery
was inactive and immobile. Whether or not this was the right way to handle
things, it worked out well in practice. This case may or may not be an
example of the middle ground Dean claims is confused.
In any case, ignoring entry ambiguity (which we can, since it was well
enough behaved) the leaving ambiguity destroyed formal Common Knowledge that
everyone in the room had heard everything said in the room, but we still had
Common Knowledge that those *still* in the room, whoever they may be, had
heard everything said so far. And since only those still in the room are
able to hear the next utterance, this weaker form of Common Knowledge was
adequate for conversation.
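
The eviction behavior and the weaker guarantee can be sketched as a toy
Python model. This is not EC Habitats or E code: the Region class, its
method names, and the "has disappeared" balloon text are hypothetical, and
the model assumes a fixed set of initial occupants so that entry ambiguity
is ignored, as above.

```python
class Region:
    """Toy region host: tracks who is present and what each Avatar heard."""

    def __init__(self, avatars):
        self.present = set(avatars)
        self.unreachable = set()     # host is down, but not yet timed out
        self.said = []               # every out-loud utterance, in order
        self.heard = {name: [] for name in avatars}
        self.balloons = []           # system word balloons shown to the room

    def host_lost(self, name):
        """The Avatar's machine becomes inaccessible; nobody knows it yet."""
        self.unreachable.add(name)

    def say(self, speaker, text):
        """Deliver an utterance to every present Avatar that is reachable."""
        if speaker not in self.present:
            raise ValueError(f"{speaker} is not in the region")
        self.said.append(text)
        for name in self.present - self.unreachable:
            self.heard[name].append(text)

    def connection_timed_out(self, name):
        """The connection layer reports a timeout: evict and announce."""
        if name in self.present:
            self.present.remove(name)
            self.unreachable.discard(name)
            self.balloons.append(f"{name} has disappeared.")

    def still_present_heard_everything(self):
        """The weaker property: everyone still here heard all utterances."""
        return all(self.heard[name] == self.said for name in self.present)


region = Region(["Alice", "Bob", "Mr. Slippery"])
region.host_lost("Mr. Slippery")
region.say("Alice", "The bird flies at midnight")
region.connection_timed_out("Mr. Slippery")
region.say("Bob", "Understood")
```

In this run Mr. Slippery never hears the first utterance, so the stronger
claim that everyone who was in the room heard everything fails. Once he is
evicted, the check over those still present holds, and only they receive
the next utterance.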
More later, especially about Common Knowledge....
Cheers,
--MarkM