[E-Lang] Reality Is Just A Peripheral (was: remote comms: Timeouts and Connection Failure)

Mark S. Miller markm@caplet.com
Wed, 04 Apr 2001 13:41:55 -0700


At 09:37 PM Monday 4/2/01, Dean Tribble wrote:
>This took much longer to write than I expected.  Please comment!  (and I 
>hope it stayed coherent :-)

Well, replying to this also took longer than expected.  Perhaps there's a 
lesson here for thinking about timeouts?  ;)

Yes Chip, I couldn't resist.


>There are a few threads floating around about impatience, timeouts on 
>promises, connection failure, and so forth.  This message attempts to 
>respond to the whole thread by de/reconstructing the issue.

Good.  I've repeatedly found such attempted summary messages to be quite 
useful.  This one looks great.


>Timeouts are almost always motivated by either bottom-up concerns or 
>top-down concerns.  The primary bottom-up concern on this list is 
>synthesizing a virtual connection from an unreliable underlying delivery 
>mechanism, but similar concerns appear wrt other computational resources; 
>e.g., is my computation going to finish or starve, am I getting enough 
>bandwidth, etc.  For concreteness, I will focus discussion below on 
>comm-based bottom-up motivations.  Top-down concerns are generally driven by 
>requirements; e.g., if my auction doesn't respond within 5 seconds, 
>customers will go away (or sometimes worse, call support).

Often a dichotomy is a first step towards a taxonomy.  Here's an attempted 
generalization that may bring more uniformity to the topic.

The world inside a single computer, peripherals aside, is, for many 
purposes, best thought of as a mathematical world free of needs for 
timeouts.  Although computation has duration and proceeds in time, most of 
our formalisms of computation do not treat this time as quantitative for 
many good reasons.

The need for computation within a single computer to use quantitative 
timeouts comes only from "peripherals" in the external world and their 
(mostly) preexisting notions of time.  In this perspective, humans (reaction 
times, irritation thresholds), human institutions (legal deadlines), and 
incoming missiles, as well as communications networks and distant computers, are all 
"peripherals" with their own time requirements that our programs need to 
deal with.

How does this relate to your above dichotomy?  Humans, human institutions, 
and incoming missiles all correspond to your "requirements".  Networks and 
distant computers correspond to your "infrastructure".  If this 
correspondence works, then how should we generalize your claim that 
requirements-based, top-down timeouts should be only in application code, 
not infrastructure code, and infrastructure-based, bottom-up timeouts should 
only be in infrastructure, not application code?

If we likewise generalize the view of software from a) being divided into 
application vs infrastructure to b) being divided into many different facets 
for dealing simultaneously with many different aspects of the external 
world, then the whole issue can be seen as the conventional Hayekian 
modularity issue of localizing knowledge appropriately.  If the need for a 
given timeout comes from a particular aspect of the external world, what 
module in our software system already specializes in that aspect of the 
external world?

Where "infrastructure" programs embody comparatively more knowledge of 
computational peripherals, "application" programs embody comparatively more 
knowledge of human and institutional peripherals.  Therefore, if we must 
represent knowledge of the time issues of these peripherals as timeouts, it 
seems clear where this knowledge of timeouts should be localized.

Of course, module A can make the actual value of a timeout parameterizable, 
which allows the knowledge of the nature of the needed timeout mechanism to 
be somewhat separated from knowledge of the value of the timeout.  (I say 
"somewhat" because it can only be shifted to a module B that is in a 
position to parameterize module A.)  This allows better answers, but it does 
not allow us to avoid answering the knowledge localization question.  An 
example of this is Dean's proposal that VatTP continue to be the module that 
understands what it means to time out an interVat connection, but allow the 
values of the timeout it uses to come from something lower level that 
understands some relevant time characteristic of the particular medium a 
particular connection is layered on.
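
A minimal Python sketch of this separation, under stated assumptions: the 
names Medium and ConnectionMonitor, and the "ten round trips" policy, are 
hypothetical and are not VatTP's actual API.  The monitor knows what timing 
out means; the medium only supplies the value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Medium:
    """Hypothetical lower-level module: knows one time characteristic of a medium."""
    name: str
    typical_round_trip_s: float

    def suggested_timeout_s(self) -> float:
        # Assumed policy, for illustration only: allow ten round trips.
        return 10 * self.typical_round_trip_s


class ConnectionMonitor:
    """Hypothetical stand-in for the module that knows what a timeout means."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s  # value supplied from outside
        self.last_heard_s = 0.0

    def heard_from_peer(self, now_s: float) -> None:
        # Any traffic from the peer resets the clock.
        self.last_heard_s = now_s

    def is_timed_out(self, now_s: float) -> bool:
        # The mechanism lives here; the threshold does not.
        return now_s - self.last_heard_s > self.timeout_s


lan = Medium("lan", typical_round_trip_s=0.01)
monitor = ConnectionMonitor(timeout_s=lan.suggested_timeout_s())
```

Whoever wires the two together (the last two lines) plays the role of 
module B: it has to be in a position to parameterize the monitor, so the 
localization question is moved rather than avoided.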


>I will now make some strong claims.  There are a few exceptions to them, 
>which I challenge the reader to provide :-)
>
>g) *All* timeouts are either for bottom-up or top-down reasons.  Everything 
>in the middle is confused.

The following true story is either an example or a counter-example, which 
probably means I am confused ;)

EC Habitats was an extraordinary graphically-based social virtual reality 
system which was the hard core proving ground for E (and indeed, AFAIK, for 
the entire Actors / Concurrent Logic / Distributed capabilities paradigm).  
For this example though, we can just consider it to be a multiway 
region-oriented chat system.  By "region-oriented", I mean that one's Avatar 
(graphical representation of one's presence and persona in the virtual 
reality) is always in a region.  While in any region, you get to see who 
else is in that region with you.  ("Who" is, of course, by Avatar, not by true 
name.)

There are two ways to speak in EC Habitats.  One can speak "out loud" or by 
"telepathy".  Telepathy is directly Avatar to Avatar, is not mediated by 
regions, and will not further concern us here.  If you say "The bird flies 
at midnight" out loud, you are supposed to be able to know (conditional on 
trust in the region host) that, ideally, only those Avatars you see in the 
room with you can hear what you say.  Ideally as well, all Avatars in the 
room accurately see exactly the same set of other Avatars as being in 
the room.  Under this idealization, those in the room would have "Common 
Knowledge" of what was said in the room.  (More Common Knowledge in future 
email.)

Even though it isn't possible to accurately implement this spec, because of 
the inescapable problems of distributed systems, by even approximating it 
closely we were able to leverage an intuitive universal tacit understanding 
of the human social-physical world into a security user interface that never 
needed to be explained.

If I recall, when someone enters a room, there is a Star Trek-like 
transporter animation of them beaming in and a system word balloon 
announcing their arrival: "Mr. Slippery has entered."  (A system word 
balloon is a rounded rectangle without the pointed stem that would point at 
the speaker.)  The time delay on the transporter animation is adequate for 
someone to decide not to hit the enter key on their next utterance.  Until 
the transporter animation finishes, the new entrant will not hear what is 
said in the room.  In one way this is a perfect example of Dean's point -- 
the time from seeing that Mr. Slippery will be in the room to being able to 
cancel one's own hitting of the enter key (because I don't want Mr. Slippery 
to hear *that*) is a human reaction time issue that has nothing to do with 
computation.  However, there is also some computational slop on top of all 
this -- not everyone sees the fade-in finish simultaneously.
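
The entry rule can be sketched as follows.  This is an illustration only: 
FadeInRegion and its methods are hypothetical names, the fade-in duration is 
an assumed parameter (no value is given above), and the sketch uses a single 
clock, so it leaves out the computational slop of different clients seeing 
the fade-in finish at different moments.

```python
class FadeInRegion:
    """Hypothetical region host: a new entrant hears nothing until fade-in ends."""

    def __init__(self, fade_in_s: float) -> None:
        self.fade_in_s = fade_in_s
        self.hears_from_s = {}  # avatar -> time at which they start hearing
        self.heard = {}         # avatar -> list of (speaker, text) heard

    def enter(self, avatar: str, now_s: float) -> str:
        # Everyone sees the arrival at once; hearing starts only after the delay.
        self.hears_from_s[avatar] = now_s + self.fade_in_s
        self.heard[avatar] = []
        return f"{avatar} has entered."  # text of the system word balloon

    def say(self, speaker: str, text: str, now_s: float) -> None:
        # Deliver only to avatars whose fade-in has already finished.
        for avatar, start_s in self.hears_from_s.items():
            if now_s >= start_s:
                self.heard[avatar].append((speaker, text))
```

The delay here is chosen for human reaction time, which is why it belongs 
with the code that knows about people rather than with the comm layer.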

In any case, although this is a timing issue, it isn't a timeout issue.  
More interesting is leaving the room.  

When an Avatar explicitly leaves a room, the issue is much like on entry, 
except the timing danger a speaker faces is now only that they might say 
something assuming Mr. Slippery will hear it, and then, as their finger is 
past the point of no return towards hitting the enter key, see Mr. Slippery 
start to fade out.  The result is uncertainty about whether Mr. Slippery 
actually heard it, but this is fail safe.  The timing issues here are also 
only human based + a bit of computational slop.

More interesting is when an Avatar's hosting machine becomes inaccessible 
from the region host.  What the region actually did was to depend on VatTP's 
connection timeout to determine that the Avatar was no longer reachable, and 
then to evict him from the region, and to inform all those in the region 
(with system word balloons and transporter animations) that this Avatar had 
disappeared.  This timeout was substantially longer than human reaction time 
(maybe 15 seconds), but in the meantime many would notice that Mr. Slippery 
was inactive and immobile.  Whether or not this was the right way to handle 
things, it worked out well in practice.  This case may or may not be an 
example of the middle ground Dean claims is confused.
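
A rough sketch of that arrangement, assuming a callback style that the 
actual system may not have used: EvictingRegion, on_connection_timeout, and 
the wording of the announcements are all hypothetical.  The point it tries 
to show is that the region holds no timeout value of its own; it only 
reacts when the comm layer gives up on a peer.

```python
class EvictingRegion:
    """Hypothetical region host that relies on the comm layer's timeout."""

    def __init__(self) -> None:
        self.present = set()     # avatars currently in the region
        self.announcements = []  # system word balloons, in order

    def enter(self, avatar: str) -> None:
        self.present.add(avatar)
        self.announcements.append(f"{avatar} has entered.")

    def on_connection_timeout(self, avatar: str) -> None:
        # Invoked by the comm layer (assumed callback), not by a timer here.
        if avatar in self.present:
            self.present.remove(avatar)
            self.announcements.append(f"{avatar} has disappeared.")

    def audience(self) -> frozenset:
        # Only avatars still present can hear the next utterance.
        return frozenset(self.present)
```

Because eviction shrinks audience() before anything further is delivered, 
whatever is said next goes only to those the region still considers present.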

In any case, ignoring entry ambiguity (which we can, since it was well 
enough behaved) the leaving ambiguity destroyed formal Common Knowledge that 
everyone in the room had heard everything said in the room, but we still had 
Common Knowledge that those *still* in the room, whoever they may be, had 
heard everything said so far.  And since only those still in the room are 
able to hear the next utterance, this weaker form of Common Knowledge was 
adequate for conversation.

More later, especially about Common Knowledge....


        Cheers,
        --MarkM