[E-Lang] remote comms: Timeouts and Connection Failure

Karp, Alan alan_karp@hp.com
Fri, 20 Apr 2001 08:45:48 -0700


Zooko wrote:

>  (Anyway, it is a rare situation in which your counterparty
>  fails to send a message that you are expecting, but keeps sending fresh
>  keepalives.)

This situation is rare if you are thinking of "connection" failures, but it
is a common symptom of logic errors in the sending program.  Normally
keepalives are sent by the low-level messaging code, while the message
you're expecting is coming from the application.  If the application gets
stuck in a loop, or misinterprets the request as something that needs no
reply, you'll get your keepalives but not your message.  Maybe I write
particularly bad code, but I see this behavior much more frequently than an
undetected lost connection.
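
To make the distinction concrete, here is a rough sketch in Python (not E) of a receiver that keeps two separate clocks for one outstanding request: one for silence from the messaging layer and one for the application's answer.  The class name, field names, status strings, and timeout values are all made up for illustration and are not taken from E's comm system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PendingRequest:
    """Tracks one outstanding request using two independent clocks."""
    sent_at: float
    keepalive_timeout: float   # max silence tolerated from the messaging layer
    reply_deadline: float      # max wait tolerated for the application's answer
    last_keepalive_at: Optional[float] = None
    reply: Optional[str] = None

    def on_keepalive(self, now: float) -> None:
        # Sent by the peer's low-level messaging code, not by its application.
        self.last_keepalive_at = now

    def on_reply(self, payload: str) -> None:
        # Sent by the peer's application logic.
        self.reply = payload

    def status(self, now: float) -> str:
        if self.reply is not None:
            return "answered"
        last_heard = (self.sent_at if self.last_keepalive_at is None
                      else self.last_keepalive_at)
        if now - last_heard > self.keepalive_timeout:
            return "link-silent"
        if now - self.sent_at > self.reply_deadline:
            # Keepalives are fresh, yet the application never answered:
            # the stuck-in-a-loop or misread-request case.
            return "peer-alive-but-no-reply"
        return "waiting"


# Hypothetical scenario: the peer's messaging layer sends a keepalive every
# 2 seconds for 40 seconds, but its application never replies.
stuck = PendingRequest(sent_at=0.0, keepalive_timeout=5.0, reply_deadline=30.0)
for t in range(2, 41, 2):
    stuck.on_keepalive(float(t))
print(stuck.status(40.0))
```

A receiver that watches only the keepalive clock would report this request as healthy forever; the second clock is what notices the application-level failure.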

_________________________
Alan Karp
Principal Scientist
Decision Technology Department
Hewlett-Packard Laboratories MS 1U-2
1501 Page Mill Road
Palo Alto, CA 94304
(650) 857-3967, fax (650) 857-6278
https://ecardfile.com/id/Alan_Karp
http://www.hpl.hp.com/personal/Alan_Karp/
 

> -----Original Message-----
> From: zooko@zooko.com [mailto:zooko@zooko.com]
> Sent: Friday, April 20, 2001 8:24 AM
> To: Marc Stiegler
> Cc: e-lang@eros-os.org
> Subject: Re: [E-Lang] remote comms: Timeouts and Connection Failure 
> 
> 
> 
> [I was in the shower and I suddenly started wondering about this old
> message in which MarcS had asserted that none of his E apps could be hung
> by a counterparty falling silent.  Here, then, is a follow-up.  This is
> *not* my big essay on comms and security.  This is just a few ideas I had
> just now.  --Z]
> 
> 
>    I, Zooko, wrote the things quoted with "> > ":
> 
>  MarcS wrote:
> >
> > > I think that if a programmer is given a connection abstraction, then
> > > the common case is definitely to give up as soon as the comms
> > > implementation says that the counterparty is non-responsive (by
> > > "breaking" the "connection").  *But*, I think that this is usually a
> > > mistake, because usually a higher-level consideration (most often, the
> > > user) should determine the impatience policy top-down.  As a thought
> > > experiment, consider how many apps could be effectively hung or DoS'ed
> > > by a malicious counterparty sending all the right keepalives but
> > > otherwise not doing anything.
> > 
> > Well, I do the thought experiment on the following E apps I have written.
> > When I say, "nope, can't hang it", it means no part of the system is
> > affected except the specific transactions that would be "hung" ("broken")
> > if the hostile service just shut down and didn't play keepalive games:
> > 
> > --marketplace: nope, can't hang it
> > --Edesk: nope, can't hang it
> > --E web server: nope, can't hang it
> > --Echat: nope, can't hang it
> > --Distributed eBrowser, where outline synchronization is on a remote
> > hostile computer: a hostile server can hang the outline synchronization
> > process, i.e., prevent infrastructural early-detection that outlines will
> > never be synchronized. You still cannot hang the system, however. In this
> > case, the user implements his own impatience policy by switching to
> > another outline synchronization service (this is really a crazy example,
> > outline synchronization should generally be done on a machine you trust
> > as much as the source-editing machine, typically the same machine :-)
> > --Satan at the racetrack: nope, can't hang it (though Satan at the
> > racetrack does implement a top-down impatience policy in addition to the
> > infrastructure impatience policy...but as demonstrated earlier, this is
> > easy in E, liverefs or no).
> > --The 5-party salesman smart contract: if the trusted contract server is
> > hostile, it could use this strategy to hang the world. But a hostile
> > trusted contract server can destroy the world in many many different
> > ways, some far more malicious and undetectable than this. For the other
> > parties to the transaction, this kind of attack would have no effect that
> > they could not achieve above-board simply by rejecting their part of the
> > contracting relationship. The most interesting desired attack would be
> > for the vendor to DoS the salesman's attempt to receive his commission.
> > Nope, can't stop it, with this or any other technique.
> > 
> > Generally, the worst a hostile process can do using this strategy is
> > force you to keep context for the hostile piece of the distributed app
> > hanging around.
> 
> 
> I was just reflecting on this letter and how surprised I was at the
> answers you gave.
> 
> If this is really true, it implies to me that for *each* counterparty in
> *every* app, the requirement for responsiveness is either "nothing" or "a
> steady stream of unforgeable keepalives", or "a steady stream of
> unforgeable keepalives plus X", where X is another impatience policy that
> is also implemented (in `outline synchronization' and `Satan at the
> racetrack').
> 
> From your descriptions, quoted above, it seems like in most cases there
> *is* no top-down requirement for a response -- either no response is
> expected at all or else the top-down requirement is to wait forever until
> it arrives.
> 
> From these descriptions, it sounds like outline synchronization has a
> top-down impatience policy initiated (and implemented) by the user and
> that "Satan at the racetrack" has a top-down impatience policy implemented
> by the programmer.
> 
> 
> Hm.
> 
> 
> So I guess my next question is: if you don't need a response or you are
> willing to wait forever for the response, then why do you need the steady
> stream of unforgeable keepalives?
> 
> I suppose it is a heuristic -- if you *are* waiting for a response, you
> are continually predicting whether you think the response will arrive or
> not, and once you predict that it will not arrive (due to insufficient
> unforgeable keepalives), then you give up and implement some kind of
> recovery procedure.
> 
> 
> But I am still suspicious that this behaviour: "Use keepalives to try to
> predict whether the response will ever arrive and then abandon your
> `handle response' procedure and initiate your `handle non-response'
> procedure if you predict that it won't." is not really a requirement of
> the user or a requirement of the design, but is instead just being thrown
> in because that is what the connection abstraction offers and that is what
> programmers are used to.
> 
> 
> The most obvious objection to this impatience behaviour is that you might
> predict wrongly -- if there is a transient communication failure, or if
> the remote Vat is temporarily turned off, then you will subsequently
> refuse to accept the response that you had previously been waiting for,
> even when it arrives.  (This was a major source of performance problems in
> Mojo Nation for a while: the "accidentally assumed connectiony semantics"
> bug to which I earlier alluded, in which we would throw messages out if
> they arrived too late, even though we actually desperately wanted the data
> contained within them.)
> 
> You can also predict wrongly in the *other* direction, thinking that the
> steady stream of unforgeable keepalives signals the imminent arrival of
> your answer, when in fact the answer never comes.  But since they are
> unforgeable keepalives and there is a cryptographic ordering guarantee
> implemented which covers both the keepalives and the messages, this can
> only happen if your counterparty never *sends* the message.
> 
> This latter misprediction is the thought experiment that MarcS did, above,
> but I guess mispredicting in *this* direction doesn't have any negative
> consequences in MarcS's apps.  (Anyway, it is a rare situation in which
> your counterparty fails to send a message that you are expecting, but
> keeps sending fresh keepalives.)
> 
> 
> Mispredicting the other way -- spuriously abandoning your state and losing
> the ability to process the response message, simply because of a few
> seconds of comms interruption or a temporarily suspended Vat -- *does*
> have negative consequences in MarcS's apps, I am sure, but we have all
> grown so used to those little recurring annoyances known as "broken
> connections" that we don't even think of them anymore.
> 
> 
> Okay, this has rambled a bit, but my point is that the "steady stream of
> unforgeable keepalives" impatience policy gets used where it probably
> wasn't required and causes occasional spurious failures when mispredicting
> one way, but I guess MarcS's thought experiment has convinced me that it
> rarely causes problems when mispredicting the other way.
> 
> 
> Regards,
> 
> Zooko
> 