[E-Lang] remote comms: Timeouts and Connection Failure
Karp, Alan
alan_karp@hp.com
Fri, 20 Apr 2001 08:45:48 -0700
Zooko wrote:
> (Anyway, it is a rare situation in which your counterparty
> fails to send a message that you are expecting, but keeps sending fresh
> keepalives.)
This situation is rare if you are thinking of "connection" failures, but it
is a common symptom of logic errors in the sending program. Normally
keepalives are sent by the low-level messaging code, while the message
you're expecting is coming from the application. If the application gets
stuck in a loop, or misinterprets the request as something that needs no
reply, you'll get your keepalives but not your message. Maybe I write
particularly bad code, but I see this behavior much more frequently than an
undetected lost connection.
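
A rough sketch of the distinction, in Python rather than E. The names
(PendingRequest, diagnose) and both timeout values are assumptions invented
for illustration, not part of any E or messaging API. The idea is to track
the keepalive clock and the application reply clock separately, so that
"link looks dead" and "link is fine but the application never answered"
come out as different diagnoses:

```python
from dataclasses import dataclass


@dataclass
class PendingRequest:
    """Bookkeeping for one outstanding request (hypothetical structure)."""
    sent_at: float            # when the request was sent, in seconds
    last_keepalive_at: float  # last keepalive seen from the peer's messaging layer
    reply_received: bool = False


def diagnose(req, now, keepalive_timeout=30.0, reply_deadline=120.0):
    """Classify an outstanding request using two independent clocks.

    The timeout values are illustrative assumptions, not recommendations.
    """
    if req.reply_received:
        return "replied"
    # Low-level signal: the peer's messaging code has gone quiet.
    if now - req.last_keepalive_at > keepalive_timeout:
        return "link-suspect"
    # High-level signal: keepalives are fresh, but the application is silent.
    if now - req.sent_at > reply_deadline:
        return "app-silent"
    return "waiting"
```

With fresh keepalives but no reply after the application deadline, diagnose
returns "app-silent", which is the case a keepalive check alone cannot see.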
_________________________
Alan Karp
Principal Scientist
Decision Technology Department
Hewlett-Packard Laboratories MS 1U-2
1501 Page Mill Road
Palo Alto, CA 94304
(650) 857-3967, fax (650) 857-6278
https://ecardfile.com/id/Alan_Karp
http://www.hpl.hp.com/personal/Alan_Karp/
> -----Original Message-----
> From: zooko@zooko.com [mailto:zooko@zooko.com]
> Sent: Friday, April 20, 2001 8:24 AM
> To: Marc Stiegler
> Cc: e-lang@eros-os.org
> Subject: Re: [E-Lang] remote comms: Timeouts and Connection Failure
>
>
>
> [I was in the shower and I suddenly started wondering about this old
> message in which MarcS had asserted that none of his E apps could be hung
> by a counterparty falling silent. Here, then, is a follow-up. This is *not*
> my big essay on comms and security. This is just a few ideas I had just
> now. --Z]
>
>
> I, Zooko, wrote the things quoted with "> > ":
>
> MarcS wrote:
> >
> > > I think that if a programmer is given a connection abstraction, then
> > > the common case is definitely to give up as soon as the comms
> > > implementation says that the counterparty is non-responsive (by
> > > "breaking" the "connection"). *But*, I think that this is usually a
> > > mistake, because usually a higher-level consideration (most often, the
> > > user) should determine the impatience policy top-down. As a thought
> > > experiment, consider how many apps could be effectively hung or DoS'ed
> > > by a malicious counterparty sending all the right keepalives but
> > > otherwise not doing anything.
> >
> > Well, I do the thought experiment on the following E apps I have written.
> > When I say, "nope, can't hang it", it means no part of the system is
> > affected except the specific transactions that would be "hung" ("broken")
> > if the hostile service just shut down and didn't play keepalive games:
> >
> > --marketplace: nope, can't hang it
> > --Edesk: nope, can't hang it
> > --E web server: nope, can't hang it
> > --Echat: nope, can't hang it
> > --Distributed eBrowser, where outline synchronization is on a remote
> > hostile computer: a hostile server can hang the outline synchronization
> > process, i.e., prevent infrastructural early-detection that outlines will
> > never be synchronized. You still cannot hang the system, however. In this
> > case, the user implements his own impatience policy by switching to
> > another outline synchronization service (this is really a crazy example,
> > outline synchronization should generally be done on a machine you trust
> > as much as the source-editing machine, typically the same machine :-)
> > --Satan at the racetrack: nope, can't hang it (though Satan at the
> > racetrack does implement a top-down impatience policy in addition to the
> > infrastructure impatience policy...but as demonstrated earlier, this is
> > easy in E, liverefs or no).
> > --The 5-party salesman smart contract: if the trusted contract server is
> > hostile, it could use this strategy to hang the world. But a hostile
> > trusted contract server can destroy the world in many many different ways,
> > some far more malicious and undetectable than this. For the other parties
> > to the transaction, this kind of attack would have no effect that they
> > could not achieve above-board simply by rejecting their part of the
> > contracting relationship. The most interesting desired attack would be for
> > the vendor to DoS the salesman's attempt to receive his commission. Nope,
> > can't stop it, with this or any other technique.
> >
> > Generally, the worst a hostile process can do using this strategy is
> > force you to keep context for the hostile piece of the distributed app
> > hanging around.
>
>
> I was just reflecting on this letter and how surprised I was at the answers
> you gave.
>
> If this is really true, it implies to me that for *each* counterparty in
> *every* app, the requirement for responsiveness is either "nothing" or "a
> steady stream of unforgeable keepalives", or "a steady stream of unforgeable
> keepalives plus X", where X is another impatience policy that is also
> implemented (in `outline synchronization' and `Satan at the racetrack').
>
> From your descriptions, quoted above, it seems like in most cases there *is*
> no top-down requirement for a response -- either no response is expected at
> all or else the top-down requirement is to wait forever until it arrives.
>
> From these descriptions, it sounds like outline synchronization has a
> top-down impatience policy initiated (and implemented) by the user and that
> "Satan at the racetrack" has a top-down impatience policy implemented by the
> programmer.
>
>
> Hm.
>
>
> So I guess my next question is: if you don't need a response or you are
> willing to wait forever for the response, then why do you need the steady
> stream of unforgeable keepalives?
>
> I suppose it is a heuristic -- if you *are* waiting for a response, you are
> continually predicting whether you think the response will arrive or not,
> and once you predict that it will not arrive (due to insufficient
> unforgeable keepalives), then you give up and implement some kind of
> recovery procedure.
>
>
> But I am still suspicious that this behaviour: "Use keepalives to try to
> predict whether the response will ever arrive and then abandon your `handle
> response' procedure and initiate your `handle non-response' procedure if you
> predict that it won't" is not really a requirement of the user or a
> requirement of the design, but is instead just being thrown in because that
> is what the connection abstraction offers and that is what programmers are
> used to.
>
>
> The most obvious objection to this impatience behaviour is that you might
> predict wrongly -- if there is a transient communication failure, or if the
> remote Vat is temporarily turned off, then you will subsequently refuse to
> accept the response that you had previously been waiting for, even when it
> arrives. (This was a major source of performance problems in Mojo Nation for
> a while: the "accidentally assumed connectiony semantics" bug to which I
> earlier alluded, in which we would throw messages out if they arrived too
> late, even though we actually desperately wanted the data contained within
> them.)
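
One way to avoid throwing away late messages, sketched in Python. The class
and method names here (PendingReplies, expect, mark_overdue, deliver) are
hypothetical and are not taken from Mojo Nation or E. The assumption is that
when the impatience policy fires, the receiver only records that the reply
is overdue and starts whatever recovery it wants, but keeps the handler
registered so that a reply arriving later is still processed:

```python
class PendingReplies:
    """Tracks outstanding requests without connection-style teardown."""

    def __init__(self):
        self._handlers = {}    # request id -> callback(payload, late)
        self._overdue = set()  # request ids whose impatience timer has fired

    def expect(self, request_id, on_reply):
        """Register a callback for the reply to request_id."""
        self._handlers[request_id] = on_reply

    def mark_overdue(self, request_id):
        """Impatience policy fired: note it, but keep the handler."""
        if request_id in self._handlers:
            self._overdue.add(request_id)

    def deliver(self, request_id, payload):
        """Hand a reply to its handler, even if it arrived late.

        Returns False only for unknown or already-answered requests.
        """
        handler = self._handlers.pop(request_id, None)
        if handler is None:
            return False
        late = request_id in self._overdue
        self._overdue.discard(request_id)
        handler(payload, late)
        return True
```

The cost of this choice is that context for unanswered requests stays around
until the application decides to drop it explicitly.
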
>
> You can also predict wrongly in the *other* direction, thinking that the
> steady stream of unforgeable keepalives signals the imminent arrival of your
> answer, when in fact the answer never comes. But since they are unforgeable
> keepalives and there is a cryptographic ordering guarantee implemented which
> covers both the keepalives and the messages, this can only happen if your
> counterparty never *sends* the message.
>
> This latter misprediction is the thought experiment that MarcS did, above,
> but I guess mispredicting in *this* direction doesn't have any negative
> consequences in MarcS's apps. (Anyway, it is a rare situation in which your
> counterparty fails to send a message that you are expecting, but keeps
> sending fresh keepalives.)
>
>
> Mispredicting the other way -- spuriously abandoning your state and losing
> the ability to process the response message, simply because of a few seconds
> of comms interruption or a temporarily suspended Vat -- *does* have negative
> consequences in MarcS's apps, I am sure, but we have all grown so used to
> those little recurring annoyances known as "broken connections" that we
> don't even think of them anymore.
>
>
> Okay, this has rambled a bit, but my point is that the "steady stream of
> unforgeable keepalives" impatience policy gets used where it probably wasn't
> required and causes occasional spurious failures when mispredicting one way,
> but I guess MarcS's thought experiment has convinced me that it rarely
> causes problems when mispredicting the other way.
>
>
> Regards,
>
> Zooko
>