[E-Lang] remote comms: Timeouts and Connection Failure
zooko@zooko.com
zooko@zooko.com
Wed, 04 Apr 2001 15:29:20 -0700
Dean wrote:
>
> This took much longer to write than I expected. Please comment! (and I
> hope it stayed coherent :-)
Dean, it was excellent! Thank you very much!
I have spent many hours struggling to write a similarly comprehensive essay
about comms, but I haven't been able to generate a coherent one yet.
I'm very glad that Dean, Chris Hibbert, Bill Frantz, MarkM, MarcS, etc. are
working on this. I consider a good comms abstraction to be very important for
E -- so important that delaying deployment until we hash it out is a win in my
book.
The following are just a few comments, and I don't have time to really work on
this to make it a good structured essay like yours, so apologies if it is hard
to follow.
> Timeouts are almost always motivated by either bottom-up concerns or
> top-down concerns. The primary bottom-up concern on this list is
> synthesizing a virtual connection from an unreliable underlying delivery
Although your analysis is excellent and useful, I think you have started by
assuming something that needs to be justified (and hopefully rejected): the
need for an abstraction of "connection".
Just as a data-point, Mojo Nation is a large, dynamic, comms-intensive
application that was built through and through with no notion of "connection".
Actually that isn't quite true -- we did sort of accidentally fall into using
the notion of "connection" for one thing, and that mistake has been the cause
of our single biggest performance problem throughout Mojo Nation's history
including the present release.
(Indeed, my fear is that even though the "connection" abstraction is deceptive
and awkward, it is so familiar to theorists like Dean, implementors like Bill,
and most importantly to programmers, that we will be forced to use it. But
I would like to explore the alternatives first...)
> g) *All* timeouts are either for bottom-up or top-down reasons. Everything
> in the middle is confused.
Sounds good!
> h) Bottom-up timeouts are meta-information about the deployment
> environment, and should *never* appear in application code.
Sounds great!
> i) Top-down timeouts are *unrelated* to comm.
Bingo!
> j) The "every ref is a sturdy ref" issue is unrelated to time-outs.
Whoops -- there are definitely important interactions between ERiaSR,
impatience policies, ordering guarantees, at-most-once guarantees, etc. To
wit:
If ERwaSR, then the underlying comms implementation couldn't notify the
application code that a "connection" were "broken", (i.e. that impatience had
been triggered) by smashing the LiveRefs. I consider this to be a *feature* of
ERiaSR, because I don't want the comms implementation signalling *up* like that
-- I want impatience to be implemented solely in application code (although of
course with help from E) and I want the underlying comms implementation to be
incapable of breaking my promises in my application code. (This is why MarcS's
custom impatience policy[1], while very cool, isn't good enough, because now
*either* the custom impatience policy *or* the underlying comms
implementation's impatience policy can trigger my impatience handler.)
But by the same token, if ERwaSR, there would be no way (? AFAICT) for the
application programmer to signal to the underlying comms implementation that a
certain set of outgoing messages could be safely discarded. This is I think a
fatal flaw in ERiaSR, because the abstraction as I previously envisioned it
would fill up with outgoing messages that would never be delivered because the
counterparty was long gone, but couldn't be deleted because it needed to
guarantee ordering should the counterparty ever reappear and a new message get
sent to it.
I tried fixing this by saying that ordering is on a per-relationship basis so
that when the sending *object* gets deleted, the comms system can delete any
outgoing undelivered messages that the sending object sent. But then you can
have the ugly behaviour that a counterparty drops off the Net, you attempt to
send thousands of large messages to it, and then your situation changes and you
decide that you no longer want for him to receive those messages, then he comes
back and you want to send him a *new* message, but the ordering guarantee is
going to soak up your bandwidth and his delivering all of those rotted
messages.
So then I thought programmers could avoid this ugly congestion by having a
little proxy object that sends all of the outgoing messages, and if they decide
that they no longer care about those thousands of rotted messages, they delete
the proxy object which sent those messages, thus signalling the comms code that
it can delete the outgoing messages, and they send their new message on a new
little proxy object.
... which means that I've been led, with much reluctance, back to wanting a
"SturdyRef/LiveRef" abstraction (since those little proxy objects might as well
be built-in to the language...), but with the important difference that
LiveRefs are *never* broken by underlying comms implementation, only by
application-level impatience policy, or maybe even not then. Maybe they are
unbreakable.
Okay, so this means that part of the "connection" abstraction to which I object
is the notion of knowing whether you are "connected or unconnected", and having
the underlying implementation interrupt your higher-level code based on its
ideas about impatience policies (== its idea about whether the "connection" has
"broken"). But I do like the notion of ordering of messages (although I don't
consider it to be absolutely necessary), and I do like the idea of providing a
way for the higher-level code to specify sequences of messages which can be
discarded versus sequences which must be sent.
> - Common knowledge requirements result from the *fundamental* impossibility
> in modern hardware of having a component X connected by a wire to component
> Y know a fact F (e.g., the transaction to transfer money from X to Y is
> committed, or the connection is still up), and know that: Y knows F, and Y
> knows that X knows F, and Y knows that X knows that Y knows F, etc. This
> problem is also called the Byzantine Generals problem if you want more
> information. Time is often used as a finesse for this problem, for
> example, 2-phase commit uses time-outs for distributed transactions. TCP
> and Pluribus use connection keep-alives to determine that the connection is
> still up. Because the use of time windows to establish common knowledge is
> only to make up for the inability to do so directly, it is always a
> bottom-up requirement.
I think that this behaviour should be considered as "guaranteeing faster
discovery of non-responsiveness through the use of unforgeable keep-alives",
instead of as "determining that the connection is still up". Clearly it does
*not* determine that the connection is "still up", only that it *was* "still
up" before the last keep-alive was received, and anyway I would like to avoid
using this abstraction of "connections still being up" entirely until I've been
convinced that we need it.
Now: I think the need for faster discovery of non-responsiveness is a top-down
requirement.
It might be the case, as Dean argues in the next quoted section, that the
implementation of faster discovery of non-responsiveness (specifically, the
frequency and latency of the keep-alives, the tolerance for deviation, etc.)
should be up to the comms implementation since it knows about the
implementation details of the underlying deployment environment, but the very
decision of whether you need *any* faster discovery of non-responsiveness is up
to the application code, and it might also want to bias the times as well.
(For example, certain kinds of transactions, and certain *counterparties*,
might require only one keep-alive per minute, whereas others might require as
many keep-alives as you can squeeze onto the underlying comms substrate.)
(In Mojo Nation, there are no operations which *require* fast discovery of
non-responsiveness, although it would possibly be a performance improvement if
the benefits of initiating replacement operations sooner were greater than the
costs of doing that discovery. We do not do fast discovery of non-
responsiveness, and anyway the speed-up, if any, would be dwarfed by our
biggest performance problem, which is caused by some code that we wrote
thinking that we could discover that the underlying "connection" was "broken"
and then "close" that "connection", thus saving some bandwidth. In fact we
can't do that because our underlying comms, in order to be able to hop over
firewalls, does not rely on TCP's "connection status" features. As an aside,
this also makes our protocol safer against active attacks, but the motivation
was firewall-hopping, not paranoid security considerations.)
(In fact, now that I think about it, the next performance improvement in Mojo
Nation, which will effectively fix the performance losses due to accidentally
relying on "connections", amounts to doing faster discovery of
non-responsiveness from the top-down requirements and knowledge rather than
from the bottom-up knowledge of the comms substrate. In this planned
improvement, you remember how long it took this particular counterparty to
respond to this particular kind of transaction in the past, and if the current
transaction has already taken longer than 97% of previous such transactions,
then launch a replacement operation. The bottom-up solution would not be able
to do that kind of nuanced solution, and it would also suck up network
resources and computational resources with its unforgeable keep-alives.)
(By the way, I think that distributed 2-phase commits are more likely to be
appropriate in a static network with ubiquitous trust than on a dynamic network
with complex trust relationships, and I think that 2-phase commit and
associated "transaction" abstractions are often used where at-most-once
retriable messages and failure-aware higher-level code would do better. Also
I think that TCP is more appropriate for a static network with ubiquitous
trust, and I think that Pluribus made a mistake in following the examples of
those two technologies.)
> The knowledge of what time period to use for finessing common knowledge is
> *necessarily* dependent on the deployment environment. For example, 10
> seconds could be a perfectly reasonable window for connect keep-alives on
> an Ethernet. On the Internet, 10 seconds might be just long-enough to pass
> cursory testing and really hurt you when you deploy to customers (because
> Internet latency is often higher than that). To a satellite it is of
> course a ludicrously short time-out. But on that giant cluster-server with
> shared-memory streams, it is too *long* by orders-of-magnitude, and could
> result in substantially lower system performance and reliability. And yet
> we want to use the same code across all of those environments (and indeed
> most big Web applications are in at least 3 of the 4). Thus, the
> time-window information is meta-information about the comm environment, and
> should be used by the virtual connection abstraction (e.g., TCP) to
> synthesize connections on the data-gram layer. The application code is in
> terms of this abstraction (or rather, the object-layer abstraction above
> it). The deployment of the application is where the application rubber
> meets the time-dependent road (so BTW the Pluribus time-outs need to be
> configurable).
This is interesting. From my perspective, this is saying that when you are
doing faster discovery of non-responsiveness with unforgeable keep-alives, you
should set the timeouts to be as fast as can be done (let's say, 99% of the
time) on the current deployment environment. I'm not sure that I agree in
general, although clearly that it was is desired for *some* applications.
> The illusion of connection is so valuable, and the failure of that illusion
> so significant, that any system that does not let the programmer
> distinguish between the two is fundamentally impairing their ability to
> program in a distributed system. All that disbelieving the illusion of
> connection does is impose all the uncertainty of communication on all
> requests, to no advantage. It is much like requiring that all circuits be
> designed as analog, and discarding then digital logic toolkit.
My experience with Mojo Nation suggests the opposite to me, but I am willing to
reconsider if you have any arguments more persuasive than analogy to the
Industry Standard Successful Abstraction. :-)
Thank you again, Dean, for your stimulating and enlightening essay. I'm very
glad that you took the time to write it.
Regards,
Zooko
[1] "remote comms", MarcS
http://www.eros-os.org/pipermail/e-lang/2001-April/author.html#5009