[E-Lang] remote comms: Timeouts and Connection Failure
Marc Stiegler
marcs@skyhunter.com
Thu, 5 Apr 2001 09:15:54 -0700
> First, I don't think that the notion of a "broken connection" is a good one,
> since it implies the notion of a non-broken connection, which is impossible to
> implement, and since it uses resources in order to implement
> faster-discovery-of-non-responsiveness in a way that may or may not be what
> was needed by the higher-level code, and said faster discovery can lead to
> false (unnecessary) "breakages" of "connections", and in addition (something I
> did not properly emphasize in my previous messages) since it sours the
> "familiar O-O message send but it works remotely" abstraction which E offers
> and (something else very important that I've neglected to emphasize) since it
> makes it impossible for the programmer to send messages which must be
> delivered at-most-once and which must be delivered before other messages
> (without implementing his own non-connectiony abstraction atop the
> connectiony one).
Messages that must be delivered at-most-once: what is an example of this?
The only examples I can think of are ones for which, regardless of SturdyRef
or LiveRef or something else, I would want to implement the message using the
"intention list" technique from database systems, in which the message can be
applied repeatedly without corrupting the data or the system. In the database
context, the message, instead of saying "add 1 to record x", would say "set
record x to value y", where y happens to be the result of adding 1 to what was
originally the current value of x.
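A minimal Python sketch of that contrast, with a dict standing in for the database. The function names are hypothetical and nothing here is E or DBMS API. Delivering the "add 1" message twice (say, a retry after a suspected failure) corrupts the record, while delivering the "set x to y" message twice does not:

```python
# Hypothetical illustration of the "intention list" idea: a message that
# states the intended result can be replayed safely; a delta message cannot.


def apply_increment(db: dict, key: str) -> None:
    """Non-idempotent message: "add 1 to record x"."""
    db[key] = db[key] + 1


def apply_set(db: dict, key: str, value: int) -> None:
    """Idempotent message: "set record x to value y"."""
    db[key] = value


def deliver(db: dict, message, times: int) -> dict:
    """Simulate one message arriving `times` times (e.g. resent after a timeout)."""
    for _ in range(times):
        message(db)
    return db


if __name__ == "__main__":
    original = 41
    # The sender computes y from the value of x it originally observed.
    y = original + 1

    db_a = deliver({"x": original}, lambda db: apply_increment(db, "x"), times=2)
    db_b = deliver({"x": original}, lambda db: apply_set(db, "x", y), times=2)

    print(db_a["x"])  # 43: the duplicate delivery corrupted the record
    print(db_b["x"])  # 42: replaying the message is harmless
```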
> Oh by the way, I still haven't answered your question about whether I think a
> broken connection is a good reason to stop waiting. I'll attempt to answer it
> now, and please do try to re-phrase your questions in non-connectiony terms
> next time and see if that helps us converge.
Okay, the non-connectiony phrasing is: do you really think the when-catch
construct doesn't need a catch clause? Messages sent eventually in E are,
with very high probability, messages sent to remote systems. Messages sent
to remote systems are orders of magnitude more likely to suffer disaster
than messages sent to objects in the same vat (thousands of times more
likely, or even millions of times more likely, particularly if you subtract
on the in-vat side the cases where, if object A can't reach object B in the
same vat, A is just as dead as B). Inside a vat you really can assume the
message was delivered. Outside a vat you really cannot. So it makes sense
for the default behavior of the infrastructure to offer more options to the
programmer for coping with problems in eventual sends than in immediate
calls.
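E's when-catch runs one block when a promise resolves and the catch block when it breaks. E can't be shown runnably here, so what follows is a rough Python asyncio analog, not E's API: `when_catch` and `remote_lookup` are hypothetical names, and `ConnectionError` stands in for whatever problem the comm system reports for an undeliverable message.

```python
import asyncio


async def when_catch(promise, on_resolved, on_broken):
    """Hypothetical analog of E's when-catch: react to the eventual result
    of a send, with a separate path for a send that failed."""
    try:
        value = await promise
    except Exception as problem:  # e.g. a partition reported by the comm layer
        return on_broken(problem)
    return on_resolved(value)


async def remote_lookup(name: str, reachable: bool) -> str:
    """Stand-in for an eventual send to an object in another vat (hypothetical)."""
    await asyncio.sleep(0.01)  # simulated network latency
    if not reachable:
        raise ConnectionError(f"lost contact while sending lookup({name!r})")
    return f"record for {name}"


async def main() -> list:
    ok = await when_catch(
        remote_lookup("alice", reachable=True),
        on_resolved=lambda v: f"got {v}",
        on_broken=lambda e: f"failed: {e}",
    )
    broken = await when_catch(
        remote_lookup("bob", reachable=False),
        on_resolved=lambda v: f"got {v}",
        on_broken=lambda e: f"failed: {e}",
    )
    return [ok, broken]


if __name__ == "__main__":
    for line in asyncio.run(main()):
        print(line)
```

The point of the shape is that the failure path is part of the construct itself, so every caller gets it without writing the extra detection code by hand.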
In the earliest version of E, there was a more complicated structure than
when-catch that, despite its complexity, did not include a catch clause and
required extra coding to detect and service a problem with the message
delivery (was that too connectiony a description?). The when-catch came into
existence because markm noticed that Edesk, eChat, marketplace, and
everything else uniformly implemented the extra code all the time.
> I think that if a programmer is given a connection abstraction, then the
> common case is definitely to give up as soon as the comms implementation says
> that the counterparty is non-responsive (by "breaking" the "connection").
> *But*, I think that this is usually a mistake, because usually higher-level
> consideration (most often, the user) should determine the impatience policy
> top-down. As a thought experiment, consider how many apps could be
> effectively hung or DoS'ed by a malicious counterparty sending all the right
> keepalives but otherwise not doing anything.
Well, let me do the thought experiment on the following E apps I have written.
When I say, "nope, can't hang it", it means no part of the system is
affected except the specific transactions that would be "hung" ("broken")
if the hostile service just shut down and didn't play keepalive games:
--marketplace: nope, can't hang it
--Edesk: nope, can't hang it
--E web server: nope, can't hang it
--Echat: nope, can't hang it
--Distributed eBrowser, where outline synchronization is on a remote hostile
computer: a hostile server can hang the outline synchronization process,
i.e., prevent infrastructural early-detection that outlines will never be
synchronized. You still cannot hang the system, however. In this case, the
user implements his own impatience policy by switching to another outline
synchronization service. (This is really a crazy example; outline
synchronization should generally be done on a machine you trust as much as
the source-editing machine, typically the same machine :-)
--Satan at the racetrack: nope, can't hang it (though Satan at the racetrack
does implement a top-down impatience policy in addition to the
infrastructure impatience policy...but as demonstrated earlier, this is easy
in E, liverefs or no).
--The 5-party salesman smart contract: if the trusted contract server is
hostile, it could use this strategy to hang the world. But a hostile trusted
contract server can destroy the world in many different ways, some far
more malicious and undetectable than this. For the other parties to the
transaction, this kind of attack would have no effect that they could not
achieve above-board simply by rejecting their part of the contracting
relationship. The attack the vendor would most want is to DoS the salesman's
attempt to receive his commission. Nope, the vendor can't stop it, with this
or any other technique.
Generally, the worst a hostile process can do using this strategy is force
you to keep context for the hostile piece of the distributed app hanging
around. This is the very context you would keep around for a nonhostile
counterparty anyway, so it is hardly an effective DoS attack; there are
plenty of worse attacks against which we have no defense. As for hanging a
system, the promises architecture protects you from getting "hung" by this
kind of hostility as a side effect, on its way to preventing you from getting
"hung" by deadlock. Perhaps Mojo Nation suffers more from the absence of a
promise architecture than from the presence of connections.
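The isolation argument can be illustrated with a Python asyncio analogy (hypothetical names, not E code). A promise that the counterparty never resolves stalls only the continuation waiting on it; the rest of the same event loop keeps running, because there is no blocking call for the peer to hold:

```python
import asyncio


async def main() -> list:
    loop = asyncio.get_running_loop()
    log = []

    # A promise a hostile counterparty never resolves, keepalives or not.
    never_resolved = loop.create_future()

    async def hostile_transaction():
        # Only this continuation waits; the context it holds is all that is "hung".
        await never_resolved
        log.append("hostile transaction finished")  # never reached

    async def unrelated_work(n: int):
        await asyncio.sleep(0.01)
        log.append(f"unrelated task {n} done")

    stuck = asyncio.create_task(hostile_transaction())
    # Other work proceeds normally while the hostile promise stays unresolved.
    await asyncio.gather(*(unrelated_work(n) for n in range(3)))
    log.append(f"hostile transaction still pending: {not stuck.done()}")

    stuck.cancel()  # drop the held context so the loop shuts down cleanly
    return log


if __name__ == "__main__":
    print("\n".join(asyncio.run(main())))
```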
In many (though not all) of the apps listed above, the user does indeed set
high-level impatience policy: if a remote service is not responding in a
fashion the user deems acceptable, he closes the window on that service. In
those cases, the infrastructure policy is indeed merely giving him faster
awareness that he's not going to get any answers back from the service,
enabling him to reduce his window clutter sooner rather than later.
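Both impatience policies can coexist. In this Python asyncio sketch (the names `remote_call` and `ask` are hypothetical; `asyncio.wait_for` is the real standard-library call), the comm layer's early failure report and a user-chosen timeout each end the wait, whichever comes first:

```python
import asyncio


async def remote_call(behavior: str) -> str:
    """Hypothetical stand-in for an eventual send to a remote service."""
    if behavior == "partitioned":
        await asyncio.sleep(0.01)
        # Infrastructure policy: the comm layer reports the failure early.
        raise ConnectionError("comm layer reports the peer unreachable")
    if behavior == "stalling":
        # A peer that stays "alive" but never answers.
        await asyncio.sleep(3600)
    return "answer"


async def ask(behavior: str, user_patience: float) -> str:
    """Wait for a reply under both policies; whichever fires first wins."""
    try:
        # User policy: a top-down timeout chosen by the caller.
        value = await asyncio.wait_for(remote_call(behavior), timeout=user_patience)
    except ConnectionError as problem:
        return f"infrastructure policy: {problem}"
    except asyncio.TimeoutError:
        return "user policy: gave up waiting"
    return f"resolved: {value}"


if __name__ == "__main__":
    for behavior in ("healthy", "partitioned", "stalling"):
        print(asyncio.run(ask(behavior, user_patience=0.1)))
```

A real application would more likely let the user cancel the wait (by closing the window) than impose a fixed timeout; the timeout just keeps the sketch self-contained.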
> Now normally I would *never* embark on a crusade to persuade programmers to
> relinquish crufty old abstractions which they love (well... not in the
> context of E, at least, which already does way too much crusading. I might
> embark on such a crusade for the Evil Geniuses Transport Engine.), but I
> think maybe since the familiar abstraction of O-O references and O-O message
> send doesn't come with this idea of breakable connections, perhaps in this
> case programmers will actually be *more* comfortable with a different,
> simpler "message sending" abstraction than with the current
> SturdyRef/LiveRef, which is a union of "connections" and "O-O message
> sending".
"All things should be made as simple as possible...and no simpler" (I can't
remember who said that, someone famous). Message sending in a distributed
context without the ability to detect that messages are probably not being
delivered any more (and that message x in particular is probably toast)
sounds way too simple. Like I said earlier, E started out with that as the
model, and switched because in practice E was never used that way.
--marcs