[E-Lang] remote comms: Timeouts and Connection Failure

Tyler Close tclose@oilspace.com
Tue, 03 Apr 2001 10:01:47 +0100


At 09:37 PM 4/2/01 -0700, Dean Tribble wrote:

>j -2) Connectedness
>
>This issue is about whether there is value in the notion of a connection 
>between an object's client and that object (in which said connection can 
>break).
>
>Zooko said:
>"My (perhaps controversial) assertion is that there is no real useful 
>notion of a "connection" on an untrusted network other than whether or not 
>you have given up on waiting."
>
>This is an easy bug to slide into.  The question is whether you had a 
>reason to wait in the first place.  Or, stated even more confusingly, a 
>connection is not the uncertainty of being connected, but rather the 
>absence of certainty that you are not.  Even though I may not be sure of 
>the next message, I can be very sure that I am *not* in the middle of an 
>SSL session if I never went through the steps to set one up.
>
>More clearly, establishing a connection is about building common knowledge 
>that you are connected.  As above, hardware only lets you establish lesser 
>conditions that only approximate common knowledge, but like all 
>cooperation, they are sufficiently robust illusions that billions of 
>people can succeed with them.
>
>The illusion of connection is so valuable, and the failure of that 
>illusion so significant, that any system that does not let the programmer 
>distinguish between the two is fundamentally impairing their ability to 
>program in a distributed system.  All that disbelieving the illusion of 
>connection does is impose all the uncertainty of communication on all 
>requests, to no advantage.  It is much like requiring that all circuits be 
>designed as analog, and discarding the digital logic toolkit.

I find this section very confusing. It is not even clear to me what side 
you are taking on ERiaSR, though, based on the last paragraph, I am 
guessing that you are against it.

The last paragraph doesn't actually have any facts or arguments in it, just 
some fuzzy statements about value and advantage. I think the ERiaSR issue 
is *very* important (I am for it). Frankly, I think coding reusable smart 
contracts without it is extremely difficult, if not impossible.

Could you please beef up this section with some more concrete arguments?

>h) Bottom-up timeouts are meta-information about the deployment 
>environment, and should *never* appear in application code.

It is for this reason that I believe that all references should be sturdy 
references. Please try to distinguish connection failure from a bottom-up 
timeout.


>j -3) Disconnect and reference integrity
>
>The remaining question is what happens when the connection goes 
>away.  This is a completely separable issue, and indeed E takes a 
>different stance than Joule.  In Joule, when a connection is broken, the 
>individual remote references do not find out about it.  Instead, the 
>keeper for that connection finds out about it.  Keeper is a KeyKOS concept 
>for an out-of-band exception handler that might be able to address 
>meta-issues about the computation and let it proceed; e.g., a page manager 
>for pages not in main memory.  The keeper then implements a recovery 
>strategy if there is a coherent one, like find where the object moved to, 
>go to a replica, terminate the entire program, break all the promises, etc.

This keeper logic sounds like application logic. The input to this keeper 
can only be, as you've pointed out, timeout information. Why should this 
logic not be in the application code? Consider:


>i) Top-down timeouts are *unrelated* to comm.

If x does not respond within 20 s, try y instead. This sounds like 
application logic. Consider your own advice:


>g) *All* timeouts are either for bottom-up or top-down 
>reasons.  Everything in the middle is confused.

The keeper sounds confused. It is working with top-down goals on bottom-up 
information and sitting in the middle.
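
To make the "if x does not respond within 20 s, try y instead" rule 
concrete, here is a minimal sketch of it written as ordinary application 
code. It is in Python rather than E, and every name in it 
(ask_with_fallback, slow_x, quick_y) is hypothetical, made up for this 
illustration. The timeout is shortened so the example runs quickly. Note 
that the code never asks whether a connection broke; it only records that 
the caller gave up waiting.

```python
import concurrent.futures
import time


def ask_with_fallback(primary, fallback, timeout_s):
    """Call primary(); if no answer arrives within timeout_s, call fallback()."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(primary)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # Top-down decision: the application has stopped waiting for x.
        # Nothing here claims to know why x was slow.
        return fallback()
    finally:
        # Do not block on the abandoned call; its worker thread finishes
        # on its own in the background.
        pool.shutdown(wait=False)


def slow_x():
    # Hypothetical stand-in for a remote object that answers too late.
    time.sleep(0.5)
    return "x"


def quick_y():
    # Hypothetical stand-in for the alternative object.
    return "y"


result = ask_with_fallback(slow_x, quick_y, timeout_s=0.05)
```

The policy (how long to wait, whom to ask next) sits entirely with the 
caller, which is where I am arguing it belongs.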


>As I recall the motivating case for breaking all the promises is amnesia.

I am sort of reluctant to get into this, since both E and EROS believe in 
allowing amnesia. I think supporting amnesia is ridiculous. It totally 
undermines security and makes a mess of distributed application logic.

As far as I know, E and EROS only have performance reasons for allowing 
amnesia. This is a poor argument. Droplets does not suffer from amnesia and 
does just fine on performance, despite being implemented on top of Java. 
From Java, it takes me approximately 20 to 40 ms to sync modified state to 
disk. This time is dwarfed by network latency. A system implemented on top 
of the bare hardware should be able to make the comparison even sillier.
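
As a rough illustration of what "sync modified state to disk" before 
answering can look like, here is a minimal sketch. It is not the Droplets 
implementation; it is in Python, and the names (save_state, 
handle_deposit, account.json) are hypothetical. The assumption is simply 
that a reply is sent only after the new state has been flushed, so a 
restart cannot forget anything a client was told.

```python
import json
import os
import tempfile


def save_state(path, state):
    """Write state to path, returning only after the data is flushed to disk."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # force the file contents out of OS buffers
    # Swap the new file into place in one step. A stricter version would
    # also fsync the containing directory; that is omitted here.
    os.replace(tmp_path, path)


def handle_deposit(path, state, amount):
    """Apply a change, persist it, and only then produce the reply."""
    state["balance"] += amount
    save_state(path, state)   # persist first
    return state["balance"]   # reply second


workdir = tempfile.mkdtemp()
state_file = os.path.join(workdir, "account.json")
account = {"balance": 100}
new_balance = handle_deposit(state_file, account, 25)
```

The cost per request is one flush, which is the 20 to 40 ms figure I 
quoted above for my own setup.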

Tyler