Announcing E 0.8.4: The Birthday Release

Mark S. Miller markm@caplet.com
Tue, 01 Jun 1999 04:13:02 -0700


At 07:18 PM 5/27/99 , Dean Tribble wrote:
>I trust (or at least hope) that the ability to engage in this
>abstraction-breaking event is a capability such that it can be virtualized
>given the appropriate sacrifices to the lords of chaos.

[-] Essentially yes.
For peculiar reasons, it would currently be less convenient to do so than 
you'd think, but this can be fixed if it becomes an issue (which I don't 
expect).  In any case, within the E spec (that I'm making progress on, but 
isn't written yet), "x == y" expand to "E same(x, y)".  "E" is a primitive 
capability that can be virtualized.

However, I become ever more skeptical that it matters.  To usefully 
virtualize it, you have to 1) maintain the contract, and 2) do something 
usefully different than the primitive, or why bother.  Do you have any 
examples what would satisfy both criteria?  I can't think of any.


>>All this cycle handling may seem complicated when you try to think about it, 
>...
>Note that in general, many of these classes of issues are complicated when
>you actually *think* about them.  That is partly so that using them is not
>complicated (or bug-prone).

[+] Yes, but it presents an interesting documentation problem.
It is indeed good to do something "complicated" if it's actually more 
general, formally simpler, and intuitively simpler when you don't think 
about it.  However, if it's less simple when you do think about it, how do 
you avoid making it seem less simple when you document it?  After all, we 
know what kinds of languages say "Trust me, it does what you expect", and 
we must not be assimilated.


>>If == is asked before there is enough information to give a stable answer, 
>>rather than give an unstable answer, == throws and exception.  Since it does 
>>not always give an answer, it is "partial" rather than total.  Since the 
>>cases for which it does give an answer only grows over time, and since the 
>>answers it does give are stable, it is "monotonic".  (This resembles 
>>read-only-unification, or ask-unification, in the Concurrent Prolog family 
>>of languages.)
>
>Except that they block rather than abort when the value hasn't been
>established.  How much pain is the differences going to cause?  I presume
>the semantics are such that you can't force a wait for resolution when ==
>is used?

[+] Correct, we can't.
This is the price of the Doug-Barnes-reconciliation of sequential and 
Actors/FCP/Joule-asynchronous programming -- sequential computation must 
never block, or you would get a deadlock.  Remember the strange 
"deadlock"-like data-lock problems we had in Joule?  E largely avoids them 
by giving the programmer the means to plan for certain data to be LOCAL, in 
which case synchronous operations (like E's ==) can be successfully 
performed on it.  When the planning has gone awry and a synchronous 
operation is attempted on non-LOCAL data, an exception is thrown so that 1) 
the buggy invoker is informed of his mistake, and 2) sequential execution 
continues, rather than blocking the entire Vat.


>>finite.  Only settled objects can be used as keys in EMaps.  I suspect that 
>>after talking to Arturo, we'll also want to say that only settled data can 
>>be made persistent.
>
>?!  That seems a dire reduction in functionality.  Is it?

[+] Indeed it is!  On purpose!
Upgrade-for-release (as opposed to upgrade-for-prototyping) is much much 
much harder than persistence.  Indeed, upgrade is vastly harder than most of 
the very hard issues in building a system like E.  Therefore, upgrade should 
be the driving issue when designing persistence.

As Arturo taught me 
http://www.erights.org/to-be-sorted/StateSerialization.html , to succeed at 
upgrade, we must do exactly the kind of manual work in our applications that 
orthogonal persistence was supposed to save us from: designing the small 
number of data-formats that holds the precious information whose meaning 
must be preserved across as upgrade.  To succeed, this needs to be vastly 
smaller than the number of object definitions in a typical program.

Since we have to do this work anyway, orthogonal persistence isn't giving us 
a software engineering benefit.  Without MMU support, it's also not giving 
us a performance benefit.  If we can only save by serializing, it's better 
all the way around to serialize less.

So why only settled data?  Normally, undelivered messages reside only on the 
run-Q or in unforwarded promises.  Everything we write out is something 
we're committing to upgrade.  If we can avoid writing out undelivered 
messages, then we don't need to figure out what to do if the object they are 
destined for no longer responds to that message in a compatible way, or at all.

The unavoidable dual of upgrade is comm co-existence.  Without a flag day, 
we can't upgrade some components of a distributed system without facing the 
equivalent of the above message compatibility problem.  However, we only 
have to face that problem from the messages that cross between separately 
upgraded units (eg, Vats).  If we wrote out unsettled data, we'd have to 
face it for all messages asynchronously sent within an application as well.

My current intention for E 1.0 is to have it support persistence with 
upgrade along the lines of what Arturo did for Original-E, but not to do 
anything for comm coexistence beyond (possibly) the logic to detect and 
complain of incompatible versions.  Only so many impossible problems before 
breakfast, thank you.


>>A long time ago, I had noticed the threat to determinism posed by 
>>Object.hashCode() or any equivalent, and these are not accessible in E.  As 
>>a result, E's hashtables are necessarily primitive -- one cannot implement a 
>>new hashtable in E. This seems a small price.  However, not until after the 
>
>Why don't you instead allocate a deterministic hash in each
>identity-object.  Then hashtables can be implemented in the language.
>Since many clever algorithms and distributed coordination things might want
>to use hash, that seems perhaps a better tradeoff.  note that of course you
>don't have to allocate the hash until it is used.

[-] The hash code generator would have to be an object.
We should be able not only to deterministically replay a Vat as a whole, but 
disjoint subgraphs within a Vat, as long as the non-deterministic inputs 
entering this subgraph are logged.  Replayability for purposes of replay may 
only be useful at Vat granularity, but replayability for purposes of inward 
bit confinement is useful within a Vat.  The kind of deterministic hashCode 
assignment you suggest, if practiced at Vat granularity, would enable the 
unconfined Mallet of http://www.erights.org/elib/capability/confinement.html 
to communicate bits to the Bob, despite Alice's desire that Bob be 
inward-bit-confined, given that Mallet and Bob are in the same Vat.

Really, the problem is that ambient hashCode generation implies mutable 
static state, and therefore non-capability-based communication.


>>There's no good answer to this problem, but the least bad answer seems to be 
>
>How expensive is it to guarantee these semantics?  

[+] Cheap.
I haven't measured this, but my implementor's intuition while writing it is 
that, once tuned under profiling, it would only be marginally more time 
expensive than non-order-preserving hash tables.  OTOH, you've got 4 
parallel arrays rather than 2, so there is some space expense.

Then there's the cost *in Java* of E's ==, both for comparison, and for hash 
table lookup.  It's obviously more expensive that Java's ==.  Once tuned, it 
should *typically* be less expensive that Java's .equals(), since E's == 
*typically* turns into Java's == (after a few tests).


>I'm glad to see integrated support for sorting.  That turns out to be very
>important when moving from small examples to real applications.

[+] Glad to hear it.
It's also been very nice on small examples.


>>With these two properties, hashtable order becomes useful for other things, 
>>and not just something we have to preserve for the sake of determinism.  
>
>It certainly makes regression testing much better!  (Though in fact writing
>out sorted results is more reliable.)

[-] Only if you could sort.
As in my earlier response to Tyler, my operation-based determinism succeeds 
at being deterministic even when the keys differ only on identity.  Since 
regression tests typically compare the printout (as in updoc), you could 
sort these outputs when you print.  But if you aren't canonicalizing the 
enumeration order, your computation will tend to diverge, leading to 
different results later.


	Cheers,
	--MarkM