Announcing E 0.8.4: The Birthday Release
Mark S. Miller
markm@caplet.com
Tue, 01 Jun 1999 04:13:02 -0700
At 07:18 PM 5/27/99 , Dean Tribble wrote:
>I trust (or at least hope) that the ability to engage in this
>abstraction-breaking event is a capability such that it can be virtualized
>given the appropriate sacrifices to the lords of chaos.
[-] Essentially yes.
For peculiar reasons, it would currently be less convenient to do so than
you'd think, but this can be fixed if it becomes an issue (which I don't
expect). In any case, within the E spec (that I'm making progress on, but
isn't written yet), "x == y" expand to "E same(x, y)". "E" is a primitive
capability that can be virtualized.
However, I become ever more skeptical that it matters. To usefully
virtualize it, you have to 1) maintain the contract, and 2) do something
usefully different than the primitive, or why bother. Do you have any
examples what would satisfy both criteria? I can't think of any.
>>All this cycle handling may seem complicated when you try to think about it,
>...
>Note that in general, many of these classes of issues are complicated when
>you actually *think* about them. That is partly so that using them is not
>complicated (or bug-prone).
[+] Yes, but it presents an interesting documentation problem.
It is indeed good to do something "complicated" if it's actually more
general, formally simpler, and intuitively simpler when you don't think
about it. However, if it's less simple when you do think about it, how do
you avoid making it seem less simple when you document it? After all, we
know what kinds of languages say "Trust me, it does what you expect", and
we must not be assimilated.
>>If == is asked before there is enough information to give a stable answer,
>>rather than give an unstable answer, == throws and exception. Since it does
>>not always give an answer, it is "partial" rather than total. Since the
>>cases for which it does give an answer only grows over time, and since the
>>answers it does give are stable, it is "monotonic". (This resembles
>>read-only-unification, or ask-unification, in the Concurrent Prolog family
>>of languages.)
>
>Except that they block rather than abort when the value hasn't been
>established. How much pain is the differences going to cause? I presume
>the semantics are such that you can't force a wait for resolution when ==
>is used?
[+] Correct, we can't.
This is the price of the Doug-Barnes-reconciliation of sequential and
Actors/FCP/Joule-asynchronous programming -- sequential computation must
never block, or you would get a deadlock. Remember the strange
"deadlock"-like data-lock problems we had in Joule? E largely avoids them
by giving the programmer the means to plan for certain data to be LOCAL, in
which case synchronous operations (like E's ==) can be successfully
performed on it. When the planning has gone awry and a synchronous
operation is attempted on non-LOCAL data, an exception is thrown so that 1)
the buggy invoker is informed of his mistake, and 2) sequential execution
continues, rather than blocking the entire Vat.
>>finite. Only settled objects can be used as keys in EMaps. I suspect that
>>after talking to Arturo, we'll also want to say that only settled data can
>>be made persistent.
>
>?! That seems a dire reduction in functionality. Is it?
[+] Indeed it is! On purpose!
Upgrade-for-release (as opposed to upgrade-for-prototyping) is much much
much harder than persistence. Indeed, upgrade is vastly harder than most of
the very hard issues in building a system like E. Therefore, upgrade should
be the driving issue when designing persistence.
As Arturo taught me
http://www.erights.org/to-be-sorted/StateSerialization.html , to succeed at
upgrade, we must do exactly the kind of manual work in our applications that
orthogonal persistence was supposed to save us from: designing the small
number of data-formats that holds the precious information whose meaning
must be preserved across as upgrade. To succeed, this needs to be vastly
smaller than the number of object definitions in a typical program.
Since we have to do this work anyway, orthogonal persistence isn't giving us
a software engineering benefit. Without MMU support, it's also not giving
us a performance benefit. If we can only save by serializing, it's better
all the way around to serialize less.
So why only settled data? Normally, undelivered messages reside only on the
run-Q or in unforwarded promises. Everything we write out is something
we're committing to upgrade. If we can avoid writing out undelivered
messages, then we don't need to figure out what to do if the object they are
destined for no longer responds to that message in a compatible way, or at all.
The unavoidable dual of upgrade is comm co-existence. Without a flag day,
we can't upgrade some components of a distributed system without facing the
equivalent of the above message compatibility problem. However, we only
have to face that problem from the messages that cross between separately
upgraded units (eg, Vats). If we wrote out unsettled data, we'd have to
face it for all messages asynchronously sent within an application as well.
My current intention for E 1.0 is to have it support persistence with
upgrade along the lines of what Arturo did for Original-E, but not to do
anything for comm coexistence beyond (possibly) the logic to detect and
complain of incompatible versions. Only so many impossible problems before
breakfast, thank you.
>>A long time ago, I had noticed the threat to determinism posed by
>>Object.hashCode() or any equivalent, and these are not accessible in E. As
>>a result, E's hashtables are necessarily primitive -- one cannot implement a
>>new hashtable in E. This seems a small price. However, not until after the
>
>Why don't you instead allocate a deterministic hash in each
>identity-object. Then hashtables can be implemented in the language.
>Since many clever algorithms and distributed coordination things might want
>to use hash, that seems perhaps a better tradeoff. note that of course you
>don't have to allocate the hash until it is used.
[-] The hash code generator would have to be an object.
We should be able not only to deterministically replay a Vat as a whole, but
disjoint subgraphs within a Vat, as long as the non-deterministic inputs
entering this subgraph are logged. Replayability for purposes of replay may
only be useful at Vat granularity, but replayability for purposes of inward
bit confinement is useful within a Vat. The kind of deterministic hashCode
assignment you suggest, if practiced at Vat granularity, would enable the
unconfined Mallet of http://www.erights.org/elib/capability/confinement.html
to communicate bits to the Bob, despite Alice's desire that Bob be
inward-bit-confined, given that Mallet and Bob are in the same Vat.
Really, the problem is that ambient hashCode generation implies mutable
static state, and therefore non-capability-based communication.
>>There's no good answer to this problem, but the least bad answer seems to be
>
>How expensive is it to guarantee these semantics?
[+] Cheap.
I haven't measured this, but my implementor's intuition while writing it is
that, once tuned under profiling, it would only be marginally more time
expensive than non-order-preserving hash tables. OTOH, you've got 4
parallel arrays rather than 2, so there is some space expense.
Then there's the cost *in Java* of E's ==, both for comparison, and for hash
table lookup. It's obviously more expensive that Java's ==. Once tuned, it
should *typically* be less expensive that Java's .equals(), since E's ==
*typically* turns into Java's == (after a few tests).
>I'm glad to see integrated support for sorting. That turns out to be very
>important when moving from small examples to real applications.
[+] Glad to hear it.
It's also been very nice on small examples.
>>With these two properties, hashtable order becomes useful for other things,
>>and not just something we have to preserve for the sake of determinism.
>
>It certainly makes regression testing much better! (Though in fact writing
>out sorted results is more reliable.)
[-] Only if you could sort.
As in my earlier response to Tyler, my operation-based determinism succeeds
at being deterministic even when the keys differ only on identity. Since
regression tests typically compare the printout (as in updoc), you could
sort these outputs when you print. But if you aren't canonicalizing the
enumeration order, your computation will tend to diverge, leading to
different results later.
Cheers,
--MarkM