Announcing E 0.8.4: The Birthday Release Mark S. Miller (markm@caplet.com)
Thu, 27 May 1999 13:18:55 -0700

Today is my birthday. I am now the ultimate answer to the ultimate question of Life, the Universe, and Everything. Getting back more copies of the E Survey would make this a very happy birthday. ;)


The definition of Equality is a sign of E Quality

Up till this release, the implementation of E's "==" (sameness) was an incoherent placeholder awaiting an implementation of the real semantics. Similarly, E's hashtables (EMap and its subtypes ConstMap, FlexMap, and ReadOnlyMap) were using Java's (equally incoherent) .equals() and .hashCode() while awaiting the real thing. An 0.8.3 example:

? [2] == [2]

# value: false

It was not acceptable for something so basic to remain broken.

The real "==" has now mostly arrived. E's "==" (like Java's, PPSmalltalk's, and Squeak's "==", Python's "is", and Lisp's "EQ"), is a primitive that can't be overridden by user code. It does not work by sending any messages. When it says "x == y", this really means x and y are indistinguishable in their effects on future computation. The Grant Matcher Puzzle http://www.erights.org/elib/capability/grant-matcher/index.html explains why such a primitive is necessary to combine trust from separately partially-trusted sources.

The semantics of the "==" operator in these other languages distinguishes between two kinds of objects: scalars and objects with creation-identity. Objects with creation identity are == when they have the same creation-identity. The act of object allocation/creation/initialization in these languages (eg, Java's "new") endows the new object with a unique creation identity. Scalars, on the other hand, are compared by value.

E's treatment of "==" is very similar, and has all the above formal properties (computational equivalence & trust combination) but E broadens the compare-by-value category from scalars PassByCopy objects, which (like scalars) are immutable and transparent. It's just too weird for 3 == 3 to always be true, but for two points or complex numbers to compare based on creation-identity.

The name "PassByCopy" gives away the bigger motivation. Only by broadening the category of objects that have no creation identity, and are defined only by their immutable values, can we broaden the category of objects that can be cheaply passed-by-copy over the network, in an acceptably transparent fashion. Besides the scalars, experience shows that the passed-by-copy category needs to include both primitive and user defined aggregate data types. As of this 0.8.4 release, the PassByCopy category does include E's scalars and primitive aggregate types (ConstList and ConstMap), but not yet objects defined in the E language.

? [2] == [2]

# value: true

? ["foo" => 3, "bar" => 4] == ["foo" => 3, "bar" => 4]
# value: true

The lookup of keys in E's hashtables (EMap etc..) is now correctly based on "==" as well. If x == y, then something stored under x will be retrieved by y:

? def x := [[2] => "foo"]

# [[2] => foo]

? x[[2]]

# value: foo


Shaving the Barber with Occam's Razor

Of course, the definition of "==" is recursive in that way loved by computer scientists. For example, two ConstLists are == when they are the same size and their corresponding elements are ==.

? x == [[2] => "foo"]

# value: true

Actually, if you examine the above statement, you'll see it's actually the recursion loved by all computer scientists that love that sort of thing; ie, it does not require the recursion to terminate. Given the distributed nature of E, and the delayed fashion in which a promise for a value eventually resolves to a value, it would not be possible for E to prohibit cyclic data. Instead (inspired by Colmerauer's Prolog III), all Kernel-E algorithms are required to work, finitely and correctly, with cyclic data. This includes == and hash table lookup:

? define tight := [1, tight, "x"]

# value: [1, ***CYCLE***, x]

? define loose := [1, [1, loose, "x"], "x"]
# value: [1, ***CYCLE***, x]

? tight == loose

# value: true

? def map := [tight => "foo"]

# value: [[1, ***CYCLE***, x] => foo]

? map[loose]

# value: foo

Internally, tight and loose have very different representations. However, when both cycles are unwound, they represent the same infinite tree. One could say that tight's representation of this tree is more tightly wound than loose's representation. However, this difference is only in the implementation, not in the semantics. The value of tight and loose is only the infinite tree they represent. If these trees are the same, then tight and loose are ==.

Notice that loose prints out according to the tightest winding of the tree it represents, not according to the cycle by which it represents this tree. Only the tightest winding is finite and canonical.

All this cycle handling may seem complicated when you try to think about it, because it is unfamiliar (except PrologIII programmers). This would seem to contradict E's goal of easy & quick familiarity to C-tradition programmers. However, C-tradition programmers would expect cyclic structures to cause stack overflows. Rather than blow up, we just remove restrictions and do the right thing, with no dangerous surprises. It is safe for an E programmer not to think about cycles if they wish.


Stable vs Monotonic

Another property of "==" in these other languages: it is total and stable. By "total", I mean that for any two variables x and y, if "x == y" compiles then it runs and successfully produces a value, rather than throwing an exception. By "stable", I mean that x == y always produces the same answer for the same x and y, no matter when and how often we ask.

Unfortunately, promises make it impossible for us to be both total and stable. Consider:

? define [p1,r1] := PromiseMaker()
# value: [<Deferred ref>, <Unresolved Resolver>]

? define [p2,r2] := PromiseMaker()
# value: [<Deferred ref>, <Unresolved Resolver>]

? r1 resolve(3)
? p1

# value: 3

? p2

# value: <Deferred ref>

? r2 resolve(3)
? p2

# value: 3

? p1 == p2

# value: true

Once both promises are resolved, they happen to be resolved to == values. Since a resolved promise is indistinguishable from the value it resolved to, once resolved these promises are themselves ==. However, what if we had asked before they had gotten resolved? Without being able to predict the future, there's no way to give the same answer we will give once they are resolved.

If == is asked before there is enough information to give a stable answer, rather than give an unstable answer, == throws and exception. Since it does not always give an answer, it is "partial" rather than total. Since the cases for which it does give an answer only grows over time, and since the answers it does give are stable, it is "monotonic". (This resembles read-only-unification, or ask-unification, in the Concurrent Prolog family of languages.)

? define [p3,r3] := PromiseMaker()

# value: [<Deferred ref>, <Unresolved Resolver>]

? def loose := [1, [1, loose, p3], "x"]
# value: [1, [1, [1, ***CYCLE***, x], <Deferred ref>], x]

? tight == loose

# problem: <NotSettledException: both must be LOCAL>

? loose == loose[1]

# problem: <NotSettledException: both must be LOCAL>

? E isSettled(loose)

# value: false

? r3 resolve("x")
? loose

# value: [1, ***CYCLE***, x]

? E isSettled(loose)

# value: true

? tight == loose

# value: true

? loose == loose[1]

# value: true

As you can see, we've also introduced a new predicate, "E isSettled(foo)", that says whether a value is settled. An object is settled if it's local and either 1) not PassByCopy, or 2) PassByCopy, and all of its contents are settled. Once again, this recursive definition doesn't assume the tree is finite. Only settled objects can be used as keys in EMaps. I suspect that after talking to Arturo, we'll also want to say that only settled data can be made persistent.


Order-Preserving Tables

Earlier I sent out a first message about the uses of determinism in computation. I haven't yet gotten around to documenting the most important use for E: Contract Verification. An individual E Vat, so it would seem, has only the non-determinism of the arrival order of messages sent by the external world (other Vats and I/O devices), as its non-deterministic inputs. In the absence of spontaneous Errors by the implementation (like java.lang.OutOfMemoryError -- thanks Dan for noticing this issue) it seemed likely that one could deterministically replay a Vat's computation from its initial state and these inputs.

This allows a partially-trusted third party intermediary to run the code of a contract, and provide to external auditing agents a description of the inputs and outputs. The auditing agent could then verify that this contract, given the reported inputs, produce the reported outputs. Such auditors are then left with the difficult (but not necessarily impossible!) task of comparing notes with the participants and verifying that the outputs and inputs they report corresponds to the inputs and outputs reported by the intermediary.

Notice that, because of arrival-order non-determinism, the auditor needs more information from the intermediary than the other participants could provide, since the participant cannot predict how their messages are interleaved. This is spontaneous non-determinism generated at the intermediary. Every such spontaneously generated non-deterministic input to computation must be identified and logged to enable deterministic replay.

A long time ago, I had noticed the threat to determinism posed by Object.hashCode() or any equivalent, and these are not accessible in E. As a result, E's hashtables are necessarily primitive -- one cannot implement a new hashtable in E. This seems a small price. However, not until after the 0.8.3 release did I notice that hashtable enumeration order is just as bad. Besides preventing replay, it can also serve as a covert channel *in*, by which unconfined Mallet can send bits to the confined Bob. This was the direction in which E's constrained non-determinism was supposed to prevent covert channels. (The other direction is *hard*.)

There's no good answer to this problem, but the least bad answer seems to be to make a hashtable's enumeration order be a deterministic consequence of the sequence of operations that caused the hashtable to contain what it contains.
http://www.erights.org/javadoc/org/erights/e/elib/tables/EMap.html and subclasses has the full spec, but the most important properties are

  1. The hashtable resulting from a hashtable expression has the same order as stated in the expression.
  2. In the absence of other operations, insertion order is preserved.

With these two properties, hashtable order becomes useful for other things, and not just something we have to preserve for the sake of determinism. Unfortunately, it means that two table with different orders, but otherwise the same, are not ==.

? define a := ["foo" => 3, "bar" => 4]

# value: [foo => 3, bar => 4]

? define b := ["bar" => 4, "foo" => 3]

# value: [bar => 4, foo => 3]

? a == b

# value: false

Btw, notice that tables print in order.


Proxy Comm System stuff

Other Random Stuff

? 2 +

	> 3
	# value: 5


quasiParsers["e"] valueMaker("2+3") substitute([])

but now expands to

e__quasiParser valueMaker("2+3") substitute([])

This change makes it much easier to define a new uri-getter or quasi-parser in a local scope.

	Is a semantics of == among capabilities a theory of equal rights?
	--MarkM