Announcing E 0.8.4: The Birthday Release
Mark S. Miller
markm@caplet.com
Thu, 27 May 1999 13:18:55 -0700
Today is my birthday. I am now the ultimate answer to the ultimate question
of Life, the Universe, and Everything. Getting back more copies of the
E Survey would make this a very happy birthday. ;)
-------------------------------------------------------------------------
The definition of Equality is a sign of E Quality
Up till this release, the implementation of E's "==" (sameness) was an
incoherent placeholder awaiting an implementation of the real semantics.
Similarly, E's hashtables (EMap and its subtypes ConstMap, FlexMap, and
ReadOnlyMap) were using Java's (equally incoherent) .equals() and
.hashCode() while awaiting the real thing. An 0.8.3 example:
? [2] == [2]
# value: false
It was not acceptable for something so basic to remain broken.
The real "==" has now mostly arrived. E's "==" (like Java's, PPSmalltalk's,
and Squeak's "==", Python's "is", and Lisp's "EQ"), is a primitive that
can't be overridden by user code. It does not work by sending any messages.
When it says "x == y", this really means x and y are indistinguishable in
their effects on future computation. The Grant Matcher Puzzle
http://www.erights.org/elib/capability/grant-matcher/index.html explains why
such a primitive is necessary to combine trust from separately
partially-trusted sources.
The semantics of the "==" operator in these other languages distinguishes
between two kinds of objects: scalars and objects with creation-identity.
Objects with creation identity are == when they have the same
creation-identity. The act of object allocation/creation/initialization in
these languages (eg, Java's "new") endows the new object with a unique
creation identity. Scalars, on the other hand, are compared by value.
E's treatment of "==" is very similar, and has all the above formal
properties (computational equivalence & trust combination) but E broadens
the compare-by-value category from scalars PassByCopy objects, which (like
scalars) are immutable and transparent. It's just too weird for 3 == 3 to
always be true, but for two points or complex numbers to compare based on
creation-identity.
The name "PassByCopy" gives away the bigger motivation. Only by broadening
the category of objects that have no creation identity, and are defined only
by their immutable values, can we broaden the category of objects that
can be cheaply passed-by-copy over the network, in an acceptably transparent
fashion. Besides the scalars, experience shows that the passed-by-copy
category needs to include both primitive and user defined aggregate data
types. As of this 0.8.4 release, the PassByCopy category does include E's
scalars and primitive aggregate types (ConstList and ConstMap), but not yet
objects defined in the E language.
? [2] == [2]
# value: true
? ["foo" => 3, "bar" => 4] == ["foo" => 3, "bar" => 4]
# value: true
The lookup of keys in E's hashtables (EMap etc..) is now correctly based on
"==" as well. If x == y, then something stored under x will be retrieved by y:
? def x := [[2] => "foo"]
# [[2] => foo]
? x[[2]]
# value: foo
-------------------------------------------------------------------------
Shaving the Barber with Occam's Razor
Of course, the definition of "==" is recursive in that way loved by computer
scientists. For example, two ConstLists are == when they are the same size
and their corresponding elements are ==.
? x == [[2] => "foo"]
# value: true
Actually, if you examine the above statement, you'll see it's actually the
recursion loved by all computer scientists that love that sort of thing; ie,
it does not require the recursion to terminate. Given the distributed
nature of E, and the delayed fashion in which a promise for a value
eventually resolves to a value, it would not be possible for E to prohibit
cyclic data. Instead (inspired by Colmerauer's Prolog III), all Kernel-E
algorithms are required to work, finitely and correctly, with cyclic data.
This includes == and hash table lookup:
? define tight := [1, tight, "x"]
# value: [1, ***CYCLE***, x]
? define loose := [1, [1, loose, "x"], "x"]
# value: [1, ***CYCLE***, x]
? tight == loose
# value: true
? def map := [tight => "foo"]
# value: [[1, ***CYCLE***, x] => foo]
? map[loose]
# value: foo
Internally, tight and loose have very different representations. However,
when both cycles are unwound, they represent the same infinite tree. One
could say that tight's representation of this tree is more tightly wound
than loose's representation. However, this difference is only in the
implementation, not in the semantics. The value of tight and loose is only
the infinite tree they represent. If these trees are the same, then tight
and loose are ==.
Notice that loose prints out according to the tightest winding of the tree
it represents, not according to the cycle by which it represents this tree.
Only the tightest winding is finite and canonical.
All this cycle handling may seem complicated when you try to think about it,
because it is unfamiliar (except PrologIII programmers). This would
seem to contradict E's goal of easy & quick familiarity to C-tradition
programmers. However, C-tradition programmers would expect cyclic
structures to cause stack overflows. Rather than blow up, we just remove
restrictions and do the right thing, with no dangerous surprises. It is
safe for an E programmer not to think about cycles if they wish.
-------------------------------------------------------------------------
Stable vs Monotonic
Another property of "==" in these other languages: it is total and stable.
By "total", I mean that for any two variables x and y, if "x == y" compiles
then it runs and successfully produces a value, rather than throwing an
exception. By "stable", I mean that x == y always produces the same answer
for the same x and y, no matter when and how often we ask.
Unfortunately, promises make it impossible for us to be both total and
stable. Consider:
? define [p1,r1] := PromiseMaker()
# value: [<Deferred ref>, <Unresolved Resolver>]
? define [p2,r2] := PromiseMaker()
# value: [<Deferred ref>, <Unresolved Resolver>]
? r1 resolve(3)
? p1
# value: 3
? p2
# value: <Deferred ref>
? r2 resolve(3)
? p2
# value: 3
? p1 == p2
# value: true
Once both promises are resolved, they happen to be resolved to == values.
Since a resolved promise is indistinguishable from the value it resolved to,
once resolved these promises are themselves ==. However, what if we had
asked before they had gotten resolved? Without being able to predict the
future, there's no way to give the same answer we will give once they are
resolved.
If == is asked before there is enough information to give a stable answer,
rather than give an unstable answer, == throws and exception. Since it does
not always give an answer, it is "partial" rather than total. Since the
cases for which it does give an answer only grows over time, and since the
answers it does give are stable, it is "monotonic". (This resembles
read-only-unification, or ask-unification, in the Concurrent Prolog family
of languages.)
? define [p3,r3] := PromiseMaker()
# value: [<Deferred ref>, <Unresolved Resolver>]
? def loose := [1, [1, loose, p3], "x"]
# value: [1, [1, [1, ***CYCLE***, x], <Deferred ref>], x]
? tight == loose
# problem: <NotSettledException: both must be LOCAL>
? loose == loose[1]
# problem: <NotSettledException: both must be LOCAL>
? E isSettled(loose)
# value: false
? r3 resolve("x")
? loose
# value: [1, ***CYCLE***, x]
? E isSettled(loose)
# value: true
? tight == loose
# value: true
? loose == loose[1]
# value: true
As you can see, we've also introduced a new predicate, "E isSettled(foo)",
that says whether a value is settled. An object is settled if it's local
and either 1) not PassByCopy, or 2) PassByCopy, and all of its contents are
settled. Once again, this recursive definition doesn't assume the tree is
finite. Only settled objects can be used as keys in EMaps. I suspect that
after talking to Arturo, we'll also want to say that only settled data can
be made persistent.
-------------------------------------------------------------------------
Order-Preserving Tables
Earlier I sent out a first message about the uses of determinism in
computation. I haven't yet gotten around to documenting the most important
use for E: Contract Verification. An individual E Vat, so it would seem,
has only the non-determinism of the arrival order of messages sent by the
external world (other Vats and I/O devices), as its non-deterministic
inputs. In the absence of spontaneous Errors by the implementation (like
java.lang.OutOfMemoryError -- thanks Dan for noticing this issue) it seemed
likely that one could deterministically replay a Vat's computation from its
initial state and these inputs.
This allows a partially-trusted third party intermediary to run the code of
a contract, and provide to external auditing agents a description of the
inputs and outputs. The auditing agent could then verify that this
contract, given the reported inputs, produce the reported outputs. Such
auditors are then left with the difficult (but not necessarily impossible!)
task of comparing notes with the participants and verifying that the outputs
and inputs they report corresponds to the inputs and outputs reported by
the intermediary.
Notice that, because of arrival-order non-determinism, the auditor needs
more information from the intermediary than the other participants could
provide, since the participant cannot predict how their messages are
interleaved. This is spontaneous non-determinism generated at the
intermediary. Every such spontaneously generated non-deterministic input
to computation must be identified and logged to enable deterministic replay.
A long time ago, I had noticed the threat to determinism posed by
Object.hashCode() or any equivalent, and these are not accessible in E. As
a result, E's hashtables are necessarily primitive -- one cannot implement a
new hashtable in E. This seems a small price. However, not until after the
0.8.3 release did I notice that hashtable enumeration order is just as bad.
Besides preventing replay, it can also serve as a covert channel *in*, by
which unconfined Mallet can send bits to the confined Bob. This was the
direction in which E's constrained non-determinism was supposed to prevent
covert channels. (The other direction is *hard*.)
There's no good answer to this problem, but the least bad answer seems to be
to make a hashtable's enumeration order be a deterministic consequence of
the sequence of operations that caused the hashtable to contain what it
contains.
http://www.erights.org/javadoc/org/erights/e/elib/tables/EMap.html and
subclasses has the full spec, but the most important properties are
1) The hashtable resulting from a hashtable expression has the same
order as stated in the expression.
2) In the absence of other operations, insertion order is preserved.
With these two properties, hashtable order becomes useful for other things,
and not just something we have to preserve for the sake of determinism.
Unfortunately, it means that two table with different orders, but otherwise
the same, are not ==.
? define a := ["foo" => 3, "bar" => 4]
# value: [foo => 3, bar => 4]
? define b := ["bar" => 4, "foo" => 3]
# value: [bar => 4, foo => 3]
? a == b
# value: false
Btw, notice that tables print in order.
-------------------------------------------------------------------------
Proxy Comm System stuff
* replaceObject() in ProxyOutputStream has been substantially simplified.
* RedirectorResolver had a bunch of bugs that were preventing whenBroken()
from working in some circumstances.
* In anticipation of Una, PassByCopy is now a subtype of PassByPresence.
PassByCopy is the distinction used by ==. PassByPresence is the distinction
used by ProxyOutputStream.replaceObject() to decide whether to serialize the
object itself or not. An Unum would be PassByPresence but not PassByCopy.
-------------------------------------------------------------------------
Other Random Stuff
* Because of a bug in how I was walking the Java inheritance graph, E's
Miranda methods were overriding methods declared in interfaces, rather than
the other way around. (Did you know that .getSuperclass() on a Class object
representing an interface returns null, even though Java interfaces inherit
from 'class Object'?)
* Inspired yet again by something Ping said about Python, I painlessly made a
fully upwards compatible change to the grammar: you may now optionally break
a line after (to the right of) any infix operator.
? 2 +
> 3
# value: 5
* I made a non-upwards-compatible change in the expansion that shouldn't yet
effect anyone but me: I changed the expansion of the URI and quasi-literal
syntaxes to use mangling instead of sub-scopes. Whereas 'foo:/x' used to
expand to uriGetters["foo"]["/x"], in now expands to foo__uriGetters["/x"].
Similarly, e`2+3` used to expand to
quasiParsers["e"] valueMaker("2+3") substitute([])
but now expands to
e__quasiParser valueMaker("2+3") substitute([])
This change makes it much easier to define a new uri-getter or quasi-parser
in a local scope.
* For some unfathomable reason, Javasoft defined Swing not to default to the
platform look & feel, but instead provides boilerplate example code for how
to initialize your application to do so. I can't imagine why this might not
be the right default, so the E interpreter now does this silently when it
starts up. As a result, MarcS' echat now runs with the platform look & feel.
* I've started to write a compiler in E to compile E to Java. Along the
way, I made all the E parsenodes participate in the Visitor design pattern,
so that it would be easy to walk E parse trees. It turns out there are 29
fundamental types of node to walk, as documented in
http://www.erights.org/javadoc/org/erights/e/elang/evm/ETreeVisitor.html
29 is large compared to scheme, but it's still pleasingly small and
scheme-like. Any suggestions for reducing some to the others would be most
welcome.
* Next are Updoc in E, an E-to-Java compiler in E, more bug fixes, and
documentation.
-------------------------------------------------------------------------
Is a semantics of == among capabilities a theory of equal rights?
--MarkM