RE: Announcing E 0.8.4: The Birthday Release Mark S. Miller (markm@caplet.com)
Tue, 01 Jun 1999 02:14:29 -0700

At 05:09 PM 5/27/99 , Tyler Close wrote:
>> Is there some semantic trouble we would get into by adding an
>> 'implementsSameMapping' method (feel free to substitute a better name)
>> to the table protocol?
>
>It would be hell on wheels to implement. If you want to do comparisons like
>that
>you should be using a binary tree and not a hashtable.
>
>So maybe E tables aren't tables but binary trees.

[-] They happen to be hash tables.

One can of course have other implementations, but the current implementation is just a normal multi-probe hashtable (descending from work by Danfuzz and Sidney Markowitz). To preserve order I internally added two columns to the hash table implementation: an array of hash positions indexed by rank, and an array of ranks indexed by hash positions. (Ping helped clarify some issues here.) The result is only marginally less time-efficient, and constant-time operations remain constant-time.

Chip's 'implementsSameMapping' would be no harder to implement and no less efficient than other things EMaps already do, like "&" and "|".

>This means the whole
>non-determinism thing goes away since there's only one possible ordering. It
>also means that:
>
> ? ["foo" => 3, "bar" => 4]
> # value: ["bar" => 4, "foo" => 3]
>
> ? ["foo" => 3, "bar" => 4] == ["bar" => 4, "foo" => 3]
> # value: true

[-] No deterministic sort possible.
If I understand you, the deterministic order you have in mind is the sort order. If all keys were only transitively-PassByCopy objects, this would be fine. However, two non-PassByCopy objects that are otherwise identical are still different according to their creation identity. Unless we want to internally do the bookkeeping to keep track of creation order, there's no way to deterministically sort these. If we did sort on creation order, we'd break encapsulation by revealing information to the clients of these objects that was none of their business.

It took me two weeks to stop looking for determinism in the keys, where it can't be found, and start looking for it in the means by which a table comes to hold those keys. Therefore, in E,

	? ["foo" => 3, "bar" => 4]
	# value: ["foo" => 3, "bar" => 4]

	? ["bar" => 4, "foo" => 3]
	# value: ["bar" => 4, "foo" => 3]

	? ["foo" => 3, "bar" => 4] == ["bar" => 4, "foo" => 3]
	# value: false

To see why this is the only possible solution, take the quotes off of "foo" and "bar" and imagine that these two variables hold two identical-except-for-identity non-PassByCopy objects.

>There is precedent for doing this. The C++ STL defines the set<> and map<>
>templates to be binary trees. They did this because of the more absolute
>runtime
>guarantees that can be placed on a red-black tree (slower, but certain).
>
>Union, intersection, difference, etc. operations are all better implemented
>with
>tree based sets.

[+] I would like to see this.
My hash-table-based algorithms for these are simple brute force, and I'd like to eventually do better.

>The problem here is that E overloads [] to mean both a list and a table.
>You're not allowed to re-order a list from what was entered by the user.

[-] It's just notation overload, not semantic unity. Depending on whether it contains "=>"s, E's [] represent either a list or a map. Up through 0.8.3, before I realized the determinism problem, I was indeed reordering maps but not lists. Determinism doesn't require that I don't reorder, just that I specify a semantically deterministic algorithm for calculating the order. Since I have to do so, I decided it was both pleasant and efficient to specify that the map resulting from a map-expression has the order as written. It's the least surprising.

>If:
> ("markm", 42)
>was a list/array/tuple and maybe also a hashtable, then using [] for a tree
>would be ok. I also think it would be neat if:
> birthday("markm", 42)
>was a typed tuple and:
> e_lang birthday("markm", 42)
>meant "send e_lang the following birthday tuple".
>
>But now I've probably gone too far afield.

[-] Even when it's too late for such changes, it's good to raise them.

> ("markm", 42)

I don't understand the difference between an untyped/unlabeled ordered tuple and a list. I also don't understand how a tuple is like a hashtable.

> birthday("markm", 42)

"birthday" can be a function for constructing an object representing a birthday, and this expression would then be a function call.

> e_lang birthday("markm", 42)
You have now stepped into Joule territory. See the appendix of the Joule Manual on sealers and unsealers. In this direction, a message name is not a string to be sent to the receiver. Rather, it's a local variable name to be looked up just as any other. The identity of the first class anonymous message-selector looked up identifies which message is being sent. For doing distributed programming, this presents a logistical problem in distributing these message-selector objects as part of exporting/importing type information across the network.

This is all a worthwhile direction, for which I wish other distributed language projects luck. Since I didn't need to bite this one off to succeed at my goals, I decided not to.

	Cheers,
	--MarkM