Announcing E 0.8.4: The Birthday Release

Mark S. Miller markm@caplet.com
Tue, 01 Jun 1999 02:14:29 -0700


At 05:09 PM 5/27/99 , Tyler Close wrote:
>> Is there some semantic trouble we would get into by adding an
>> 'implementsSameMapping' method (feel free to substitute a better name)
>> to the table protocol?
>
>It would be hell on wheels to implement. If you want to do comparisons like 
>that
>you should be using a binary tree and not a hashtable.
>
>So maybe E tables aren't tables but binary trees.

[-] They happen to be hash tables.

One can of course have other implementations, but the current implementation 
is just a normal multi-probe hashtable (descending from work by Danfuzz
and Sidney Markowitz).  To preserve order I internally added two columns to 
the hash table implementation: an array of hash positions indexed by rank, 
and an array of ranks indexed by hash positions.  (Ping helped clarify some 
issues here.)  The result is only marginally less time-efficient, and 
constant-time operations remain constant-time.

Chip's 'implementsSameMapping' would be no harder to implement and no less 
efficient than other things EMaps already do, like "&" and "|".


>This means the whole
>non-determinism thing goes away since there's only one possible ordering. It
>also means that:
>
>	? ["foo" => 3, "bar" => 4]
>	# value: ["bar" => 4, "foo" => 3]
>
>	? ["foo" => 3, "bar" => 4] == ["bar" => 4, "foo" => 3]
>	# value: true

[-] No deterministic sort possible.
If I understand you, the deterministic order you have in mind is the sort 
order.  If all keys were only transitively-PassByCopy objects, this would be 
fine.  However, two non-PassByCopy objects that are otherwise identical are 
still different according to their creation identity.  Unless we want to 
internally do the bookkeeping to keep track of creation order, there's no 
way to deterministically sort these.  If we did sort on creation order, we'd 
break encapsulation by revealing information to the clients of these objects 
that was none of their business.

It took me two weeks to stop looking for determinism in the keys, where it 
can't be found, and start looking for it in the means by which a table comes 
to hold those keys.  Therefore, in E, 

	? ["foo" => 3, "bar" => 4]
	# value: ["foo" => 3, "bar" => 4]

	? ["bar" => 4, "foo" => 3]
	# value: ["bar" => 4, "foo" => 3]

	? ["foo" => 3, "bar" => 4] == ["bar" => 4, "foo" => 3]
	# value: false

To see why this is the only possible solution, take the quotes off of "foo" 
and "bar" and imagine that these two variables hold two 
identical-except-for-identity non-PassByCopy objects.


>There is precedent for doing this. The C++ STL defines the set<> and map<>
>templates to be binary trees. They did this because of the more absolute 
>runtime
>guarantees that can be placed on a red-black tree (slower, but certain).
>
>Union, intersection, difference, etc. operations are all better implemented 
>with
>tree based sets.

[+] I would like to see this.
My hash-table-based algorithms for these are simple brute force, and I'd like 
to eventually do better.


>The problem here is that E overloads [] to mean both a list and a table. 
>You're not allowed to re-order a list from what was entered by the user.

[-] It's just notation overload, not semantic unity.
Depending on whether it contains "=>"s, E's [] represent either a list or a 
map.  Up through 0.8.3, before I realized the determinism problem, I was 
indeed reordering maps but not lists.  Determinism doesn't require that I 
don't reorder, just that I specify a semantically deterministic algorithm for 
calculating the order.  Since I have to do so, I decided it was both 
pleasant and efficient to specify that the map resulting from a 
map-expression has the order as written.  It's the least surprising.


>If:
>	("markm", 42)
>was a list/array/tuple and maybe also a hashtable, then using [] for a tree
>would be ok. I also think it would be neat if:
>	birthday("markm", 42)
>was a typed tuple and:
>	e_lang birthday("markm", 42)
>meant "send e_lang the following birthday tuple".
>
>But now I've probably gone too far afield.

[-] Even when it's too late for such changes, it's good to raise them.

>	("markm", 42)
I don't understand the difference between an untyped/unlabeled ordered tuple 
and a list.  I also don't understand how a tuple is like a hashtable.

>	birthday("markm", 42)
"birthday" can be a function for constructing an object representing a 
birthday, and this expression would then be a function call.

>	e_lang birthday("markm", 42)
You have now stepped into Joule territory.  See the appendix of the Joule 
Manual on sealers and unsealers.  In this direction, a message name is not a 
string to be sent to the receiver.  Rather, it's a local variable name to be 
looked up just as any other.  The identity of the first class anonymous 
message-selector looked up identifies which message is being sent.  For 
doing distributed programming, this presents a logistical problem in 
distributing these message-selector objects as part of exporting/importing 
type information across the network.

This is all a worthwhile direction, for which I wish other distributed 
language projects luck.  Since I didn't need to bite this one off to succeed 
at my goals, I decided not to.


	Cheers,
	--MarkM