Announcing E 0.8.4: The Birthday Release
Mark S. Miller
markm@caplet.com
Tue, 01 Jun 1999 02:14:29 -0700
At 05:09 PM 5/27/99 , Tyler Close wrote:
>> Is there some semantic trouble we would get into by adding an
>> 'implementsSameMapping' method (feel free to substitute a better name)
>> to the table protocol?
>
>It would be hell on wheels to implement. If you want to do comparisons like
>that
>you should be using a binary tree and not a hashtable.
>
>So maybe E tables aren't tables but binary trees.
[-] They happen to be hash tables.
One can of course have other implementations, but the current implementation
is just a normal multi-probe hashtable (descending from work by Danfuzz
and Sidney Markowitz). To preserve order I internally added two columns to
the hash table implementation: an array of hash positions indexed by rank,
and an array of ranks indexed by hash positions. (Ping helped clarify some
issues here.) The result is only marginally less time-efficient, and
constant-time operations remain constant-time.
Chip's 'implementsSameMapping' would be no harder to implement and no less
efficient than other things EMaps already do, like "&" and "|".
>This means the whole
>non-determinism thing goes away since there's only one possible ordering. It
>also means that:
>
> ? ["foo" => 3, "bar" => 4]
> # value: ["bar" => 4, "foo" => 3]
>
> ? ["foo" => 3, "bar" => 4] == ["bar" => 4, "foo" => 3]
> # value: true
[-] No deterministic sort possible.
If I understand you, the deterministic order you have in mind is the sort
order. If all keys were only transitively-PassByCopy objects, this would be
fine. However, two non-PassByCopy objects that are otherwise identical are
still different according to their creation identity. Unless we want to
internally do the bookkeeping to keep track of creation order, there's no
way to deterministically sort these. If we did sort on creation order, we'd
break encapsulation by revealing information to the clients of these objects
that was none of their business.
It took me two weeks to stop looking for determinism in the keys, where it
can't be found, and start looking for it in the means by which a table comes
to hold those keys. Therefore, in E,
? ["foo" => 3, "bar" => 4]
# value: ["foo" => 3, "bar" => 4]
? ["bar" => 4, "foo" => 3]
# value: ["bar" => 4, "foo" => 3]
? ["foo" => 3, "bar" => 4] == ["bar" => 4, "foo" => 3]
# value: false
To see why this is the only possible solution, take the quotes off of "foo"
and "bar" and imagine that these two variables hold two
identical-except-for-identity non-PassByCopy objects.
>There is precedent for doing this. The C++ STL defines the set<> and map<>
>templates to be binary trees. They did this because of the more absolute
>runtime
>guarantees that can be placed on a red-black tree (slower, but certain).
>
>Union, intersection, difference, etc. operations are all better implemented
>with
>tree based sets.
[+] I would like to see this.
My hash-table-based algorithms for these are simple brute force, and I'd like
to eventually do better.
>The problem here is that E overloads [] to mean both a list and a table.
>You're not allowed to re-order a list from what was entered by the user.
[-] It's just notation overload, not semantic unity.
Depending on whether it contains "=>"s, E's [] represent either a list or a
map. Up through 0.8.3, before I realized the determinism problem, I was
indeed reordering maps but not lists. Determinism doesn't require that I
don't reorder, just that I specify a semantically deterministic algorithm for
calculating the order. Since I have to do so, I decided it was both
pleasant and efficient to specify that the map resulting from a
map-expression has the order as written. It's the least surprising.
>If:
> ("markm", 42)
>was a list/array/tuple and maybe also a hashtable, then using [] for a tree
>would be ok. I also think it would be neat if:
> birthday("markm", 42)
>was a typed tuple and:
> e_lang birthday("markm", 42)
>meant "send e_lang the following birthday tuple".
>
>But now I've probably gone too far afield.
[-] Even when it's too late for such changes, it's good to raise them.
> ("markm", 42)
I don't understand the difference between an untyped/unlabeled ordered tuple
and a list. I also don't understand how a tuple is like a hashtable.
> birthday("markm", 42)
"birthday" can be a function for constructing an object representing a
birthday, and this expression would then be a function call.
> e_lang birthday("markm", 42)
You have now stepped into Joule territory. See the appendix of the Joule
Manual on sealers and unsealers. In this direction, a message name is not a
string to be sent to the receiver. Rather, it's a local variable name to be
looked up just as any other. The identity of the first class anonymous
message-selector looked up identifies which message is being sent. For
doing distributed programming, this presents a logistical problem in
distributing these message-selector objects as part of exporting/importing
type information across the network.
This is all a worthwhile direction, for which I wish other distributed
language projects luck. Since I didn't need to bite this one off to succeed
at my goals, I decided not to.
Cheers,
--MarkM