At 05:09 PM 5/27/99 , Tyler Close wrote:
[-] They happen to be hash tables.
One can of course have other implementations, but the current implementation
is just a normal multi-probe hashtable (descending from work by Danfuzz
and Sidney Markowitz).
>> Is there some semantic trouble we would get into by adding an
>> 'implementsSameMapping' method (feel free to substitute a better name)
>> to the table protocol?
>
>It would be hell on wheels to implement. If you want to do comparisons like
>that
>you should be using a binary tree and not a hashtable.
>
>So maybe E tables aren't tables but binary trees.
Chip's 'implementsSameMapping' would be no harder to implement and no less
efficient than other things EMaps already do, like "&" and "|".
>This means the whole
>non-determinism thing goes away since there's only one possible ordering. It
>also means that:
> ? ["foo" => 3, "bar" => 4]
> # value: ["bar" => 4, "foo" => 3]
>
> ? ["foo" => 3, "bar" => 4] == ["bar" => 4, "foo" => 3]
> # value: true
[-] No deterministic sort possible.
If I understand you, the deterministic order you have in mind is the sort
order. If all keys were only transitively-PassByCopy objects, this would be
fine. However, two non-PassByCopy objects that are otherwise identical are
still different according to their creation identity. Unless we want to
internally do the bookkeeping to keep track of creation order, there's no
way to deterministically sort these. If we did sort on creation order, we'd
break encapsulation by revealing information to the clients of these objects
that was none of their business.
It took me two weeks to stop looking for determinism in the keys, where it can't be found, and start looking for it in the means by which a table comes to hold those keys. Therefore, in E,
? ["foo" => 3, "bar" => 4] # value: ["foo" => 3, "bar" => 4] ? ["bar" => 4, "foo" => 3] # value: ["bar" => 4, "foo" => 3] ? ["foo" => 3, "bar" => 4] == ["bar" => 4, "foo" => 3] # value: false
To see why this is the only possible solution, take the quotes off of "foo" and "bar" and imagine that these two variables hold two identical-except-for-identity non-PassByCopy objects.
>There is precedent for doing this. The C++ STL defines the set<> and map<>
>templates to be binary trees. They did this because of the more absolute
>runtime
>guarantees that can be placed on a red-black tree (slower, but certain).
>
>Union, intersection, difference, etc. operations are all better implemented
>with
>tree based sets.
[+] I would like to see this.
My hash-table-based algorithms for these are simple brute force, and I'd like
to eventually do better.
>The problem here is that E overloads [] to mean both a list and a table.
>You're not allowed to re-order a list from what was entered by the user.
[-] It's just notation overload, not semantic unity. Depending on whether it contains "=>"s, E's [] represent either a list or a map. Up through 0.8.3, before I realized the determinism problem, I was indeed reordering maps but not lists. Determinism doesn't require that I don't reorder, just that I specify a semantically deterministic algorithm for calculating the order. Since I have to do so, I decided it was both pleasant and efficient to specify that the map resulting from a map-expression has the order as written. It's the least surprising.
>If:
[-] Even when it's too late for such changes, it's good to raise them.
> ("markm", 42)
>was a list/array/tuple and maybe also a hashtable, then using [] for a tree
>would be ok. I also think it would be neat if:
> birthday("markm", 42)
>was a typed tuple and:
> e_lang birthday("markm", 42)
>meant "send e_lang the following birthday tuple".
>
>But now I've probably gone too far afield.
> birthday("markm", 42)
"birthday" can be a function for constructing an object representing a
birthday, and this expression would then be a function call.
> e_lang birthday("markm", 42)
You have now stepped into Joule territory. See the appendix of the Joule
Manual on sealers and unsealers. In this direction, a message name is not a
string to be sent to the receiver. Rather, it's a local variable name to be
looked up just as any other. The identity of the first class anonymous
message-selector looked up identifies which message is being sent. For
doing distributed programming, this presents a logistical problem in
distributing these message-selector objects as part of exporting/importing
type information across the network.
This is all a worthwhile direction, for which I wish other distributed language projects luck. Since I didn't need to bite this one off to succeed at my goals, I decided not to.
Cheers, --MarkM