Not the Territory (was: Announcing E 0.8.4: The Birthday Release)

Mark S. Miller markm@caplet.com
Thu, 03 Jun 1999 01:52:04 -0700


At 04:47 PM 6/1/99 , Dan Bornstein wrote:
>... I am the Danfuzz that MarkM mentioned, ...
>... I'll merely "agree to disagree" about most of your points. ...

Well, since I am the MarkM that Danfuzz mentions above, I'll seek agreement 
by disagreeing ;)


At 07:48 AM 6/1/99 , Tyler Close wrote:
>... In my own experimentation with hashtables for the
>Waterken libraries, I found that internal hashing sucks compared to external
>hashing. ...

[#] Let the profiler decide.
When thinking about complexity measures, or when trying to figure out why 
something doesn't perform as expected, it's worth being analytical about 
performance.  When it comes to choosing which of two techniques with the 
same complexity measure has better performance, given that we can change our 
minds later invisibly to the language spec, we should wait until we measure.  


>The code for these algorithms is in the C++ STL. I'd recommend looking at the
>SGI implementation, it's much more readable than others. You could also 
>look at the implementation in my library.

[+] I'll look at yours.
1) You can't imagine how much I don't want to start reading C++ again.
2) I'd guess that your code is at least as good a representative as the 
others of the points you wish to make ;)



>> [-] No deterministic sort possible.
>> If I understand you, the deterministic order you have in mind is the sort
>> order.  If all keys were only transitively-PassByCopy objects, this would be
>> fine.
>
>Given that both hashtables and binary trees break if you modify the key of an
>inserted element and that E already must support transitively immutable 
>objects (for exceptions), give me a good reason for allowing mutable keys. 
>Better than that, tell me why mutable keys are even something you'd want. 
>E's answer should be: "You don't want them AND you can't have them." Mutable 
>keys are nothing but a source of bugs AND, it seems, security holes.

[-] Several problematic assumptions, but here's one anyway.

Problematic assumptions:

* Hashtables don't break if you modify a key.  They break if your 
modification of the key causes its hash to change, or causes equality among 
keys to give different answers than they used to (unstable equality).  In E, 
PassByCopy objects are immutable, so there's no problem.  Non-PassByCopy 
objects use identity-based equality and hashing (though the hashing is 
hidden).  Since modifying an object leaves its identity unchanged, there's 
also no problem.  Hashes and equality of settled objects are always stable, 
even if the object wishes to disrupt these constraints.  Ie, this is 
guaranteed by the E implementation, not by the objects in question.

* While E does need to support transitive immutability, I don't see what it 
has to do with the issue at hand.

* The impossibility of deterministically sorting isn't due to mutability 
per-se, but to featureless object identity.  Consider:

	? define BrandMaker := import:org.erights.e.elang.sealing.Brand
	# value: <statics of org.erights.e.elang.sealing.Brand>

	? define [s1,u1] := BrandMaker pair("Joe")
	# value: [<Joe sealer>, <Joe unsealer>]

	? define [s2,u2] := BrandMaker pair("Joe")
	# value: [<Joe sealer>, <Joe unsealer>]

	? define envelope := s1 seal(3)
	# value: <sealed by Joe>

	? u1 unseal(envelope)
	# value: 3

	? u2 unseal(envelope)
	# problem: <UnsealingException: box not sealed by matching sealer>

u1 and u2 are clearly different unsealers, but the source of their 
difference is only that they came from different creation events.  There's 
no full-ordering rule you could write that wouldn't reveal information (like 
relative creation order) that was none of their client's business.

* Given that the E programmer cannot mutate the information about a key used 
by == and hash lookup, the mutability-qua-mutability (as Rand might say) of 
keys is not a source of bugs or security holes.  Their unsortability was a 
source of a security hole (now plugged), but except for this hole, it was 
not a source of any bugs.

But, these problems aside, on to your challenge.  Since the hashcodes are 
hidden from the E programmer, a hash table merely has the semantics of 
linear-time lookup on a set of key-value pairs:

    ? define FlexMapMaker() {
    >     define pairs := [] diverge
    >     define NaiveFlexMap {
    >         to put(key, value) {
    >             escape return {
    >                 for i => [k, v] in pairs {
    >                     if (k == key) {
    >                         pairs[i] := [key, value]
    >                         return()
    >                     }
    >                 }
    >                 pairs push([key, value])
    >             }
    >         }
    >         to get(key) {
    >             escape return {
    >                 for [k, v] in pairs {
    >                     if (k == key) {
    >                         return(v)
    >                     }
    >                 }
    >                 throw("Not found: " + key)
    >             }
    >         }
    >         #...
    >     }
    > }
    # value: <FlexMapMaker>
    
    ? define flex := FlexMapMaker()
    # value: <NaiveFlexMap>
    
    ? flex["foo"] := "bar"
    # value: bar
    
    ? flex["zip"] := "zap"
    # value: zap
    
    ? flex["foo"]
    # value: bar

The only magic of hashtables is that they operate in less than the linear 
time of the above implementation.  They do not provide any semantics beyond 
that of ==.  This reduces the question to "Should == be defined on mutable 
objects?"


>> However, two non-PassByCopy objects that are otherwise identical are
>> still different according to their creation identity.
>
>I thought E's maps were maps and not multimaps.

[?] Yes.  I don't understand why you're asking.
Since they are *different* according to creation identity, they are 
different keys.  E's maps have at most one association for a given key, 
where "a given key" can be thought of as "all keys that are the same as a 
given key".  In other words, it again bottoms out in the definition of ==.


>> Unless we want to
>> internally do the bookkeeping to keep track of creation order, there's no
>> way to deterministically sort these.
>
>Tree based sorts are stable sorts. This means that the ordering of equal
>elements is preserved. This means that in the "one true order" of a binary 
>tree,
>more recently inserted elements are placed after previously inserted 
>elements. I
>believe this is the same kind of determinism you deem acceptable for E's 
>current
>hashtables.

[+] I'm doing a degenerate form of this.
You could say that the deterministic order I've decided on is a stable sort 
where all keys are considered <=>, and therefore only the stability, not the 
sort, matters.


>> It took me two weeks to stop looking for determinism in the keys, where it
>> can't be found,
>
>If you're talking about determinism in an order which is based on hash value,
>then of course the keys don't contain the necessary information. The 
>capacity of
>the hash map also plays a role in the order and the keys cannot possibly know
>about this.
>
>If you're talking about determinism in an order based on element value, then 
>the
>keys necessarily must contain the required determinism. There is no other
>information used in constructing the order; therefore, the keys determine the
>order.
>
>I feel like there's something fundamental here that I am not understanding. 
>What am I missing?

[+] You are missing only that I was being dense.
It just took me two weeks to look in the right place.  I'd started off 
thinking along the lines of assigning deterministic hashes, similar to what 
Dean recently suggested.



>> To see why this is the only possible solution, take the quotes off of "foo"
>> and "bar" and imagine that these two variables hold two
>> identical-except-for-identity non-PassByCopy objects.
>
>No, I won't because I think it's a bad idea. See above. Is this the only 
>part of
>the non-determinism issue that I am not understanding?

[+] I believe so.
But this issue is the whole game.


>> >The problem here is that E overloads [] to mean both a list and a table.
>> >You're not allowed to re-order a list from what was entered by the user.
>>
>> [-] It's just notation overload, not semantic unity.
>> Depending on whether it contains "=>"s, E's [] represent either a list or a
>> map.
>
>Looks like I've fallen behind. I thought [] without => was a set. What's a set
>now?

[-] It never meant a set.
"[a, b, c]" was always a list/tuple/sequence.  "[a, b, c] asSet" was a way 
to write a map that represents the set of these three elements, but only 
because we defined lists to respond to the "asSet" message to produce such a 
map:

	? ["a", "b", "c"] asSet
	> [a => null, b => null, c => null]

	? ["a", "b", "c"] asMap
	> [0 => a, 1 => b, 2 => c]

Our conversation on Anguilla convinced me that I should leave myself the 
option of introducing a true "Set" data type eventually, so starting in 
0.8.3 "asSet" was renamed "asKeys".


>I meant you cannot re-order a list because this changes the meaning of the 
>list,
>not because of any determinism argument. (i before e except after c).

[+] Ok.


>> >	("markm", 42)
>> I don't understand the difference between an untyped/unlabeled ordered tuple
>> and a list.  I also don't understand how a tuple is like a hashtable.
>
>An ordered tuple (map) is not necessarily equal to the list that was used to
>initialize it. If the list was not in sorted order, then the ordered tuple 
>(map)
>has a different ordering.
>
>Your E 0.8.4 [] is both a tuple and a hashtable. The fact that the ordering of
>the hashtable is solely determined by insertion order makes it like a tuple.

[-] I don't understand.
Lists are the same thing as ordered tuples, and neither lists nor ordered 
tuples are maps.  That they both use square brackets is only syntax.  As you 
may recall, I used to use curly braces for maps and square brackets for 
lists.  (Perhaps I should have stuck with that.)


>> >	birthday("markm", 42)
>> "birthday" can be a function for constructing an object representing a
>> birthday, and this expression would then be a function call.
>
>I see and I like this. Ok, maybe it's just , that constructs a list. No 
>brackets
>required. (Look ma, no hands!). [] constructs an ordered map from the list and
>() does nothing but clarify order of operations. name() is then always a 
method
>invocation. All method invocations take a list as their only parameter.
>Receiving a method invocation like:
>foo (bar, baz) {
>}
>assigns a name to each list element and
>foo in {
>}
>assigns a name to the list as a whole.
>
>Minus the receiving a list as a whole bit, does the current E definition allow
>me to think of things in this way?

[-] Sorry, I don't think so.

Btw, what do you mean by an "ordered map"?  Maps are indeed now also 
ordered, but your above text seems to be using "map" for something that 
doesn't have hashtable-nature.  Perhaps I missed a definition that went by?

	Cheers,
	--MarkM